WO2023143107A1 - Character recognition method and apparatus, device, and medium - Google Patents

Character recognition method and apparatus, device, and medium

Info

Publication number
WO2023143107A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample image
sample
network model
character recognition
image
Prior art date
Application number
PCT/CN2023/072001
Other languages
French (fr)
Chinese (zh)
Inventor
毛晓飞
黄灿
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司 filed Critical 北京有竹居网络技术有限公司
Publication of WO2023143107A1 publication Critical patent/WO2023143107A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Definitions

  • the present application relates to the field of computer technology, in particular to a character recognition method, device, equipment and medium.
  • OCR Optical Character Recognition
  • Typically, the OCR recognition model is generated through a supervised training method. During the training process, manually annotated sample data must be collected and then used for training. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, which requires considerable manual labeling effort and increases the training cost.
  • the embodiments of the present application provide a character recognition method, apparatus, device, and medium, so as to enable model training with unlabeled sample data and reduce training costs.
  • a character recognition method comprising:
  • the text image to be processed includes text information to be recognized
  • the character recognition network model is generated by using training samples
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image
  • the width of each sub-sample image in the plurality of sub-sample images is the same
  • the width of each sub-sample image is smaller than the width of the first sample image
  • the first sample image includes text information.
  • in the second aspect of the embodiments of the present application, a character recognition apparatus is provided, which includes:
  • an acquisition unit configured to acquire a text image to be processed, the text image to be processed includes text information to be recognized;
  • a processing unit configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples;
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • in the third aspect of the embodiments of the present application, an electronic device is provided, which includes: a processor and a memory;
  • said memory for storing instructions or computer programs
  • the processor is configured to execute the instructions or computer programs in the memory, so that the electronic device executes the character recognition method described in the first aspect.
  • a computer-readable storage medium is provided, and instructions are stored in the computer-readable storage medium, and when the instructions are run on a device, the device executes the method described in the first aspect.
  • a computer program product which causes the computer to execute the character recognition method described in the first aspect when the computer program product is run on a computer.
  • character recognition is realized by using a pre-trained character recognition network model.
  • the character recognition network model is generated by training with the first sample image and multiple sub-sample images corresponding to the first sample image.
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image
  • the width of each sub-sample image in the plurality of sub-sample images is the same, and the width of each sub-sample image is smaller than the width of the first sample image.
  • because the model is trained by aligning local features (the multiple sub-sample images) with the overall feature (the first sample image) rather than with manual annotations, the character recognition network model reduces labeling costs and improves training efficiency.
  • input the text image to be processed into the character recognition network model so that the character recognition network model can completely extract the features of the text information in the image, and perform recognition based on the extracted features to obtain the output result.
  • Fig. 1 is a flowchart of a character recognition method provided by an embodiment of the present application;
  • Fig. 2 is a flowchart of a method for training a character recognition network model provided by an embodiment of the present application;
  • Fig. 3 is a schematic diagram of a division operation provided by an embodiment of the present application;
  • Fig. 4 is a flowchart of another method for training a character recognition network model provided by an embodiment of the present application;
  • Fig. 5 is a structural diagram of a character recognition apparatus provided by an embodiment of the present application;
  • Fig. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • OCR refers to the technology of analyzing and recognizing image files containing text data to obtain text.
  • Typically, the OCR recognition model is generated by a supervised training method. During the training process, manually annotated sample data must be collected and then used for training. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, which requires considerable manual labeling effort and increases the training cost.
  • the embodiment of the present application provides a character recognition method, which is realized by using a pre-trained character recognition network model.
  • the character recognition network model is trained and generated based on the first sample image and the multiple sub-sample images corresponding to the first sample image. Wherein, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, and the width of each sub-sample image is smaller than the width of the first sample image.
  • that is, when training the character recognition network model, the present application does not require manual annotation of the first sample image; instead, the character recognition network model is trained by aligning local features (the multiple sub-sample images) with the overall feature (the first sample image), which reduces labeling costs and improves training efficiency.
  • input the text image to be processed into the character recognition network model so that the character recognition network model can completely extract the features of the text information in the image, and perform recognition based on the extracted features to obtain the output result.
  • FIG. 1 is a flowchart of a character recognition method provided by an embodiment of the present application.
  • the method specifically includes the following steps:
  • S101 Acquire a text image to be processed, where the text image to be processed includes text information to be recognized.
  • the text information to be recognized may be characters to be recognized, including Chinese characters, English words, English letters, numbers and symbols.
  • S102 Input the text image to be processed into the character recognition network model to obtain an output result, the output result including text information to be recognized.
  • after the text image to be processed is obtained, in order to obtain the text information to be recognized included in the text image to be processed, the text image to be processed is input into the pre-trained character recognition network model, so that the text information to be recognized is output through the processing of the character recognition network model.
  • the character recognition network model is generated by using training samples.
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image.
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • the text image to be processed is input into the character recognition network model, and the character recognition network model can extract the features of the text image and perform recognition according to the extracted features, so as to output the text information in the text image.
  • the trained character recognition network model can extract the complete features of the text information in the image, and perform text recognition based on the extracted complete features to obtain output results and improve the accuracy of recognition.
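  • For illustration only, the following is a minimal inference sketch of this usage; the saved-model file name, the preprocessing choices, and the decode_to_text helper are hypothetical placeholders, since the application does not fix a concrete architecture, storage format, or API.

```python
import torch
import torchvision.transforms as T
from PIL import Image

# Hypothetical artifact name; the application does not specify how the
# trained character recognition network model is stored.
model = torch.load("character_recognition_model.pt", map_location="cpu")
model.eval()

preprocess = T.Compose([T.Grayscale(), T.ToTensor()])

image = Image.open("text_image_to_process.png")      # text image to be processed
with torch.no_grad():
    output = model(preprocess(image).unsqueeze(0))   # output result
# decode_to_text is a hypothetical helper mapping the model output to the
# recognized text information.
# recognized_text = decode_to_text(output)
```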
  • FIG. 2 is a flow chart of training a character recognition network model provided by an embodiment of the present application.
  • the method mainly includes the following steps:
  • S201 Acquire a first sample image and multiple sub-sample images corresponding to the first sample image.
  • the training samples include the first sample image and a plurality of sub-sample images corresponding to the first sample image, wherein both the first sample image and the plurality of sub-sample images include text information.
  • the training samples may be obtained by first acquiring the first sample image and then dividing the first sample image into multiple sub-sample images; the first sample image and its corresponding multiple sub-sample images form a training sample, that is, each sub-sample image is part of the first sample image.
  • the height of each sub-sample image is the same as that of the first sample image
  • the width of each sub-sample image is smaller than the width of the first sample image
  • the width of each sub-sample image is the same.
  • the first sample image may be divided into multiple sub-sample images in the following manner: first determine a division parameter, and then use the division parameter to divide the first sample image multiple times, thereby obtaining the multiple sub-sample images.
  • the ratio of the width of each sub-sample image to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1. That is, the first sample image is divided into multiple sub-sample images whose width is the division parameter multiplied by the width of the first sample image.
  • FIG. 3 is a schematic diagram of a division operation in an embodiment of the present application.
  • the width of the first sample image is width
  • the division parameter is K, that is, the width of each sub-sample image is K*width
  • the value range of K is 75%-95%.
  • the first sample image is divided into sub-sample image 1, sub-sample image 2, ..., sub-sample image n, wherein the height of each of sub-sample image 1, sub-sample image 2, ..., sub-sample image n is the same as that of the first sample image, and the width of each is K*width.
  • the starting position of each division operation may be set at a fixed interval, or may be determined randomly, which is not limited in this embodiment of the present application.
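  • As a concrete illustration of the division operation, the sketch below divides a first sample image held as a NumPy array of shape (height, width[, channels]); the random starting positions and the role of the division parameter K follow the description above, while the function name and defaults are assumptions.

```python
import numpy as np

def divide_first_sample_image(first_sample, k=0.9, n_sub=4, rng=None):
    """Divide the first sample image into sub-sample images.

    Each sub-sample keeps the full height, has width K * width (0 < K < 1),
    and starts at a randomly chosen horizontal position; fixed-interval
    starting positions would work equally well.
    """
    rng = np.random.default_rng() if rng is None else rng
    width = first_sample.shape[1]
    sub_width = int(k * width)
    sub_samples = []
    for _ in range(n_sub):
        start = int(rng.integers(0, width - sub_width + 1))
        sub_samples.append(first_sample[:, start:start + sub_width])
    return sub_samples
```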
  • S202 Input the first sample image and the multiple sub-sample images into the initial network model respectively, to obtain a first feature vector set and a second feature vector set, wherein the dimensions of the first feature vector set are the same as the dimensions of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information.
  • after acquiring the training samples, which include the first sample image and the multiple sub-sample images obtained by dividing the first sample image, the initial network model can be trained.
  • the first sample image is input into the initial network model, and the first feature vector set corresponding to the first sample image is obtained.
  • the order of inputting the first sample image and the multiple sub-sample images corresponding to the first sample image into the initial network model is not limited; that is, the first feature vector set corresponding to the first sample image may be obtained first, or the second feature vector set corresponding to the multiple sub-sample images may be obtained first.
  • S203 Determine a first loss amount according to the first feature vector set and the second feature vector set.
  • the first loss amount may be determined according to the first feature vector set and the second feature vector set.
  • a contrastive loss function may be used to calculate the first loss amount between the first feature vector set and the second feature vector set.
  • the contrastive loss function is usually used to represent the degree of matching between samples, and can also be used to train a model for extracting features. Usually, for two originally similar training samples, the two feature vectors obtained after feature extraction by the model should still be similar in the feature space; for two originally dissimilar samples, the two feature vectors obtained after feature extraction by the model should still be dissimilar in the feature space. Accordingly, when two similar samples undergo feature extraction, the loss amount between the two feature vectors calculated using the loss function should also be small.
  • the loss function can calculate the distance between two feature vectors, such as the Euclidean distance, to determine the loss amount between the two feature vectors, so as to judge the quality of the training of the initial network model.
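  • The application does not give a closed-form loss, so the following is only one plausible reading of the first loss amount: the mean Euclidean distance between the feature vectors of the first sample image and those of its sub-sample images, assuming both sets share the same feature dimension.

```python
import torch

def first_loss_amount(first_feature_set, second_feature_set):
    """Mean Euclidean distance between the two feature vector sets.

    first_feature_set: (m, d) feature vectors of the first sample image.
    second_feature_set: (n, d) feature vectors of the sub-sample images.
    Similar inputs should give a small loss, dissimilar inputs a large one.
    """
    return torch.cdist(first_feature_set, second_feature_set, p=2).mean()
```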
  • S204 Adjust the parameters of the initial network model based on the first loss amount, and re-execute the step of inputting the first sample image and the multiple sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than the first threshold, so as to obtain the character recognition network model.
  • the sub-sample images are obtained by dividing the first sample image, so the multiple sub-sample images have a high degree of similarity with the first sample image. Therefore, after feature extraction by the initial network model, the obtained first feature vector set and second feature vector set should have a high similarity; that is, the first loss amount determined based on the first feature vector set and the second feature vector set should be small.
  • if the first loss amount is greater than or equal to the first threshold, it indicates that the similarity between the first feature vector set and the second feature vector set obtained based on the initial network model does not meet the requirements, and that the recognition effect of the initial network model after training does not meet the requirements; it is then necessary to adjust the parameters of the initial network model and continue to use the first sample image and the corresponding sub-sample images to train the initial network model. That is, the step of inputting the first sample image and the multiple sub-sample images into the initial network model and the subsequent training process are re-executed until the first loss amount determined based on the first feature vector set and the second feature vector set is less than the first threshold, thus obtaining the final character recognition network model.
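  • Putting S201-S204 together, a minimal self-supervised training loop might look like the sketch below; initial_network, the optimizer settings, and first_threshold are illustrative assumptions, and first_loss_amount refers to the earlier sketch.

```python
import torch

def train_initial_network(initial_network, sample_images, k=0.9, n_sub=4,
                          first_threshold=0.05, max_epochs=100, lr=1e-4):
    """Align sub-sample (local) features with first-sample (global) features;
    no manual annotation of the first sample images is needed."""
    optimizer = torch.optim.Adam(initial_network.parameters(), lr=lr)
    loss = torch.tensor(float("inf"))
    for _ in range(max_epochs):
        for first_sample in sample_images:            # (1, H, W) image tensors
            width = first_sample.shape[-1]
            sub_width = int(k * width)
            starts = torch.randint(0, width - sub_width + 1, (n_sub,))
            sub_samples = torch.stack(
                [first_sample[..., s:s + sub_width] for s in starts.tolist()])
            first_features = initial_network(first_sample.unsqueeze(0))  # 1st set
            second_features = initial_network(sub_samples)               # 2nd set
            loss = first_loss_amount(first_features, second_features)    # see above
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss.item() < first_threshold:    # stop once the first loss amount is small
            break
    return initial_network                   # the trained character recognition model
```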
  • the value of the division parameter is fixed, that is, in the same training sample, the width of each sub-sample image is the same.
  • between different training samples, the value of the division parameter can be different; that is, the width of the sub-sample images can differ between different training samples.
  • set the value range of the division parameter to 75%-95%
  • set the value interval of the division parameter to 5%
  • the possible values of the division parameter are then 75%, 80%, 85%, 90%, and 95%.
  • Using different training samples to train the initial network model can improve the accuracy of feature extraction by the character recognition network model, and facilitate subsequent text recognition using the trained character recognition network model.
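  • One way to realize the varying division parameter described above (assuming the value is drawn uniformly from the 75%-95% range with a 5% interval) is sketched below.

```python
import random

DIVISION_PARAMETERS = [0.75, 0.80, 0.85, 0.90, 0.95]  # 75%-95%, 5% interval

def sample_division_parameter():
    # One division parameter per training sample; it stays fixed within that sample.
    return random.choice(DIVISION_PARAMETERS)
```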
  • the embodiment of the present application provides a possible implementation: data enhancement processing is performed on the first sample image to obtain a processed first sample image, and the first sample image after data enhancement processing is input into the initial network model to obtain the first feature vector set.
  • the data enhancement processing methods include: rotation, flip transformation, noise perturbation, etc., and this embodiment does not limit the specific manner of data enhancement.
  • the first sample image that has undergone data enhancement processing can also be divided into multiple sub-sample images, and the initial network model can be trained using the data-enhanced first sample image and the corresponding multiple sub-sample images, so as to improve the accuracy of the character recognition network model.
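  • A minimal data-enhancement sketch using torchvision transforms is given below; the specific transforms (a small rotation, a horizontal flip, Gaussian noise) are only examples of the rotation, flip-transformation, and noise-perturbation families mentioned above, not the exact processing used by the application.

```python
import torch
import torchvision.transforms as T

def add_gaussian_noise(img_tensor, std=0.02):
    # Simple noise perturbation on a tensor image with values in [0, 1].
    return (img_tensor + std * torch.randn_like(img_tensor)).clamp(0.0, 1.0)

# Rotation, flip transformation, and noise perturbation applied to a PIL image.
enhance = T.Compose([
    T.RandomRotation(degrees=3),
    T.RandomHorizontalFlip(p=0.5),
    T.ToTensor(),
    T.Lambda(add_gaussian_noise),
])
# enhanced_first_sample = enhance(first_sample_pil_image)
```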
  • when training the character recognition network model, it is not necessary to manually label the first sample image; instead, the character recognition network model is trained by aligning local features (the multiple sub-sample images) with the overall features (the first sample image), so that the character recognition network model can extract the complete features of the text information in an image and then use those complete text features for character recognition. This not only reduces the cost of labeling and improves training efficiency, but also lays the foundation for subsequent text recognition and improves recognition accuracy.
  • the above process of training the character recognition network mainly trains the encoder in the character recognition network model; that is, when a text image is input into the character recognition network model, the encoder first preprocesses the text image, including digitization, geometric transformation, normalization, smoothing and other steps, then performs feature extraction on the preprocessed text image, and the feature vector corresponding to the text image is obtained through the output of the fully connected layer.
  • by training the encoder in the character recognition network model, the features extracted by the encoder from the text image can be made more accurate, and the feature vector corresponding to the text image can therefore be more accurate.
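  • For concreteness, the sketch below shows what such an encoder could look like; the layer sizes, the preprocessing assumptions (grayscale input normalized to [0, 1]), and the module names are illustrative rather than the architecture actually claimed.

```python
import torch.nn as nn

class Encoder(nn.Module):
    """Toy encoder: preprocessed text image -> feature vector.

    A small CNN backbone followed by a fully connected output layer;
    adaptive pooling lets full-width and sub-sample-width inputs share
    the same network. All sizes are arbitrary.
    """
    def __init__(self, feature_dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.fc = nn.Linear(64, feature_dim)   # fully connected output layer

    def forward(self, x):                       # x: (batch, 1, height, width)
        return self.fc(self.backbone(x).flatten(1))
```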
  • the embodiment of the present application also provides a preferred implementation method, that is, to train the decoder in the character recognition network model.
  • the main function of the decoder is to decode the feature vectors output by the encoder, so as to identify the text information corresponding to the feature vectors.
  • FIG. 4 is a flow chart of another training character recognition network model provided by the embodiment of the present application.
  • the method mainly includes the following steps:
  • S401 Acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information, and the annotation is used to reflect the text information.
  • the second sample image with an annotation is obtained, and the annotation reflects the text information of the second sample image; the annotation is used for comparison with the text information of the second sample image recognized by the character recognition network model, and the character recognition network model is then trained according to the comparison result.
  • S402 Input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes text information.
  • the character recognition network model acquires text information corresponding to the second sample image by performing feature extraction and feature recognition on the second sample image.
  • S403 Determine a second loss amount based on the recognition result and the annotation of the second sample image.
  • the second loss amount is determined based on the annotation of the second sample image and the recognition result, wherein the second loss amount represents the difference between the annotated text information of the second sample image and the text information recognized by the character recognition network model.
  • S404 Adjust the parameters of the character recognition network model based on the second loss amount, and re-execute inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is smaller than the second threshold.
  • when the second loss amount is large, it indicates that the text information in the second sample image recognized by the character recognition network model is quite different from the annotated text information, and it is necessary to adjust the parameters of the character recognition network model and retrain it; that is, re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than the second threshold.
  • in this way, the accuracy with which the character recognition network model recognizes text information based on the extracted features can be improved.
  • after training, the character recognition network model can be used for character recognition: the text image to be processed is input into the character recognition network model to obtain the output result, which includes the text information to be recognized.
  • since the character recognition network model has been preliminarily trained by the training method shown in Figure 2, it can already realize feature extraction and basic recognition functions; therefore, when using second sample images for further training, the training can be completed without obtaining a large number of labeled second sample images, which reduces the training cost and improves the recognition accuracy.
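  • A minimal sketch of this second, supervised stage is shown below; the per-character cross-entropy formulation, the data-loader shape, and second_threshold are assumptions made for illustration, since the application does not fix a loss function for this stage.

```python
import torch
import torch.nn as nn

def fine_tune(recognition_model, labeled_loader, second_threshold=0.1,
              max_epochs=50, lr=1e-4):
    """Supervised fine-tuning on annotated second sample images.

    labeled_loader yields (images, targets), where targets are per-character
    class indices derived from the annotations.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(recognition_model.parameters(), lr=lr)
    second_loss = torch.tensor(float("inf"))
    for _ in range(max_epochs):
        for images, targets in labeled_loader:
            logits = recognition_model(images)         # recognition result
            second_loss = criterion(logits, targets)   # second loss amount
            optimizer.zero_grad()
            second_loss.backward()
            optimizer.step()
        if second_loss.item() < second_threshold:      # until second loss < threshold
            break
    return recognition_model
```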
  • FIG. 5 is a structural diagram of a character recognition device provided by an embodiment of the present application.
  • the apparatus 500 may include: an acquiring unit 501 and a processing unit 502 .
  • An acquisition unit 501 configured to acquire a text image to be processed, the text image to be processed includes text information to be recognized;
  • the processing unit 502 is configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • the processing unit 502 is specifically configured to: input the first sample image and the plurality of sub-sample images into the initial network model respectively, to obtain a first feature vector set and a second feature vector set, where the dimensions of the first feature vector set are the same as the dimensions of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set; and adjust the parameters of the initial network model based on the first loss amount, re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
  • the processing unit 502 is specifically configured to determine a division parameter, and use the division parameter to divide the first sample image multiple times to obtain the multiple sub-sample images, where the ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
  • the processing unit 502 is specifically configured to, for each division operation, determine a starting position of the division in the first sample image, and divide the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
  • the value range of the division parameter is 75%-95%.
  • the acquiring unit 501 is further configured to acquire a second sample image and an annotation corresponding to the second sample image, the second sample image includes text information, and the annotation is used to reflect the text information;
  • the processing unit 502 is further configured to input the second sample image into the character recognition network model to obtain a recognition result, the recognition result including the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image; and adjust the parameters of the character recognition network model based on the second loss amount, re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than the second threshold.
  • the processing unit 502 is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and input the processed first sample image into the initial network model to obtain the first feature vector set.
  • the processing unit 502 is specifically configured to use a contrastive loss function to calculate the first loss amount between the first feature vector set and the second feature vector set.
  • FIG. 6 shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present application.
  • the terminal device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs, desktop computers, and the like.
  • the electronic device shown in FIG. 6 is only an example, and should not limit the functions and scope of use of this embodiment of the present application.
  • the electronic device 600 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 601, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • a processing device such as a central processing unit, a graphics processing unit, etc.
  • RAM memory
  • various programs and data necessary for the operation of the electronic device 600 are also stored.
  • the processing device 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604 .
  • the following devices can be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 608 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 609.
  • the communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 6 shows electronic device 600 having various means, it should be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • the processes described above with reference to the flowcharts can be implemented as computer software programs.
  • the embodiments of the present application include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 609 , or from storage means 608 , or from ROM 602 .
  • when the computer program is executed by the processing device 601, the above-mentioned functions defined in the method of the embodiment of the present application are executed.
  • the electronic device provided by this embodiment of the present application and the character recognition method provided by the above-mentioned embodiments belong to the same inventive concept.
  • for technical details not described in detail in this embodiment, reference may be made to the above-mentioned embodiments, and this embodiment has the same beneficial effects as the above-mentioned embodiments.
  • An embodiment of the present application provides a computer-readable medium, on which a computer program is stored, wherein, when the program is executed by a processor, the method described in any of the foregoing embodiments is implemented.
  • the computer-readable medium mentioned above in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (Hypertext Transfer Protocol), and can be interconnected with any form or medium of digital data communication (e.g., a communication network).
  • HTTP Hyper Text Transfer Protocol
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is made to execute the above-mentioned method.
  • Computer program code for carrying out the operations of this application may be written in one or more programming languages, or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through an Internet connection provided by an Internet service provider).
  • LAN local area network
  • WAN wide area network
  • Internet service provider such as AT&T, MCI, Sprint, EarthLink, MSN, GTE, etc.
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that includes one or more executable instructions for implementing specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present application may be implemented by means of software or by means of hardware.
  • the name of the unit/module does not constitute a limitation on the unit itself under certain circumstances, for example, the voice data collection module can also be described as a "data collection module”.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs System on Chips
  • CPLD Complex Programmable Logic Device
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory
  • a character recognition method which may include:
  • the text image to be processed includes text information to be recognized
  • the character recognition network model is generated by using training samples
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image
  • the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image
  • the width of each sub-sample image in the plurality of sub-sample images is the same
  • the width of each sub-sample image is smaller than the width of the first sample image
  • the first sample image includes text information.
  • the training process of the character recognition network model includes:
  • the first sample image and the multiple sub-sample images are respectively re-input into the initial network model and the subsequent training process is re-executed until the first loss amount is less than the first threshold, and the character recognition network model is obtained.
  • the acquisition process of the plurality of sub-sample images includes:
  • the ratio of the width of each sub-sample image in the multiple sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
  • using the division parameter to divide the first sample image multiple times to obtain the multiple sub-sample images includes:
  • the value range of the division parameter is 75%-95%.
  • the method further includes:
  • the second sample image includes text information, and the annotation is used to reflect the text information
  • the inputting the first sample image into the initial network model to obtain the first feature vector set includes:
  • the determining the first loss amount according to the first feature vector set and the second feature vector set includes:
  • a first loss amount between the first set of feature vectors and the second set of feature vectors is calculated using a contrastive loss function.
  • a character recognition device which may include:
  • an acquisition unit configured to acquire a text image to be processed, the text image to be processed includes text information to be recognized;
  • a processing unit configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples;
  • the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
  • the processing unit is specifically configured to input the first sample image and the plurality of sub-sample images respectively into the initial network model to obtain a first feature vector set and a second feature vector set, the dimensions of the first feature vector set being the same as the dimensions of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set being feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set; and adjust the parameters of the initial network model based on the first loss amount, re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than the first threshold, so as to obtain the character recognition network model.
  • the processing unit is specifically configured to determine a division parameter, and use the division parameter to perform multiple divisions on the first sample image to obtain the plurality of sub-sample images, the ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the first sample image being equal to the division parameter, and the division parameter being greater than 0 and less than 1.
  • the processing unit is specifically configured to determine a starting position of division in the first sample image for each division operation, and divide the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
  • the value range of the division parameter is 75%-95%.
  • the acquiring unit is further configured to acquire a second sample image and an annotation corresponding to the second sample image, the second sample image includes text information, and the annotation uses to reflect the text information;
  • the processing unit is further configured to input the second sample image into the character recognition network model to obtain a recognition result, the recognition result including the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image; and adjust the parameters of the character recognition network model based on the second loss amount, re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than the second threshold.
  • the processing unit is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and input the processed first sample image into the initial network model to obtain the first feature vector set.
  • the processing unit is specifically configured to use a contrastive loss function to calculate a first loss amount between the first feature vector set and the second feature vector set.
  • an electronic device includes: a processor and a memory;
  • said memory for storing instructions or computer programs
  • the processor is configured to execute the instructions or computer programs in the memory, so that the electronic device executes the character recognition method.
  • a computer-readable storage medium is provided. Instructions are stored in the computer-readable storage medium. When the instructions are run on a device, the device is made to execute the character recognition method.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the differences from other embodiments, and the same and similar parts of each embodiment can be referred to each other.
  • the description is relatively simple, and for the related part, please refer to the description of the method part.
  • At least one (item) means one or more, and “multiple” means two or more.
  • “And/or” is used to describe the association relationship of associated objects, indicating that there can be three types of relationships, for example, “A and/or B” can mean: only A exists, only B exists, and A and B exist at the same time , where A and B can be singular or plural.
  • the character “/” generally indicates that the contextual objects are an “or” relationship.
  • "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • At least one item (piece) of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can be single or multiple.
  • RAM random access memory
  • ROM read-only memory
  • EPROM electrically programmable ROM
  • EEPROM electrically erasable programmable ROM
  • registers, hard disk, removable disk, CD-ROM, or any other known form of storage medium.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a character recognition method, which is implemented using a pre-trained character recognition network model during character recognition. The character recognition network model is generated by means of training using a first sample image and multiple sub-sample images corresponding to the first sample image. The height of each sub-sample image in the multiple sub-sample images is the same as the height of the first sample image, the width of each sub-sample image is identical, and the width of each sub-sample image is smaller than the width of the first sample image. In the present application, manual annotation does not need to be carried out on the first sample image, and the character recognition network model is trained by means of aligning partial features (the multiple sub-sample images) and an overall feature (the first sample image), thereby reducing annotation costs and improving training efficiency. In practical use, by inputting a text image to the character recognition network model, features of text information in the image can be completely extracted, and recognition carried out according to the features to obtain an output result.

Description

A character recognition method, apparatus, device, and medium
This application claims priority to Chinese patent application No. 202210114334.6, entitled "A Character Recognition Method, Apparatus, Device, and Medium", filed on January 30, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technology, and in particular to a character recognition method, apparatus, device, and medium.
Background Art
Optical Character Recognition (OCR) refers to the technology of analyzing and recognizing image files containing text data to obtain the text, and is an important aspect of research and application in the field of automatic recognition technology.
Typically, an OCR recognition model is generated through a supervised training method. During training, manually annotated sample data must be collected and then used for training. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, which requires considerable manual labeling effort and increases the training cost.
Summary of the Invention
In view of this, the embodiments of the present application provide a character recognition method, apparatus, device, and medium, so as to enable model training with unlabeled sample data and reduce training costs.
To achieve the above purpose, the technical solutions provided by the embodiments of the present application are as follows:
In a first aspect of the embodiments of the present application, a character recognition method is provided, the method comprising:
acquiring a text image to be processed, the text image to be processed including text information to be recognized;
inputting the text image to be processed into a character recognition network model to obtain an output result, the output result including the text information to be recognized;
wherein the character recognition network model is generated by training with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
In a second aspect of the embodiments of the present application, a character recognition apparatus is provided, the apparatus comprising:
an acquisition unit, configured to acquire a text image to be processed, the text image to be processed including text information to be recognized;
a processing unit, configured to input the text image to be processed into the character recognition network model to obtain an output result, the output result including the text information to be recognized; wherein the character recognition network model is generated by training with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image in the plurality of sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the plurality of sub-sample images is the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
In a third aspect of the embodiments of the present application, an electronic device is provided, the device comprising: a processor and a memory;
the memory is configured to store instructions or a computer program;
the processor is configured to execute the instructions or computer program in the memory, so that the electronic device performs the character recognition method described in the first aspect.
In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, in which instructions are stored; when the instructions are run on a device, the device is caused to perform the character recognition method described in the first aspect.
In a fifth aspect of the embodiments of the present application, a computer program product is provided, which, when run on a computer, causes the computer to perform the character recognition method described in the first aspect.
It can be seen that the embodiments of the present application have the following beneficial effects:
In the embodiments of the present application, character recognition is performed using a pre-trained character recognition network model, which is generated by training with a first sample image and multiple sub-sample images corresponding to the first sample image. The height of each sub-sample image in the multiple sub-sample images is the same as the height of the first sample image, the width of each sub-sample image in the multiple sub-sample images is the same, and the width of each sub-sample image is smaller than the width of the first sample image. That is, when training the character recognition network model, the present application does not require manual annotation of the first sample image; instead, the model is trained by aligning local features (the multiple sub-sample images) with the overall feature (the first sample image), which reduces labeling costs and improves training efficiency. In actual use, the text image to be processed is input into the character recognition network model, so that the model can completely extract the features of the text information in the image and perform recognition based on the extracted features to obtain the output result.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments described in this application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of a character recognition method provided by an embodiment of the present application;
Fig. 2 is a flowchart of a method for training a character recognition network model provided by an embodiment of the present application;
Fig. 3 is a schematic diagram of a division operation provided by an embodiment of the present application;
Fig. 4 is a flowchart of another method for training a character recognition network model provided by an embodiment of the present application;
Fig. 5 is a structural diagram of a character recognition apparatus provided by an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description of Embodiments
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
OCR refers to the technology of analyzing and recognizing image files containing text to obtain the text. Usually, an OCR recognition model is generated by a supervised training method: during training, manually labeled sample data needs to be collected, and the model is then trained with that sample data. To improve the recognition accuracy of the OCR recognition model, a large amount of sample data has to be collected, and labeling it manually consumes considerable manpower, which increases the training cost.
Based on this, the embodiments of the present application provide a character recognition method implemented with a pre-trained character recognition network model. The character recognition network model is trained based on a first sample image and a plurality of sub-sample images corresponding to the first sample image, where the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, and the width of each sub-sample image is smaller than the width of the first sample image. In other words, when training the character recognition network model, the first sample image does not need to be manually labeled; instead, the model is trained by aligning local features (the sub-sample images) with global features (the first sample image), which reduces labeling cost and improves training efficiency. In actual use, the text image to be processed is input into the character recognition network model, so that the model can fully extract the features of the text information in the image and perform recognition based on the extracted features to obtain the output result.
The character recognition method provided by the embodiments of the present application is described below with reference to the drawings. Referring to Fig. 1, Fig. 1 is a flowchart of a character recognition method provided by an embodiment of the present application.
The method specifically includes the following steps:
S101: Acquire a text image to be processed, where the text image to be processed includes text information to be recognized.
The text information to be recognized may be characters to be recognized, including Chinese characters, English words, English letters, numbers, symbols, and the like.
S102: Input the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized.
In this embodiment, after the text image to be processed is acquired, in order to obtain the text information to be recognized included in the text image, the text image is input into the pre-trained character recognition network model, so that the text information to be recognized is output through the processing of the character recognition network model.
The character recognition network model is trained with training samples. The training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, where the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
In the embodiments of the present application, when training the character recognition network model, the first sample image does not need to be manually labeled; instead, the character recognition network model is trained by aligning local features (the sub-sample images) with global features (the first sample image), which reduces labeling cost and improves training efficiency.
After training of the character recognition network model is completed, the text image to be processed is input into the character recognition network model, the model extracts features from the text image and performs recognition based on the extracted features, and the text information in the text image is output.
It can be seen that the trained character recognition network model can extract the complete features of the text information in an image and perform text recognition based on those complete features to obtain the output result, which improves recognition accuracy.
The process of training the character recognition network model in the present application is described below with reference to the drawings. Referring to Fig. 2, Fig. 2 is a flowchart of training a character recognition network model provided by an embodiment of the present application.
The method mainly includes the following steps:
S201: Acquire a first sample image and a plurality of sub-sample images corresponding to the first sample image.
Before the character recognition network model is trained, training samples for training an initial network model need to be acquired. The training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, where the first sample image and the sub-sample images include text information. The training samples may be acquired by obtaining a first sample image without manual labeling and then dividing the first sample image into a plurality of sub-sample images; the first sample image and its corresponding sub-sample images form a training sample, that is, each sub-sample image is a part of the first sample image. The height of each sub-sample image is the same as the height of the first sample image, the width of each sub-sample image is smaller than the width of the first sample image, and the widths of the sub-sample images are the same.
In a possible implementation, the first sample image may be divided into the plurality of sub-sample images as follows: a division parameter is determined first, and the first sample image is then divided multiple times with the division parameter to obtain the plurality of sub-sample images. The ratio of the width of each sub-sample image to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1. That is, the first sample image is divided into a plurality of sub-sample images whose width is determined by the division parameter.
When the division is performed, for each division operation, a starting position of the division may first be determined in the first sample image, and the first sample image is then divided according to the starting position of each division and the division parameter, thereby obtaining the plurality of sub-sample images. Referring to Fig. 3, Fig. 3 is a schematic diagram of a division operation in an embodiment of the present application. In this scenario, the width of the first sample image is width and the division parameter is K, that is, the width of each sub-sample image is K*width, where K ranges from 75% to 95%. For each division operation, the starting position of the division is determined in the first sample image, and a sub-sample image with a width of K*width is cut out. As shown in Fig. 3, the first sample image is divided into sub-sample image 1, sub-sample image 2, ..., sub-sample image n, where sub-sample image 1, sub-sample image 2, ..., sub-sample image n have the same height as the first sample image and each has a width of K*width.
It should be noted that the starting positions of the division operations may be spaced at a fixed interval or determined randomly, which is not limited in the embodiments of the present application.
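As a minimal sketch of this division step (an illustration rather than part of the application; the function and parameter names are assumptions), the following Python code crops a text-line image of height H and width W into several sub-sample images of width K*W with randomly chosen starting positions:

```python
import numpy as np

def divide_sample(image: np.ndarray, k: float, num_crops: int, rng=None):
    """Divide a text-line image (H x W x C) into sub-sample images.

    Each sub-sample keeps the full height and has width k * W,
    where 0 < k < 1 is the division parameter described above.
    Starting positions are chosen at random along the width.
    """
    assert 0.0 < k < 1.0, "division parameter must be in (0, 1)"
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    crop_w = int(round(k * w))
    starts = rng.integers(0, w - crop_w + 1, size=num_crops)
    return [image[:, s:s + crop_w] for s in starts]

# Example: a first sample image of height 32 and width 256, K = 0.8, 4 sub-samples
sample = np.zeros((32, 256, 3), dtype=np.uint8)
sub_samples = divide_sample(sample, k=0.8, num_crops=4)
```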
S202: Input the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information.
After the training samples including the first sample image and the plurality of sub-sample images obtained by dividing the first sample image are acquired, the initial network model can be trained. The first sample image is input into the initial network model to obtain the first feature vector set corresponding to the first sample image, and the plurality of sub-sample images obtained by the division are input into the initial network model to obtain the second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and both the first feature vector set and the second feature vector set are feature vectors of the text information in the sample images.
It should be noted that the order in which the first sample image and its corresponding sub-sample images are input into the initial network model is not limited; that is, the first feature vector set corresponding to the first sample image may be obtained first, or the second feature vector set corresponding to the plurality of sub-sample images may be obtained first.
S203: Determine a first loss amount according to the first feature vector set and the second feature vector set.
After the first feature vector set and the second feature vector set are obtained based on the initial network model, the first loss amount may be determined according to the first feature vector set and the second feature vector set.
In a possible implementation, a contrastive loss function may be used to calculate the first loss amount between the first feature vector set and the second feature vector set. A loss function is usually used to measure how well samples match, and it can also be used to train a feature extraction model. Normally, for two training samples that are originally similar, the two feature vectors obtained after feature extraction by the model remain similar in the feature space; for two samples that are originally dissimilar, the two feature vectors remain dissimilar in the feature space. Accordingly, when two similar samples undergo feature extraction, the loss amount between the two feature vectors calculated by the loss function should be small. The loss function can determine the loss amount between two feature vectors by computing the distance between them, such as the Euclidean distance, so as to judge how well the initial network model has been trained.
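As a hedged illustration of such a contrastive loss (the application only names a contrastive loss and the Euclidean distance; the margin-based form, PyTorch, and the function signature below are assumptions), one possible sketch is:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(global_feats, local_feats, label, margin=1.0):
    """Contrastive loss between whole-image features and sub-image features.

    global_feats, local_feats: tensors of shape (batch, dim).
    label: 1 where the pair should be similar (a sub-sample image cut from
           the same first sample image), 0 where it should be dissimilar.
    """
    dist = F.pairwise_distance(global_feats, local_feats)       # Euclidean distance
    positive = label * dist.pow(2)                               # pull similar pairs together
    negative = (1 - label) * F.relu(margin - dist).pow(2)        # push dissimilar pairs apart
    return 0.5 * (positive + negative).mean()
```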
S204: Adjust the parameters of the initial network model based on the first loss amount, and re-execute the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
In this embodiment, the sub-sample images are obtained by dividing the first sample image, so the plurality of sub-sample images have a high similarity with the first sample image. Therefore, after feature extraction by the initial network model, the obtained first feature vector set and second feature vector set should also have a high similarity, that is, the first loss amount determined based on the first feature vector set and the second feature vector set should be small. When the first loss amount is greater than or equal to the first threshold, it indicates that the similarity between the first feature vector set and the second feature vector set obtained based on the initial network model does not meet the requirement and the recognition effect of the trained initial network model is not yet satisfactory, so the parameters of the initial network model need to be adjusted and the initial network model continues to be trained with the first sample image and its corresponding sub-sample images. That is, the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process are re-executed until the first loss amount determined based on the first feature vector set and the second feature vector set is less than the first threshold, so as to obtain the final character recognition network model.
It should be noted that during a single training pass of the initial network model, the value of the division parameter is fixed, that is, within the same training sample, the widths of the sub-sample images are the same. When the initial network model is trained again with the first sample image and its corresponding sub-sample images, the value of the division parameter may differ, that is, the widths of the sub-sample images may differ between different training samples. For example, the value range of the division parameter may be set to 75%-95% with a step of 5%, so that across different training samples the possible values of the division parameter are 75%, 80%, 85%, 90%, and 95%. Training the initial network model with different training samples can improve the accuracy of the features extracted by the character recognition network model, which facilitates subsequent text recognition with the trained character recognition network model.
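Putting these steps together, one hypothetical pretraining loop under the above assumptions (the encoder, optimizer, stopping threshold, and data set are placeholders, and contrastive_loss is the helper sketched earlier) might look like this:

```python
import torch

def pretrain(encoder, dataset, optimizer,
             k_values=(0.75, 0.80, 0.85, 0.90, 0.95),
             num_crops=4, threshold=0.01, max_epochs=100):
    """Self-supervised pretraining that aligns whole-image and sub-image features.

    dataset is assumed to be a list of unlabeled text-line images as
    tensors of shape (C, H, W); contrastive_loss is the sketch given above.
    """
    for epoch in range(max_epochs):
        k = k_values[epoch % len(k_values)]             # one fixed division parameter per pass
        total = 0.0
        for image in dataset:
            _, _, w = image.shape
            crop_w = int(round(k * w))
            starts = torch.randint(0, w - crop_w + 1, (num_crops,))
            crops = torch.stack([image[:, :, s:s + crop_w] for s in starts.tolist()])
            global_feat = encoder(image.unsqueeze(0))    # first feature vector set, shape (1, dim)
            local_feats = encoder(crops)                 # second feature vector set, shape (num_crops, dim)
            label = torch.ones(num_crops)                # crops come from the same first sample image
            loss = contrastive_loss(global_feat.expand_as(local_feats), local_feats, label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total += loss.item()
        if total / max(len(dataset), 1) < threshold:     # stop once the first loss falls below the first threshold
            break
    return encoder
```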
In practical applications, the samples for training the initial network model may be insufficient, so the trained character recognition network model may not be accurate enough, which manifests as a large first loss amount between the first feature vector set and the second feature vector set determined based on the character recognition network model. The embodiments of the present application therefore provide a possible implementation in which data enhancement processing is performed on the first sample image to obtain a processed first sample image, and the first sample image that has undergone data enhancement processing is input into the initial network model to obtain the first feature vector set. The data enhancement processing methods include rotation, flip transformation, noise perturbation, and the like; this embodiment does not limit the specific manner of data enhancement.
In addition, the first sample image that has undergone data enhancement processing may also be divided into a plurality of sub-sample images, and the initial network model may be trained with the enhanced first sample image and its corresponding sub-sample images, so as to improve the accuracy of the character recognition network model.
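A minimal sketch of such data enhancement, using torchvision transforms as stand-ins for the rotation, flip transformation, and noise perturbation mentioned above (the specific transforms and magnitudes are assumptions, not prescribed by the application):

```python
import torch
import torchvision.transforms as T

def enhance(image: torch.Tensor) -> torch.Tensor:
    """Apply simple data enhancement to a text-line image tensor (C, H, W) with values in [0, 1]."""
    transform = T.Compose([
        T.RandomRotation(degrees=3),            # slight rotation keeps the text readable
        T.RandomHorizontalFlip(p=0.1),          # occasional flip transformation
    ])
    enhanced = transform(image)
    noise = 0.01 * torch.randn_like(enhanced)    # noise perturbation
    return (enhanced + noise).clamp(0.0, 1.0)
```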
When training the character recognition network model, the embodiments of the present application do not require manual labeling of the first sample image; instead, the character recognition network model is trained by aligning local features (the sub-sample images) with global features (the first sample image), so that the model can extract the complete features of the text information in an image and then use those complete text features for character recognition. This reduces labeling cost and improves training efficiency, and it also lays a foundation for subsequent text recognition, improving recognition accuracy.
The training process described in the above embodiments mainly trains the encoder in the character recognition network model. That is, after the text image to be processed is input into the character recognition network model, the encoder first preprocesses the text image, including steps such as digitization, geometric transformation, normalization, and smoothing, then performs feature extraction on the preprocessed text image, and outputs the feature vector corresponding to the text image through a fully connected layer. By training the encoder in the character recognition network model, the features that the encoder extracts from the text image become more accurate, so the feature vector corresponding to the text image becomes more accurate.
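For concreteness, a toy encoder consistent with this description (convolutional feature extraction followed by a fully connected output layer; the layer sizes and pooling choice are assumptions, not the application's architecture) might be:

```python
import torch
import torch.nn as nn

class TextLineEncoder(nn.Module):
    """Toy encoder: convolutional feature extraction plus a fully connected output layer."""

    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, 1)),          # pools away the variable width
        )
        self.fc = nn.Linear(64, feat_dim)          # fully connected layer outputs the feature vector

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.backbone(x).flatten(1)
        return self.fc(h)
```

Such an encoder could be plugged into the pretraining sketch above, since the adaptive pooling makes it indifferent to the differing widths of the first sample image and the sub-sample images.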
To further improve the accuracy with which the character recognition network model recognizes text information, the embodiments of the present application also provide a preferred implementation in which the decoder in the character recognition network model is trained. The main function of the decoder is to decode the feature vector output by the encoder and recognize the text information corresponding to the feature vector. By training the decoder in the character recognition network model, the accuracy of recognizing text information can be improved. The process of training the character recognition network model to recognize text information is described below with reference to the drawings.
Referring to Fig. 4, Fig. 4 is a flowchart of another method of training a character recognition network model provided by an embodiment of the present application.
The method mainly includes the following steps:
S401: Acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information.
In this embodiment, in order to further train the decoder of the character recognition network model, a second sample image with an annotation is acquired. The annotation reflects the text information of the second sample image; it is compared with the text information of the second sample image recognized by the character recognition network model, and the character recognition network model is then trained according to the comparison result.
S402: Input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information.
The character recognition network model obtains the text information corresponding to the second sample image by performing feature extraction and feature recognition on the second sample image.
S403: Determine a second loss amount based on the recognition result and the annotation of the second sample image.
After the recognition result output by the character recognition network model is obtained, the second loss amount is determined based on the annotation of the second sample image and the recognition result, where the second loss amount represents the difference between the text information of the second sample image and the text information recognized by the character recognition network model.
S404: Adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than a second threshold.
When the second loss amount is large, it indicates that the text information in the second sample image recognized by the character recognition network model differs considerably from the annotated text information, so the parameters of the character recognition network model need to be adjusted and the model needs to be retrained; that is, the step of inputting the second sample image into the character recognition network model and the subsequent training process are re-executed until the second loss amount is less than the second threshold.
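As a hedged illustration of this supervised stage, assuming a model whose decoder outputs per-time-step character logits and using a CTC loss as one possible choice of second loss (the application does not name a specific loss, so this is only an example formulation):

```python
import torch
import torch.nn as nn

def finetune_step(model, images, targets, target_lengths, optimizer):
    """One supervised update with annotated second sample images.

    model(images) is assumed to return logits of shape (T, batch, num_classes);
    targets holds the annotated character indices, target_lengths their lengths.
    """
    ctc_loss = nn.CTCLoss(blank=0)
    logits = model(images)
    log_probs = logits.log_softmax(dim=2)
    input_lengths = torch.full((images.size(0),), logits.size(0), dtype=torch.long)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)   # second loss amount
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```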
By training the decoder in the character recognition network model, the accuracy with which the character recognition network model recognizes text information based on features can be improved.
When the trained character recognition network model meets the requirement, it can be used for character recognition: the text image to be processed is input into the character recognition network model to obtain an output result, and the output result includes the text information to be recognized.
In this embodiment, since the character recognition network model has already been preliminarily trained with the training method shown in Fig. 2 and can therefore perform feature extraction and basic recognition, the further training with the second sample image can be completed without acquiring a large number of annotated second sample images, which reduces the training cost and improves recognition accuracy.
Based on the above method embodiments, the embodiments of the present application provide a device and an electronic device for implementing the above method, which are described below with reference to the drawings.
Referring to Fig. 5, Fig. 5 is a structural diagram of a character recognition device provided by an embodiment of the present application. As shown in Fig. 5, the device 500 may include an acquisition unit 501 and a processing unit 502.
The acquisition unit 501 is configured to acquire a text image to be processed, where the text image to be processed includes text information to be recognized.
The processing unit 502 is configured to input the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized. The character recognition network model is trained with training samples; the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image; the height of each sub-sample image is the same as the height of the first sample image; the widths of the sub-sample images are the same, and the width of each sub-sample image is smaller than the width of the first sample image; and the first sample image includes text information.
In a specific implementation, the processing unit 502 is specifically configured to: input the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set, adjust the parameters of the initial network model based on the first loss amount, and re-execute the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
In a specific implementation, the processing unit 502 is specifically configured to determine a division parameter and divide the first sample image multiple times with the division parameter to obtain the plurality of sub-sample images, where the ratio of the width of each sub-sample image to the width of the sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
In a specific implementation, the processing unit 502 is specifically configured to, for each division operation, determine a starting position of the division in the first sample image, and divide the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
In a specific implementation, the value range of the division parameter is 75%-95%.
In a specific implementation, the acquisition unit 501 is further configured to acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information.
The processing unit 502 is further configured to: input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image, adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than a second threshold.
In a specific implementation, the processing unit 502 is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and input the processed first sample image into the initial network model to obtain the first feature vector set.
In a specific implementation, the processing unit 502 is specifically configured to calculate the first loss amount between the first feature vector set and the second feature vector set with a contrastive loss function.
It should be noted that, for the implementation of each unit in this embodiment, reference may be made to the relevant description in the above method embodiments, and details are not repeated here.
Referring to Fig. 6, Fig. 6 shows a schematic structural diagram of an electronic device 600 suitable for implementing the embodiments of the present application. The terminal device in the embodiments of the present application may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (portable android devices, i.e., tablet computers), PMPs (Portable Media Players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), as well as fixed terminals such as digital TVs (televisions) and desktop computers. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 6, the electronic device 600 may include a processing apparatus (such as a central processing unit or a graphics processing unit) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage apparatus 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the electronic device 600. The processing apparatus 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following apparatuses may be connected to the I/O interface 605: an input apparatus 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 608 including, for example, a magnetic tape and a hard disk; and a communication apparatus 609. The communication apparatus 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 6 shows the electronic device 600 with various apparatuses, it should be understood that it is not required to implement or have all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to the embodiments of the present application, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present application include a computer program product that includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 609, installed from the storage apparatus 608, or installed from the ROM 602. When the computer program is executed by the processing apparatus 601, the above functions defined in the methods of the embodiments of the present application are executed.
The electronic device provided by the embodiments of the present application and the character recognition method provided by the above embodiments belong to the same inventive concept. For technical details not described in detail in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as the above embodiments.
The embodiments of the present application provide a computer-readable medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any of the above embodiments.
It should be noted that the computer-readable medium mentioned above in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in combination with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and it may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained in the computer-readable medium may be transmitted by any appropriate medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (Hyper Text Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet), and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or it may exist alone without being assembled into the electronic device.
The above computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to execute the above method.
The computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof. The programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet with the help of an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of the blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The name of a unit/module does not constitute a limitation on the unit itself in some cases; for example, a voice data collection module may also be described as a "data collection module".
The functions described above herein may be executed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), and the like.
In the context of the present application, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present application, a character recognition method is provided, and the method may include:
acquiring a text image to be processed, where the text image to be processed includes text information to be recognized;
inputting the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized;
where the character recognition network model is trained with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
According to one or more embodiments of the present application, the training process of the character recognition network model includes:
inputting the first sample image and the plurality of sub-sample images into an initial network model respectively to obtain a first feature vector set and a second feature vector set, where the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information;
determining a first loss amount according to the first feature vector set and the second feature vector set, adjusting the parameters of the initial network model based on the first loss amount, and re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount is less than a first threshold, so as to obtain the character recognition network model.
According to one or more embodiments of the present application, the process of acquiring the plurality of sub-sample images includes:
determining a division parameter, and dividing the first sample image multiple times with the division parameter to obtain the plurality of sub-sample images, where the ratio of the width of each sub-sample image to the width of the sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
According to one or more embodiments of the present application, dividing the sample image multiple times with the division parameter to obtain the plurality of sub-sample images includes:
for each division operation, determining a starting position of the division in the first sample image;
dividing the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
According to one or more embodiments of the present application, the value range of the division parameter is 75%-95%.
According to one or more embodiments of the present application, the method further includes:
acquiring a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information;
inputting the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information;
determining a second loss amount based on the recognition result and the annotation of the second sample image, adjusting the parameters of the character recognition network model based on the second loss amount, and re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process until the second loss amount is less than a second threshold.
According to one or more embodiments of the present application, inputting the first sample image into the initial network model to obtain the first feature vector set includes:
performing data enhancement processing on the first sample image to obtain a processed first sample image;
inputting the processed first sample image into the initial network model to obtain the first feature vector set.
According to one or more embodiments of the present application, determining the first loss amount according to the first feature vector set and the second feature vector set includes:
calculating the first loss amount between the first feature vector set and the second feature vector set with a contrastive loss function.
According to one or more embodiments of the present application, a character recognition device is provided, and the device may include:
an acquisition unit, configured to acquire a text image to be processed, where the text image to be processed includes text information to be recognized;
a processing unit, configured to input the text image to be processed into a character recognition network model to obtain an output result, where the output result includes the text information to be recognized; where the character recognition network model is trained with training samples, the training samples include a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each sub-sample image is the same as the height of the first sample image, the widths of the sub-sample images are the same, the width of each sub-sample image is smaller than the width of the first sample image, and the first sample image includes text information.
在本申请的一个或多个实施例中,所述处理单元,具体用于将所述第一样本图像、所 述多个子样本图像分别输入初始网络模型,获取第一特征向量集、第二特征向量集,所述第一特征向量集的维数与所述第二特征向量集的维数相同,所述第一特征向量集中的特征向量和所述第二特征向量集中的特征向量为所述文本信息的特征向量;根据所述第一特征向量集和所述第二特征向量集确定第一损失量,并基于所述第一损失量对所述初始网络模型的参数进行调整,重新执行将所述第一样本图像、所述多个子样本图像分别输入初始网络模型以及后续训练过程,直至第一损失量小于第一阈值,获得所述字符识别网络模型。In one or more embodiments of the present application, the processing unit is specifically configured to convert the first sample image, the The plurality of sub-sample images are respectively input into the initial network model to obtain a first feature vector set and a second feature vector set, the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the second feature vector set The feature vectors in a feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information; determine a first loss amount according to the first feature vector set and the second feature vector set, and Adjust the parameters of the initial network model based on the first loss amount, and re-input the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process until the first loss amount If it is smaller than the first threshold, the character recognition network model is obtained.
在本申请的一个或多个实施例中,所述处理单元,具体用于确定划分参数,并利用所述划分参数对所述第一样本图像进行多次划分,获得所述多个子样本图像,所述多个子样本图像中每个子样本图像的宽度与所述样本图像的宽度的比值等于所述划分参数,所述划分参数大于0且小于1。In one or more embodiments of the present application, the processing unit is specifically configured to determine a division parameter, and use the division parameter to perform multiple divisions on the first sample image to obtain the plurality of sub-sample images , the ratio of the width of each sub-sample image in the plurality of sub-sample images to the width of the sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
在本申请的一个或多个实施例中,所述处理单元,具体用于针对每次划分操作,在所述第一样本图像中确定划分的起始位置;根据所述起始位置以及所述划分参数对所述第一样本图像进行划分,获得所述多个子样本图像。In one or more embodiments of the present application, the processing unit is specifically configured to determine a starting position of division in the first sample image for each division operation; according to the starting position and the The first sample image is divided according to the division parameter to obtain the plurality of sub-sample images.
In one or more embodiments of the present application, the value of the division parameter ranges from 75% to 95%.
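An illustrative sketch of this division operation follows; the number of crops and the use of a freshly drawn random starting position for every crop are assumptions, and only the same-height, reduced-width crops and the 75%-95% range come from the description above.

import random
import torch

def make_subsamples(first_sample, num_crops=4, division_param=None):
    # first_sample: a text-line image tensor of shape (C, H, W).
    # Each crop keeps the full height; its width equals division_param * W.
    if division_param is None:
        division_param = random.uniform(0.75, 0.95)    # preferred range from the description
    _, _, width = first_sample.shape
    crop_width = int(width * division_param)
    crops = []
    for _ in range(num_crops):
        # Starting position of the division, chosen anew for every division operation.
        start = random.randint(0, width - crop_width)
        crops.append(first_sample[:, :, start:start + crop_width])
    return crops                                        # all crops share the same height and width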
In one or more embodiments of the present application, the acquisition unit is further configured to acquire a second sample image and an annotation corresponding to the second sample image, where the second sample image includes text information and the annotation is used to reflect the text information;
the processing unit is further configured to input the second sample image into the character recognition network model to obtain a recognition result, where the recognition result includes the text information; determine a second loss amount based on the recognition result and the annotation of the second sample image, adjust the parameters of the character recognition network model based on the second loss amount, and re-execute the step of inputting the second sample image into the character recognition network model and the subsequent training process, until the second loss amount is less than a second threshold.
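A corresponding sketch of this supervised fine-tuning stage is given below; the choice of recognition loss (for example a CTC-style loss comparing the recognition result with the annotation), the optimizer and the threshold are assumptions, since the description above only fixes the loop structure.

import torch

def finetune(char_recognition_model, labeled_loader, recognition_loss,
             second_threshold=0.05, lr=1e-5, epochs=20):
    # Supervised stage on labeled (second sample image, annotation) pairs.
    optimizer = torch.optim.Adam(char_recognition_model.parameters(), lr=lr)
    loss = None
    for _ in range(epochs):
        for second_sample, annotation in labeled_loader:
            recognition_result = char_recognition_model(second_sample)
            loss = recognition_loss(recognition_result, annotation)   # "second loss amount"
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        if loss is not None and loss.item() < second_threshold:       # stop once below the second threshold
            break
    return char_recognition_model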
In one or more embodiments of the present application, the processing unit is specifically configured to perform data enhancement processing on the first sample image to obtain a processed first sample image, and to input the processed first sample image into the initial network model to obtain the first feature vector set.
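The description above does not name specific enhancement operations; the following pipeline (color jitter, blur and a slight rotation, built with torchvision) is therefore only one plausible example of such data enhancement, with parameters chosen for illustration.

from torchvision import transforms

# Illustrative data enhancement for the first sample image; the concrete
# operations and their parameters are assumptions, not taken from the description.
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.GaussianBlur(kernel_size=3, sigma=(0.1, 2.0)),
    transforms.RandomRotation(degrees=3),
])

# processed_first_sample = augment(first_sample)  # then fed to the initial network model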
In one or more embodiments of the present application, the processing unit is specifically configured to calculate the first loss amount between the first feature vector set and the second feature vector set by using a contrastive loss function.
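The description above does not fix a particular contrastive loss; an InfoNCE-style formulation is one common choice and is sketched below, where row i of the first feature vector set (full image i) and row i of the second feature vector set (its crops) form a positive pair and the remaining rows of the batch act as negatives; the symmetric form and the temperature value are assumptions.

import torch
import torch.nn.functional as F

def contrastive_loss(first_set, second_set, temperature=0.1):
    # first_set, second_set: (B, D) feature vector sets of equal dimension.
    first = F.normalize(first_set, dim=1)
    second = F.normalize(second_set, dim=1)
    logits = first @ second.t() / temperature             # (B, B) similarity matrix
    targets = torch.arange(first.size(0), device=first.device)
    # Symmetric cross-entropy: each view must identify its own counterpart.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2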
According to one or more embodiments of the present application, an electronic device is provided, the device including a processor and a memory;
the memory is configured to store instructions or a computer program;
the processor is configured to execute the instructions or the computer program in the memory, so that the electronic device performs the character recognition method described above.
According to one or more embodiments of the present application, a computer-readable storage medium is provided, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a device, the device is caused to perform the character recognition method described above.
It should be noted that the embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. As for the system or apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant parts may be found in the description of the method.
It should be understood that in this application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes the association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of a single item or a plurality of items. For example, at least one of a, b, or c may indicate: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may each be single or multiple.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device comprising that element.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

  1. A character recognition method, wherein the method comprises:
    acquiring a text image to be processed, the text image to be processed comprising text information to be recognized;
    inputting the text image to be processed into a character recognition network model to obtain an output result, the output result comprising the text information to be recognized;
    wherein the character recognition network model is generated by training with training samples, the training samples comprising a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each of the plurality of sub-sample images being the same as the height of the first sample image, the widths of the plurality of sub-sample images being the same, the width of each sub-sample image being smaller than the width of the first sample image, and the first sample image comprising text information.
  2. The method according to claim 1, wherein the training process of the character recognition network model comprises:
    inputting the first sample image and the plurality of sub-sample images into an initial network model respectively, to obtain a first feature vector set and a second feature vector set, wherein the dimension of the first feature vector set is the same as the dimension of the second feature vector set, and the feature vectors in the first feature vector set and the feature vectors in the second feature vector set are feature vectors of the text information;
    determining a first loss amount according to the first feature vector set and the second feature vector set, adjusting the parameters of the initial network model based on the first loss amount, and re-executing the step of inputting the first sample image and the plurality of sub-sample images into the initial network model and the subsequent training process, until the first loss amount is less than a first threshold, to obtain the character recognition network model.
  3. The method according to claim 2, wherein the process of obtaining the plurality of sub-sample images comprises:
    determining a division parameter, and dividing the first sample image multiple times by using the division parameter to obtain the plurality of sub-sample images, wherein the ratio of the width of each of the plurality of sub-sample images to the width of the first sample image is equal to the division parameter, and the division parameter is greater than 0 and less than 1.
  4. The method according to claim 3, wherein dividing the first sample image multiple times by using the division parameter to obtain the plurality of sub-sample images comprises:
    for each division operation, determining a starting position of the division in the first sample image;
    dividing the first sample image according to the starting position and the division parameter to obtain the plurality of sub-sample images.
  5. The method according to claim 3 or 4, wherein the value of the division parameter ranges from 75% to 95%.
  6. The method according to claim 2, wherein the method further comprises:
    acquiring a second sample image and an annotation corresponding to the second sample image, the second sample image comprising text information, and the annotation being used to reflect the text information;
    inputting the second sample image into the character recognition network model to obtain a recognition result, the recognition result comprising the text information;
    determining a second loss amount based on the recognition result and the annotation of the second sample image, adjusting the parameters of the character recognition network model based on the second loss amount, and re-executing the step of inputting the second sample image into the character recognition network model and the subsequent training process, until the second loss amount is less than a second threshold.
  7. The method according to claim 2, wherein inputting the first sample image into the initial network model to obtain the first feature vector set comprises:
    performing data enhancement processing on the first sample image to obtain a processed first sample image;
    inputting the processed first sample image into the initial network model to obtain the first feature vector set.
  8. The method according to claim 2, wherein determining the first loss amount according to the first feature vector set and the second feature vector set comprises:
    calculating the first loss amount between the first feature vector set and the second feature vector set by using a contrastive loss function.
  9. A character recognition apparatus, wherein the apparatus comprises:
    an acquisition unit, configured to acquire a text image to be processed, the text image to be processed comprising text information to be recognized;
    a processing unit, configured to input the text image to be processed into a character recognition network model to obtain an output result, the output result comprising the text information to be recognized; wherein the character recognition network model is generated by training with training samples, the training samples comprising a first sample image and a plurality of sub-sample images corresponding to the first sample image, the height of each of the plurality of sub-sample images being the same as the height of the first sample image, the widths of the plurality of sub-sample images being the same, the width of each sub-sample image being smaller than the width of the first sample image, and the first sample image comprising text information.
  10. An electronic device, wherein the device comprises a processor and a memory;
    the memory being configured to store instructions or a computer program;
    the processor being configured to execute the instructions or the computer program in the memory, so that the electronic device performs the character recognition method according to any one of claims 1-8.
  11. A computer-readable storage medium, wherein instructions are stored in the computer-readable storage medium, and when the instructions are run on a device, the device is caused to perform the character recognition method according to any one of claims 1-8.
  12. A computer program product, wherein, when the computer program product is run on a computer, the computer is caused to perform the character recognition method according to any one of claims 1-8.
PCT/CN2023/072001 2022-01-30 2023-01-13 Character recognition method and apparatus, device, and medium WO2023143107A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210114334.6 2022-01-30
CN202210114334.6A CN114445812A (en) 2022-01-30 2022-01-30 Character recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
WO2023143107A1 true WO2023143107A1 (en) 2023-08-03

Family

ID=81370879

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/072001 WO2023143107A1 (en) 2022-01-30 2023-01-13 Character recognition method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN114445812A (en)
WO (1) WO2023143107A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445812A (en) * 2022-01-30 2022-05-06 北京有竹居网络技术有限公司 Character recognition method, device, equipment and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106803090A (en) * 2016-12-05 2017-06-06 中国银联股份有限公司 A kind of image-recognizing method and device
CN108875722A (en) * 2017-12-27 2018-11-23 北京旷视科技有限公司 Character recognition and identification model training method, device and system and storage medium
CN111695385A (en) * 2019-03-15 2020-09-22 杭州海康威视数字技术股份有限公司 Text recognition method, device and equipment
WO2021081562A2 (en) * 2021-01-20 2021-04-29 Innopeak Technology, Inc. Multi-head text recognition model for multi-lingual optical character recognition
CN113111871A (en) * 2021-04-21 2021-07-13 北京金山数字娱乐科技有限公司 Training method and device of text recognition model and text recognition method and device
CN113887442A (en) * 2021-09-29 2022-01-04 招商银行股份有限公司 OCR training data generation method, device, equipment and medium
CN114445812A (en) * 2022-01-30 2022-05-06 北京有竹居网络技术有限公司 Character recognition method, device, equipment and medium

Also Published As

Publication number Publication date
CN114445812A (en) 2022-05-06

Similar Documents

Publication Publication Date Title
US20230394671A1 (en) Image segmentation method and apparatus, and device, and storage medium
WO2023083142A1 (en) Sentence segmentation method and apparatus, storage medium, and electronic device
CN110826567B (en) Optical character recognition method, device, equipment and storage medium
WO2022247562A1 (en) Multi-modal data retrieval method and apparatus, and medium and electronic device
WO2022037419A1 (en) Audio content recognition method and apparatus, and device and computer-readable medium
WO2023138314A1 (en) Object attribute recognition method and apparatus, readable storage medium, and electronic device
CN112883968B (en) Image character recognition method, device, medium and electronic equipment
WO2023093361A1 (en) Image character recognition model training method, and image character recognition method and apparatus
CN112364829B (en) Face recognition method, device, equipment and storage medium
CN113378586B (en) Speech translation method, translation model training method, device, medium, and apparatus
WO2022111347A1 (en) Information processing method and apparatus, electronic device, and storage medium
WO2023029904A1 (en) Text content matching method and apparatus, electronic device, and storage medium
WO2023142914A1 (en) Date recognition method and apparatus, readable medium and electronic device
WO2023143107A1 (en) Character recognition method and apparatus, device, and medium
WO2023142913A1 (en) Video processing method and apparatus, readable medium and electronic device
WO2023005729A1 (en) Speech information processing method and apparatus, and electronic device
CN114494709A (en) Feature extraction model generation method, image feature extraction method and device
CN111128131B (en) Voice recognition method and device, electronic equipment and computer readable storage medium
WO2023130925A1 (en) Font recognition method and apparatus, readable medium, and electronic device
CN111312224B (en) Training method and device of voice segmentation model and electronic equipment
WO2023134433A1 (en) Font generation method and apparatus, and device
WO2023000782A1 (en) Method and apparatus for acquiring video hotspot, readable medium, and electronic device
WO2023065895A1 (en) Text recognition method and apparatus, readable medium, and electronic device
CN112837672A (en) Method and device for determining conversation affiliation, electronic equipment and storage medium
US20240096347A1 (en) Method and apparatus for determining speech similarity, and program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23745990

Country of ref document: EP

Kind code of ref document: A1