CN118053169A - Bill input method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN118053169A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202410189805.9A
Other languages
Chinese (zh)
Inventor
刘泽华 (Liu Zehua)
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202410189805.9A
Publication of CN118053169A

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses a bill input method and device, a storage medium and an electronic device, relating to artificial intelligence, financial technology and other related technical fields. The method comprises the following steps: acquiring a target image corresponding to a target bill to be entered; performing field position recognition on the target image through a target detection model to obtain a target field and position information corresponding to the target field; performing text recognition through a target recognition model based on the position information and the target image to obtain text information corresponding to the target field; and entering the text information corresponding to the target field into a target bill system. The application solves the problem in the related art of low bill-input accuracy caused by performing text recognition directly on the bill image with a deep learning model.

Description

Bill input method and device, storage medium and electronic equipment
Technical Field
The application relates to artificial intelligence, financial technology and other related technical fields, and in particular to a bill input method and device, a storage medium and an electronic device.
Background
Financial institution bills are instruments issued by a financial institution, or handed over to a financial institution, which undertakes the payment obligation. Because financial institutions handle a wide variety of business, a large number of financial institution bills need to be processed.
In the prior art, during business handling a financial institution generally enters client bill information, such as checks, promissory notes and drafts, into its system manually, or performs text recognition directly on the bill image with a deep learning model. Manual entry consumes a great deal of cost, and performing text recognition on the whole bill image with a deep learning model yields low entry accuracy.
For the problem in the related art that performing text recognition directly on bill images through a deep learning model leads to low bill-input accuracy, no effective solution has yet been proposed.
Disclosure of Invention
The main purpose of the application is to provide a bill input method and device, a storage medium and an electronic device, so as to solve the problem in the related art that performing text recognition directly on bill images through a deep learning model leads to low bill-input accuracy.
To achieve the above object, according to one aspect of the present application, a bill input method is provided. The method comprises the following steps: acquiring a target image corresponding to a target bill to be entered; performing field position recognition on the target image through a target detection model to obtain a target field and position information corresponding to the target field; performing text recognition through a target recognition model based on the position information and the target image to obtain text information corresponding to the target field; and entering the text information corresponding to the target field into a target bill system.
Further, performing field position recognition on the target image through the target detection model to obtain the target field and the position information corresponding to the target field includes: up-sampling the target image through the target detection model to obtain a feature map; performing a convolution on the feature map through the target detection model to obtain a heat map; and performing a convolution on the heat map through the target detection model to obtain the target field and the position information corresponding to the target field.
Further, performing text recognition through the target recognition model based on the position information and the target image to obtain the text information corresponding to the target field includes: cropping the target image according to the target field and the position information corresponding to the target field to obtain an image segment corresponding to the target field; and performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field.
Further, performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field includes: performing a convolution on the image segment corresponding to the target field through a convolutional layer in the target recognition model to obtain a feature sequence; processing the feature sequence through a recurrent layer in the target recognition model to obtain a label distribution corresponding to each feature component in the feature sequence; and determining the text information corresponding to the target field through a transcription layer in the target recognition model, based on the label distribution corresponding to each feature component in the feature sequence.
Further, before performing field position recognition on the target image through the target detection model to obtain the target field and the position information corresponding to the target field, the method further includes: acquiring an image corresponding to a historical bill and real label information in the image corresponding to the historical bill, wherein the real label information is the target field in the image corresponding to the historical bill and the position information corresponding to the target field; constructing a target training set from the image corresponding to the historical bill and the real label information; and training an initial detection model on the target training set to obtain the target detection model.
Further, training the initial detection model on the target training set to obtain the target detection model includes: processing the target training set through the initial detection model to obtain predicted label information; deriving a target loss function from the predicted label information and the real label information, wherein the target loss function comprises at least a root mean square error term and a mean absolute error term; and training the initial detection model with the target loss function to obtain the target detection model.
Further, after entering the text information corresponding to the target field into the target bill system, the method further includes: if a correction instruction for the text information corresponding to the target field is detected, correcting the text information according to the correction instruction to obtain corrected text information; determining a model to be iterated according to the corrected text information, wherein the model to be iterated is the target detection model or the target recognition model; and iteratively updating the model to be iterated according to the corrected text information.
To achieve the above object, according to another aspect of the present application, a bill input device is provided. The device comprises: a first acquisition unit for acquiring a target image corresponding to a target bill to be entered; a first recognition unit for performing field position recognition on the target image through a target detection model to obtain a target field and position information corresponding to the target field; a second recognition unit for performing text recognition through a target recognition model based on the position information and the target image to obtain text information corresponding to the target field; and an input unit for entering the text information corresponding to the target field into a target bill system.
Further, the first recognition unit includes: a sampling module for up-sampling the target image through the target detection model to obtain a feature map; a first calculation module for performing a convolution on the feature map through the target detection model to obtain a heat map; and a second calculation module for performing a convolution on the heat map through the target detection model to obtain the target field and the position information corresponding to the target field.
Further, the second recognition unit includes: a cropping module for cropping the target image according to the target field and the position information corresponding to the target field to obtain an image segment corresponding to the target field; and a recognition module for performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field.
Further, the recognition module includes: a calculation sub-module for performing a convolution on the image segment corresponding to the target field through the convolutional layer in the target recognition model to obtain a feature sequence; a processing sub-module for processing the feature sequence through a recurrent layer in the target recognition model to obtain a label distribution corresponding to each feature component in the feature sequence; and a determining sub-module for determining the text information corresponding to the target field through a transcription layer in the target recognition model, based on the label distribution corresponding to each feature component in the feature sequence.
Further, the device further comprises: a second acquisition unit for acquiring, before field position recognition is performed on the target image through the target detection model to obtain the target field and the position information corresponding to the target field, an image corresponding to a historical bill and real label information in the image corresponding to the historical bill, wherein the real label information is the target field in the image corresponding to the historical bill and the position information corresponding to the target field; a construction unit for constructing a target training set from the image corresponding to the historical bill and the real label information; and a training unit for training an initial detection model on the target training set to obtain the target detection model.
Further, the training unit includes: a processing module for processing the target training set through the initial detection model to obtain predicted label information; a determining module for deriving a target loss function from the predicted label information and the real label information, wherein the target loss function comprises at least a root mean square error term and a mean absolute error term; and a training module for training the initial detection model with the target loss function to obtain the target detection model.
Further, the device further comprises: a correction unit for correcting the text information according to a correction instruction if, after the text information corresponding to the target field is entered into the target bill system, a correction instruction for that text information is detected, so as to obtain corrected text information; a determining unit for determining a model to be iterated according to the corrected text information, wherein the model to be iterated is the target detection model or the target recognition model; and an updating unit for iteratively updating the model to be iterated according to the corrected text information.
To achieve the above object, according to an aspect of the present application, a computer-readable storage medium is provided. The storage medium stores a program, wherein when the program runs, it controls a device on which the storage medium is located to execute the bill input method of any one of the above.
To achieve the above object, according to another aspect of the present application, an electronic device is also provided, comprising one or more processors and a memory, wherein the memory is configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the bill input method of any one of the above.
The application adopts the following steps: acquiring a target image corresponding to a target bill to be entered; performing field position recognition on the target image through the target detection model to obtain a target field and position information corresponding to the target field; performing text recognition through the target recognition model based on the position information and the target image to obtain text information corresponding to the target field; and entering the text information corresponding to the target field into the target bill system. This solves the problem in the related art of low bill-input accuracy caused by performing text recognition directly on the bill image with a deep learning model. In this scheme, the target image whose bill information needs to be entered is first acquired, field position recognition is performed on the target image with the target detection model to obtain the position information corresponding to the target field, and text recognition is then performed using the position information of the target field. This effectively improves the accuracy of the text recognition and avoids having the target recognition model process the whole target image, thereby also improving bill-entry efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flowchart of a bill input method provided according to an embodiment of the present application;
FIG. 2 is a flowchart of an alternative bill input method provided according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a bill input device provided according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
In order that those skilled in the art may better understand the present application, the technical solution in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
The application will be described with reference to preferred implementation steps. FIG. 1 is a flowchart of a bill input method provided according to an embodiment of the application; as shown in FIG. 1, the method includes the following steps:
step S101, obtaining a target image corresponding to a target bill to be recorded.
Optionally, a target bill whose bill information needs to be entered is determined, and a target image corresponding to the target bill is then acquired. In an alternative embodiment, the image information of the target bill may be captured by an image acquisition device, and the captured image information is entered into the corresponding bill system. The target bill may be a ticket, a check, an application form, or the like.
Step S102, field position identification is carried out on the target image through the target detection model, and the target field and the position information corresponding to the target field are obtained.
Optionally, after the target image is obtained, it is input into the target detection model, which performs field position recognition on the target image, that is, identifies the fields that need to be entered into the system, such as the check account number, currency, amount, voucher number, abstract and service number, and records the target fields and the position information corresponding to them.
Step S103, performing text recognition through the target recognition model based on the position information and the target image to obtain text information corresponding to the target field.
Optionally, after the position information of each field is obtained, text recognition is performed on the target image based on that position information through the target recognition model, yielding the text information corresponding to the target field.
Step S104, inputting the text information corresponding to the target field into the target bill system.
Optionally, the field content recognized by the target recognition model is entered into the target bill system. It should be noted that, to further improve the accuracy of the bill information, the text information output by the target recognition model may be checked and modified after it is obtained.
In summary, the target image whose bill information needs to be entered is first acquired, field position recognition is performed on the target image with the target detection model to obtain the position information corresponding to the target field, and text recognition is then performed using the position information of the target field. This effectively improves the accuracy of the text recognition and avoids having the target recognition model process the whole target image, thereby also improving bill-entry efficiency.
Optionally, in the bill input method provided by the embodiment of the present application, performing field position recognition on the target image through the target detection model to obtain the target field and the position information corresponding to the target field includes: up-sampling the target image through the target detection model to obtain a feature map; performing a convolution on the feature map through the target detection model to obtain a heat map; and performing a convolution on the heat map through the target detection model to obtain the target field and the position information corresponding to the target field.
In an alternative embodiment, field position recognition of the target image by the target detection model includes the following steps. The target detection model up-samples the input target image through three deconvolution layers to obtain a high-resolution feature map. A convolution is then applied to the feature map to obtain a heat map, which indicates, for each point, whether a target field is present there and which type of field it is.
After the heat map is obtained, a further convolution is applied to it by the target detection model, yielding the target field and the position information corresponding to the target field.
By convolving the target image in this way, the target detection model can accurately identify the position information of the target field.
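The patent gives no implementation for this step. As a minimal illustrative sketch of how such a heat map might be decoded into field detections, each cell that exceeds a confidence threshold and is a local maximum can be kept as one detected field; the function name, the 3x3 neighbourhood test and the threshold value are assumptions, not part of the patent.

```python
import numpy as np

def decode_heatmap(heatmap, threshold=0.5):
    """Decode a class-wise heat map into (field_class, y, x, score) hits.

    heatmap: array of shape (num_field_classes, H, W), scores in [0, 1].
    A cell counts as a detection if its score exceeds `threshold` and it
    is a maximum within its 3x3 neighbourhood (clipped at the borders).
    """
    num_classes, height, width = heatmap.shape
    detections = []
    for c in range(num_classes):
        for y in range(height):
            for x in range(width):
                score = heatmap[c, y, x]
                if score < threshold:
                    continue
                # 3x3 neighbourhood around (y, x), clipped at the image edge
                patch = heatmap[c, max(0, y - 1):y + 2, max(0, x - 1):x + 2]
                if score >= patch.max():
                    detections.append((c, y, x, float(score)))
    return detections
```

In detectors of this style, the decoded peak gives a field's centre point; additional convolutional heads (not shown) would regress the box size needed for the position information.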
Optionally, in the bill input method provided by the embodiment of the present application, performing text recognition through the target recognition model based on the position information and the target image to obtain the text information corresponding to the target field includes: cropping the target image according to the target field and the position information corresponding to the target field to obtain an image segment corresponding to the target field; and performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field.
In an alternative embodiment, after the target field and its corresponding position information are obtained, the target image is cut according to them into a plurality of image segments, each of which corresponds to one target field to be entered. The image segments are input into the target recognition model, which performs text recognition on each segment to obtain the text information corresponding to its target field.
By segmenting the image using the position information and then recognizing each image segment in a targeted way with the target recognition model, the accuracy of bill recognition can be effectively improved.
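The cutting step above can be sketched directly, under the assumption that the detection model outputs axis-aligned (x1, y1, x2, y2) boxes keyed by field name (the patent does not fix a box format):

```python
import numpy as np

def crop_fields(image, detections):
    """Cut the bill image into one segment per detected field.

    image: numpy array of shape (H, W) or (H, W, C);
    detections: iterable of (field_name, x1, y1, x2, y2) boxes.
    Returns {field_name: image_segment} ready for the recognition model.
    """
    segments = {}
    for name, x1, y1, x2, y2 in detections:
        segments[name] = image[y1:y2, x1:x2]
    return segments
```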
Optionally, in the bill input method provided by the embodiment of the present application, performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field includes: performing a convolution on the image segment corresponding to the target field through a convolutional layer in the target recognition model to obtain a feature sequence; processing the feature sequence through a recurrent layer in the target recognition model to obtain a label distribution corresponding to each feature component in the feature sequence; and determining the text information corresponding to the target field through a transcription layer in the target recognition model, based on the label distribution corresponding to each feature component.
In an alternative example, the target recognition model performs text recognition on an image segment as follows. A convolutional layer in the target recognition model convolves the image segment corresponding to the target field, converting it into a feature sequence carrying the feature information. A recurrent layer in the target recognition model, which may consist of a bidirectional recurrent neural network, then processes the feature sequence to obtain a label distribution corresponding to each feature component in the sequence. Finally, the transcription layer determines the text information corresponding to the target field from these label distributions; in short, for each feature component the transcription layer selects the label with the highest probability as its final label, and determines the symbol information, i.e. the text information, from the final labels.
Through the convolutional layer, recurrent layer and transcription layer in the target recognition model, the context information can be grasped accurately, so the text information of the image segment can be recognized accurately.
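The transcription step described above, picking the most probable label for each feature component and mapping the result to text, matches greedy CTC-style decoding as used in CRNN-type recognizers. A sketch, where the blank index, the charset mapping and the repeat-collapsing rule are assumptions drawn from that family of models rather than from the patent text:

```python
import numpy as np

def ctc_greedy_decode(label_dist, charset, blank=0):
    """Greedy transcription: take the most probable label per feature
    component, collapse consecutive repeats, and drop blank labels.

    label_dist: (T, num_labels) label distribution from the recurrent
    layer; charset maps label index -> character (index `blank` is
    the CTC blank symbol).
    """
    best = label_dist.argmax(axis=1)  # highest-probability label per step
    chars = []
    prev = blank
    for idx in best:
        if idx != blank and idx != prev:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)
```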
Optionally, in the bill input method provided by the embodiment of the present application, before field position recognition is performed on the target image through the target detection model to obtain the target field and the position information corresponding to the target field, the method further includes: acquiring an image corresponding to a historical bill and real label information in the image corresponding to the historical bill, wherein the real label information is the target field in the image corresponding to the historical bill and the position information corresponding to the target field; constructing a target training set from the image corresponding to the historical bill and the real label information; and training an initial detection model on the target training set to obtain the target detection model.
In an alternative embodiment, training the initial detection model on the target training set to obtain the target detection model includes: processing the target training set through the initial detection model to obtain predicted label information; deriving a target loss function from the predicted label information and the real label information, wherein the target loss function comprises at least a root mean square error term and a mean absolute error term; and training the initial detection model with the target loss function to obtain the target detection model.
In an alternative embodiment, the target detection model is trained as follows. Images corresponding to historical bills and the real label information in those images are acquired, where the real label information is the target field in each image and the position information corresponding to that field. A target training set is then constructed from the historical bill images and the real label information, and the initial detection model is trained on it to obtain the target detection model.
In an alternative embodiment, training the initial detection model with the target training set includes: inputting the target training set into the initial detection model and processing it to obtain predicted label information; computing the root mean square error and the mean absolute error between the predicted label information and the real label information to obtain the target loss function; and iteratively training the initial detection model with the target loss function to obtain the target detection model.
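The patent states only that the target loss function contains at least a root mean square error term and a mean absolute error term; a minimal sketch with an assumed equal weighting of the two terms:

```python
import numpy as np

def target_loss(pred, true):
    """Combined detection loss: root mean square error plus mean
    absolute error between predicted and real label information.
    The 1:1 weighting of the two terms is an assumption; the patent
    only requires that both components are present.
    """
    diff = pred - true
    rmse = np.sqrt(np.mean(diff ** 2))
    mae = np.mean(np.abs(diff))
    return rmse + mae
```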
In an alternative embodiment, the target detection model and the target recognition model may also be trained jointly: after the initial detection model outputs predicted label information, the images in the target training set are cut according to the predicted label information to obtain cut images, the cut images and the corresponding real text information are input into the initial recognition model to obtain predicted text information, and the initial recognition model is finally trained on the predicted and real text information to obtain the target recognition model.
Through this training process, the detection performance of the detection model can be effectively improved, which in turn improves the accuracy of the subsequent text recognition.
Optionally, in the method for recording notes provided by the embodiment of the present application, after recording text information corresponding to a target field into a target note system, the method further includes: if a correction instruction of the text information corresponding to the target field is detected, correcting the text information according to the correction instruction to obtain corrected text information; determining a model to be iterated according to the corrected text information, wherein the model to be iterated is a target detection model or a target identification model; and carrying out iterative updating on the model to be iterated according to the corrected text information.
In an alternative embodiment, after the text information output by the target recognition model is obtained, it is checked and verified; if an error exists, the text information corresponding to the target field is corrected. That is, if a correction instruction for the text information corresponding to the target field is detected, the text information is corrected, and the corrected text information is entered into the bill system.
To further improve the accuracy of the bill information, the model to be iterated can be determined from the corrected text information: it is first determined whether the correction was required because the position detection was inaccurate or because the text recognition was inaccurate, which in turn determines which model needs to be iteratively updated. Once that model is determined, it is iteratively updated with the corrected text information, so that the accuracy of the bill information is subsequently improved.
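The routing decision can be illustrated as below. The rule that a changed position implicates the detection model while a changed text implicates the recognition model follows the paragraph above; the function name and box representation are assumptions for illustration.

```python
def select_model_to_iterate(original_box, corrected_box, original_text, corrected_text):
    """Decide which model a manual correction implicates: a changed
    position points at the detection model, a changed text at the
    recognition model; None means nothing was corrected."""
    if tuple(original_box) != tuple(corrected_box):
        return "target detection model"
    if original_text != corrected_text:
        return "target recognition model"
    return None
```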
In an alternative embodiment, the entry of bill information may be implemented using the flow shown in fig. 2: (1) the teller captures the bill image, which may contain content such as a bill, a check, or an application form; (2) the image is recognized and located by a positioning model, which identifies the fields to be entered into the system, such as the check account number, currency, amount, voucher number, abstract, and service number, and records each field's type and position coordinates; (3) the image is cut into segments according to the field types and position coordinates recorded by the positioning model, each segment being one field to be entered; (4) the segments are input into a text recognition model, which recognizes the specific text content of each segment field, such as the check account number, currency, amount, voucher number, abstract, and service number; (5) the field content recognized by the text recognition model is entered into the system; (6) the entry result is checked and, if necessary, modified.
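Steps (2)-(5) of the flow above can be sketched as one loop. The `locate` and `recognize` callables stand in for the positioning model and the text recognition model, the box format `(x1, y1, x2, y2)` and the dict-based `system` are assumptions; this is an illustrative sketch, not the embodiment's implementation.

```python
import numpy as np

def enter_ticket(image, locate, recognize, system):
    """Steps (2)-(5): locate fields, cut segments, recognize text,
    and enter the results into the system (a dict here)."""
    entries = {}
    for field_type, (x1, y1, x2, y2) in locate(image):  # step (2)
        segment = image[y1:y2, x1:x2]                   # step (3): cut
        entries[field_type] = recognize(segment)        # step (4)
    system.update(entries)                              # step (5)
    return entries
```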
According to the bill input method provided by the embodiment of the application, the target image corresponding to the target bill to be entered is acquired; field position recognition is performed on the target image through the target detection model to obtain the target field and the position information corresponding to the target field; text recognition is performed through the target recognition model based on the position information and the target image to obtain the text information corresponding to the target field; and the text information corresponding to the target field is entered into the target bill system. This solves the problem in the related art that bill entry accuracy is low when text recognition is performed directly on the bill image by a deep learning model. In this scheme, the target image whose bill information needs to be entered is first acquired, field position recognition is performed on it by the target detection model to obtain the position information corresponding to the target field, and text recognition is then performed using that position information. This effectively improves the accuracy of the text recognition, avoids having the target recognition model process the entire target image, and thereby improves the efficiency of bill entry.
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from that shown or described herein.
The embodiment of the application also provides a bill input device, which can be used to execute the bill input method described above. The bill input device provided by the embodiment of the application is described below.
Fig. 3 is a schematic diagram of a bill input device according to an embodiment of the application. As shown in fig. 3, the device includes: a first acquisition unit 301, a first recognition unit 302, a second recognition unit 303, and an entry unit 304.
a first acquisition unit 301, configured to acquire a target image corresponding to a target bill to be entered;
a first recognition unit 302, configured to perform field position recognition on the target image through a target detection model to obtain a target field and position information corresponding to the target field;
a second recognition unit 303, configured to perform text recognition based on the position information and the target image through a target recognition model to obtain text information corresponding to the target field;
and an entry unit 304, configured to enter the text information corresponding to the target field into a target bill system.
According to the bill input device provided by the embodiment of the application, the first acquisition unit 301 acquires the target image corresponding to the target bill to be entered; the first recognition unit 302 performs field position recognition on the target image through the target detection model to obtain the target field and the position information corresponding to the target field; the second recognition unit 303 performs text recognition based on the position information and the target image through the target recognition model to obtain the text information corresponding to the target field; and the entry unit 304 enters the text information corresponding to the target field into the target bill system. This solves the problem in the related art that bill entry accuracy is low when text recognition is performed directly on the bill image by a deep learning model. In this scheme, the target image whose bill information needs to be entered is first acquired, field position recognition is performed on it by the target detection model to obtain the position information corresponding to the target field, and text recognition is then performed using that position information. This effectively improves the accuracy of the text recognition, avoids having the target recognition model process the entire target image, and thereby improves the efficiency of bill entry.
Optionally, in the bill input device provided by the embodiment of the present application, the first recognition unit includes: a sampling module, configured to perform up-sampling processing on the target image through the target detection model to obtain a feature map; a first calculation module, configured to perform convolution calculation on the feature map through the target detection model to obtain a heat map; and a second calculation module, configured to perform convolution calculation on the heat map through the target detection model to obtain the target field and the position information corresponding to the target field.
Optionally, in the bill input device provided by the embodiment of the present application, the second recognition unit includes: a cutting module, configured to cut the target image according to the target field and the position information corresponding to the target field to obtain an image segment corresponding to the target field; and a recognition module, configured to perform text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field.
Optionally, in the bill input device provided by the embodiment of the present application, the recognition module includes: a calculation sub-module, configured to perform convolution calculation on the image segment corresponding to the target field through a convolution layer in the target recognition model to obtain a feature sequence; a processing sub-module, configured to process the feature sequence through a recurrent layer in the target recognition model to obtain a label distribution corresponding to each feature component in the feature sequence; and a determining sub-module, configured to determine, through a transcription layer in the target recognition model, the text information corresponding to the target field based on the label distribution corresponding to each feature component in the feature sequence.
Optionally, in the bill input device provided by the embodiment of the present application, the device further includes: a second acquisition unit, configured to acquire, before the field position recognition is performed on the target image through the target detection model to obtain the target field and the position information corresponding to the target field, an image corresponding to a historical bill and real tag information in the image corresponding to the historical bill, wherein the real tag information is the target field in the image corresponding to the historical bill and the position information corresponding to the target field; a construction unit, configured to construct a target training set according to the image corresponding to the historical bill and the real tag information; and a training unit, configured to train an initial detection model according to the target training set to obtain the target detection model.
Optionally, in the bill input device provided by the embodiment of the present application, the training unit includes: a processing module, configured to process the target training set through the initial detection model to obtain predicted tag information; a determining module, configured to obtain a target loss function according to the predicted tag information and the real tag information, wherein the target loss function at least comprises a root mean square error function and a mean absolute error function; and a training module, configured to train the initial detection model according to the target loss function to obtain the target detection model.
Optionally, in the bill input device provided by the embodiment of the present application, the device further includes: a correction unit, configured to, after the text information corresponding to the target field is entered into the target bill system, correct the text information according to a correction instruction if the correction instruction for the text information corresponding to the target field is detected, so as to obtain corrected text information; a determining unit, configured to determine a model to be iterated according to the corrected text information, wherein the model to be iterated is the target detection model or the target recognition model; and an updating unit, configured to iteratively update the model to be iterated according to the corrected text information.
The bill input device includes a processor and a memory, in which the first acquisition unit 301, the first recognition unit 302, the second recognition unit 303, the entry unit 304, and the like are stored as program units; the processor executes the program units stored in the memory to implement the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and accurate entry of bill information is achieved by adjusting kernel parameters.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
The embodiment of the invention provides a computer-readable storage medium on which a program is stored, and the program, when executed by a processor, implements the bill input method.
The embodiment of the invention provides a processor configured to run a program, wherein the program, when run, executes the bill input method.
As shown in fig. 4, an embodiment of the present invention provides an electronic device, where the device includes a processor, a memory, and a program stored in the memory and executable on the processor, and when the processor executes the program, the following steps are implemented: acquiring a target image corresponding to a target bill to be input; performing field position identification on the target image through the target detection model to obtain a target field and position information corresponding to the target field; performing character recognition based on the position information and the target image through the target recognition model to obtain character information corresponding to the target field; and inputting the text information corresponding to the target field into a target bill system.
Optionally, performing field position recognition on the target image through the target detection model to obtain the target field and the position information corresponding to the target field includes: performing up-sampling processing on the target image through the target detection model to obtain a feature map; performing convolution calculation on the feature map through the target detection model to obtain a heat map; and performing convolution calculation on the heat map through the target detection model to obtain the target field and the position information corresponding to the target field.
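One plausible way a heat map yields position information is peak extraction, sketched below under the assumption that each heat-map channel scores one field type and every cell above a threshold marks a candidate field centre. The threshold value and the function name are illustrative; the patent does not specify the decoding step.

```python
import numpy as np

def decode_heatmap(heatmap, threshold=0.5):
    """Extract candidate field centres from a single-channel heat map.

    Every cell whose score reaches the threshold is reported as
    (row, col, score); the coordinates stand in for the position
    information of one target field."""
    peaks = []
    rows, cols = np.where(heatmap >= threshold)
    for r, c in zip(rows, cols):
        peaks.append((int(r), int(c), float(heatmap[r, c])))
    return peaks
```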
Optionally, performing text recognition based on the position information and the target image through the target recognition model to obtain the text information corresponding to the target field includes: cutting the target image according to the target field and the position information corresponding to the target field to obtain an image segment corresponding to the target field; and performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field.
Optionally, performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field includes: performing convolution calculation on the image segment corresponding to the target field through a convolution layer in the target recognition model to obtain a feature sequence; processing the feature sequence through a recurrent layer in the target recognition model to obtain a label distribution corresponding to each feature component in the feature sequence; and determining, through a transcription layer in the target recognition model, the text information corresponding to the target field based on the label distribution corresponding to each feature component in the feature sequence.
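The transcription layer's mapping from per-component label distributions to text is commonly implemented with CTC-style greedy decoding: take the most likely label at each feature component, collapse consecutive repeats, and drop blanks. The sketch below assumes that convention (index 0 as the blank label), which the source does not specify.

```python
import numpy as np

BLANK = 0  # assumed index of the blank label

def transcribe(label_dist, charset):
    """Greedy CTC-style transcription of a (timesteps x labels)
    distribution: argmax per step, collapse repeats, drop blanks."""
    best = np.argmax(label_dist, axis=1)
    chars, prev = [], BLANK
    for idx in best:
        if idx != prev and idx != BLANK:
            chars.append(charset[idx - 1])  # charset[0] maps to label 1
        prev = idx
    return "".join(chars)
```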
Optionally, before field position recognition is performed on the target image through the target detection model to obtain the target field and the position information corresponding to the target field, the method further includes: acquiring an image corresponding to a historical bill and real tag information in the image corresponding to the historical bill, wherein the real tag information is the target field in the image corresponding to the historical bill and the position information corresponding to the target field; constructing a target training set according to the image corresponding to the historical bill and the real tag information; and training an initial detection model according to the target training set to obtain the target detection model.
Optionally, training the initial detection model according to the target training set to obtain the target detection model includes: processing the target training set through the initial detection model to obtain predicted tag information; obtaining a target loss function according to the predicted tag information and the real tag information, wherein the target loss function at least comprises a root mean square error function and a mean absolute error function; and training the initial detection model according to the target loss function to obtain the target detection model.
Optionally, after entering the text information corresponding to the target field into the target bill system, the method further comprises: if a correction instruction of the text information corresponding to the target field is detected, correcting the text information according to the correction instruction to obtain corrected text information; determining a model to be iterated according to the corrected text information, wherein the model to be iterated is a target detection model or a target identification model; and carrying out iterative updating on the model to be iterated according to the corrected text information.
The device herein may be a server, a PC, a tablet computer (PAD), a mobile phone, or the like.
The application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program initialized with the following method steps: acquiring a target image corresponding to a target bill to be entered; performing field position recognition on the target image through the target detection model to obtain a target field and position information corresponding to the target field; performing text recognition based on the position information and the target image through the target recognition model to obtain text information corresponding to the target field; and entering the text information corresponding to the target field into a target bill system.
Optionally, performing field position recognition on the target image through the target detection model to obtain the target field and the position information corresponding to the target field includes: performing up-sampling processing on the target image through the target detection model to obtain a feature map; performing convolution calculation on the feature map through the target detection model to obtain a heat map; and performing convolution calculation on the heat map through the target detection model to obtain the target field and the position information corresponding to the target field.
Optionally, performing text recognition based on the position information and the target image through the target recognition model to obtain the text information corresponding to the target field includes: cutting the target image according to the target field and the position information corresponding to the target field to obtain an image segment corresponding to the target field; and performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field.
Optionally, performing text recognition on the image segment corresponding to the target field through the target recognition model to obtain the text information corresponding to the target field includes: performing convolution calculation on the image segment corresponding to the target field through a convolution layer in the target recognition model to obtain a feature sequence; processing the feature sequence through a recurrent layer in the target recognition model to obtain a label distribution corresponding to each feature component in the feature sequence; and determining, through a transcription layer in the target recognition model, the text information corresponding to the target field based on the label distribution corresponding to each feature component in the feature sequence.
Optionally, before field position recognition is performed on the target image through the target detection model to obtain the target field and the position information corresponding to the target field, the method further includes: acquiring an image corresponding to a historical bill and real tag information in the image corresponding to the historical bill, wherein the real tag information is the target field in the image corresponding to the historical bill and the position information corresponding to the target field; constructing a target training set according to the image corresponding to the historical bill and the real tag information; and training an initial detection model according to the target training set to obtain the target detection model.
Optionally, training the initial detection model according to the target training set to obtain the target detection model includes: processing the target training set through the initial detection model to obtain predicted tag information; obtaining a target loss function according to the predicted tag information and the real tag information, wherein the target loss function at least comprises a root mean square error function and a mean absolute error function; and training the initial detection model according to the target loss function to obtain the target detection model.
Optionally, after entering the text information corresponding to the target field into the target bill system, the method further comprises: if a correction instruction of the text information corresponding to the target field is detected, correcting the text information according to the correction instruction to obtain corrected text information; determining a model to be iterated according to the corrected text information, wherein the model to be iterated is a target detection model or a target identification model; and carrying out iterative updating on the model to be iterated according to the corrected text information.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A bill input method, comprising:
acquiring a target image corresponding to a target bill to be input;
Performing field position identification on the target image through a target detection model to obtain a target field and position information corresponding to the target field;
Performing character recognition based on the position information and the target image through a target recognition model to obtain character information corresponding to the target field;
And inputting the text information corresponding to the target field into a target bill system.
2. The method of claim 1, wherein performing field location recognition on the target image by a target detection model to obtain a target field and location information corresponding to the target field comprises:
performing up-sampling processing on the target image through the target detection model to obtain a feature map;
performing convolution calculation on the feature map through the target detection model to obtain a heat map;
and performing convolution calculation on the heat map through the target detection model to obtain the target field and the position information corresponding to the target field.
3. The method of claim 1, wherein performing text recognition based on the location information and the target image by a target recognition model to obtain text information corresponding to the target field comprises:
cutting the target image according to the target field and the position information corresponding to the target field to obtain an image fragment corresponding to the target field;
and carrying out character recognition on the image fragments corresponding to the target fields through the target recognition model to obtain character information corresponding to the target fields.
4. The method of claim 3, wherein performing text recognition on the image segment corresponding to the target field by the target recognition model to obtain text information corresponding to the target field comprises:
Performing convolution calculation on the image segments corresponding to the target fields through a convolution layer in the target recognition model to obtain a feature sequence;
processing the feature sequence through a recurrent layer in the target recognition model to obtain a label distribution corresponding to each feature component in the feature sequence;
And determining the text information corresponding to the target field based on the label distribution corresponding to each characteristic component in the characteristic sequence through a transcription layer in the target recognition model.
5. The method according to claim 1, wherein before performing field location recognition on the target image by using a target detection model to obtain a target field and location information corresponding to the target field, the method further comprises:
acquiring an image corresponding to a historical bill and real tag information in the image corresponding to the historical bill, wherein the real tag information is a target field in the image corresponding to the historical bill and position information corresponding to the target field;
constructing a target training set according to the image corresponding to the historical bill and the real tag information;
training the initial detection model according to the target training set to obtain the target detection model.
6. The method of claim 5, wherein training an initial detection model in accordance with the target training set to obtain the target detection model comprises:
processing the target training set through the initial detection model to obtain predictive label information;
obtaining a target loss function according to the predicted tag information and the real tag information, wherein the target loss function at least comprises a root mean square error function and a mean absolute error function;
And training the initial detection model according to the target loss function to obtain the target detection model.
7. The method according to claim 1, wherein after inputting the text information corresponding to the target field into the target bill system, the method further comprises:
if a correction instruction for the text information corresponding to the target field is detected, correcting the text information according to the correction instruction to obtain corrected text information;
determining a model to be iterated according to the corrected text information, wherein the model to be iterated is the target detection model or the target recognition model;
and iteratively updating the model to be iterated according to the corrected text information.
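One way the routing in claim 7 could work, sketched under stated assumptions: if the user's correction indicates the field was localized at the wrong position, the detection model is selected for iteration; otherwise the recognition model is. Both the rule and the field names are hypothetical illustrations, not the claimed logic.

```python
# Hypothetical routing of a correction to the model to be iterated.

def model_to_iterate(correction):
    # `position_changed` is an assumed flag derived from the correction instruction
    if correction.get("position_changed"):
        return "target_detection_model"
    return "target_recognition_model"

print(model_to_iterate({"position_changed": False, "text": "INV-2024"}))
# prints "target_recognition_model"
```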
8. A bill input device, comprising:
a first acquisition unit, used for acquiring a target image corresponding to a target bill to be input;
a first recognition unit, used for performing field position identification on the target image through a target detection model to obtain a target field and position information corresponding to the target field;
a second recognition unit, used for performing text recognition based on the position information and the target image through a target recognition model to obtain text information corresponding to the target field;
and an input unit, used for inputting the text information corresponding to the target field into a target bill system.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein when the program runs, the program controls a device in which the storage medium is located to perform the bill input method according to any one of claims 1 to 7.
10. An electronic device, comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the bill input method according to any one of claims 1 to 7.
CN202410189805.9A 2024-02-20 2024-02-20 Bill input method and device, storage medium and electronic equipment Pending CN118053169A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410189805.9A CN118053169A (en) 2024-02-20 2024-02-20 Bill input method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410189805.9A CN118053169A (en) 2024-02-20 2024-02-20 Bill input method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN118053169A true CN118053169A (en) 2024-05-17

Family

ID=91051245

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410189805.9A Pending CN118053169A (en) 2024-02-20 2024-02-20 Bill input method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN118053169A (en)

Similar Documents

Publication Publication Date Title
US11816165B2 (en) Identification of fields in documents with neural networks without templates
CN109117831B (en) Training method and device of object detection network
RU2695489C1 (en) Identification of fields on an image using artificial intelligence
CN112182250A (en) Construction method of checking relation knowledge graph, and financial statement checking method and device
CN110490304B (en) Data processing method and device
CN112232336A (en) Certificate identification method, device, equipment and storage medium
CN111881923A (en) Bill element extraction method based on feature matching
CN118053169A (en) Bill input method and device, storage medium and electronic equipment
CN114254588A (en) Data tag processing method and device
CN113191875A (en) Credit granting method and device for new user, electronic equipment and storage medium
CN113077048B (en) Seal matching method, system, equipment and storage medium based on neural network
CN116028880B (en) Method for training behavior intention recognition model, behavior intention recognition method and device
US20240037975A1 (en) Methods and systems for automatic document pattern recognition and analysis
CN117669538A (en) Complaint document identification method and device, storage medium and electronic equipment
CN117671701A (en) Financial account opening method and device, storage medium and electronic equipment
US20240062051A1 (en) Hierarchical data labeling for machine learning using semi-supervised multi-level labeling framework
CN118115257A (en) Financial business processing method and device, storage medium and electronic equipment
CN117633282A (en) Query method and device for financial products, storage medium and electronic equipment
CN113591932A (en) User abnormal behavior processing method and device based on support vector machine
CN117520335A (en) User data processing method and device, storage medium and electronic equipment
CN116453250A (en) Identification model generation method and device, service calling method, equipment and medium
CN117351297A (en) Method and device for identifying noise interference image, processor and electronic equipment
Patil et al. A novel optimized deep learning framework to spot keywords and query matching process in Devanagari scripts
CN114862547A (en) Credit limit processing method and device
CN117475446A (en) Seal verification method and device, processor and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination