CN113011349A - Element identification method and device of bill and storage medium

Publication number
CN113011349A
Authority
CN
China
Prior art keywords
target
bill
target elements
images
image
Prior art date
Legal status
Pending
Application number
CN202110312790.7A
Other languages
Chinese (zh)
Inventor
张亚
邓小远
杨宇喆
许明
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority claimed from CN202110312790.7A
Publication of CN113011349A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/416: Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the specification provide a method, an apparatus, and a storage medium for identifying the elements of a bill, applicable in the technical field of artificial intelligence. The method comprises the following steps: analyzing a preset number of bill images to obtain feature information of the target elements in the bill images; removing the target elements from the bill images to obtain bill images with the target elements removed; filling simulated target elements into the bill images from which the target elements have been removed, according to the feature information of the target elements, to generate a sample image set, where each sample image in the set comprises a bill image filled with simulated target elements and the position labels of the target elements in that image; and training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model, so that the target elements in a bill image to be recognized can be identified based on the target element position detection model, thereby improving the accuracy of bill element identification.

Description

Element identification method and device of bill and storage medium
Technical Field
The embodiments of the specification relate to the technical field of artificial intelligence, and in particular to a method, an apparatus, and a storage medium for identifying the elements of a bill.
Background
With the development of the internet, electronic office work has become a trend. When business personnel handle processes that involve the transmission and management of bills, paper bills are often converted into bill images for transmission or management. When the elements of a bill need to be used, they are often extracted by manual reading, which is inefficient. To promote technology-enabled services and put core business data into storage intelligently and automatically, OCR text recognition technology is usually adopted to recognize the elements of the bill automatically and replace manual entry.
Taking checks as an example, the current general solutions for domestic check element recognition are as follows. For checks with a fixed format: input image → cutting of the target fragment region → locating of the elements to be recognized → classification of the elements into handwritten and printed categories → element recognition. For checks without a fixed format: input image → locating of the target fragment region → locating of the elements to be recognized → classification of the elements into handwritten and printed categories → element recognition.
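The two pipelines above differ only in their first stage. As a toy sketch of that control flow (the stage names are illustrative placeholders, not an implementation from the patent), each pipeline can be expressed as a chain of stages applied in order:

```python
# Illustrative sketch of the two check-recognition pipelines described above.
# Each toy stage just records its own name, so the two flows can be compared.

def make_stage(name):
    def stage(trace):
        return trace + [name]
    return stage

# Fixed-format checks: the target fragment region is cut at a known position.
fixed_format = [make_stage(n) for n in (
    "cut_target_fragment_region",
    "locate_elements_to_recognize",
    "classify_handwritten_vs_printed",
    "recognize_elements",
)]

# Non-fixed-format checks: the fragment region must first be located.
non_fixed_format = [make_stage(n) for n in (
    "locate_target_fragment_region",
    "locate_elements_to_recognize",
    "classify_handwritten_vs_printed",
    "recognize_elements",
)]

def run(stages):
    trace = ["input_image"]
    for stage in stages:
        trace = stage(trace)
    return trace
```

Running either pipeline yields the same tail of stages; only the first step after the input image differs.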
However, the recognition scheme for fixed-format checks has poor model extensibility: when new requirements such as magnetic code recognition or two-dimensional code detection are added, a new model, new GPU resources, a new service script, or even a whole new solution must be produced by modifying the historical scheme. The recognition scheme for checks without a fixed format classifies handwritten and printed elements separately, which can improve the accuracy of printed element recognition, but it poses new challenges in terms of training sample size, GPU resources, and service time.
Since the daily transaction volume of the overseas check business is relatively small, it is impractical for the business to provide enough real data as training samples. Furthermore, overseas checks come in complicated and varied formats, their element representation rules differ greatly from those of domestic checks, and they involve different currencies, so the solution for domestic check element recognition is no longer applicable to the overseas check scenario.
Disclosure of Invention
The embodiments of the specification aim to provide a method, an apparatus, and a storage medium for identifying the elements of a bill, so as to solve the problem in the prior art that the requirement on training data is high, and to improve the accuracy of bill element identification.
In order to solve the above problem, an embodiment of the present specification provides a method for identifying the elements of a bill, the method comprising: analyzing a preset number of bill images to obtain feature information of the target elements in the bill images; removing the target elements from the bill images to obtain bill images with the target elements removed; filling simulated target elements into the bill images from which the target elements have been removed, according to the feature information of the target elements, to generate a sample image set, where the sample images in the set comprise bill images filled with simulated target elements and the position labels of the target elements in those images; training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model; detecting a bill image to be recognized with the target element position detection model to obtain the position coordinates of the target elements in that image; and identifying the target elements in the bill image to be recognized according to the position coordinates and converting them into structured information for output.
In order to solve the above problem, an embodiment of the present specification further provides an apparatus for identifying the elements of a bill, the apparatus comprising: an analysis module for analyzing a preset number of bill images to obtain feature information of the target elements in the bill images; a removal module for removing the target elements from the bill images to obtain bill images with the target elements removed; a filling module for filling simulated target elements into the bill images from which the target elements have been removed, according to the feature information of the target elements, to generate a sample image set, where the sample images in the set comprise bill images filled with simulated target elements and the position labels of the target elements in those images; a training module for training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model; a detection module for detecting a bill image to be recognized with the target element position detection model to obtain the position coordinates of the target elements in that image; and an identification module for identifying the target elements in the bill image to be recognized according to the position coordinates and converting them into structured information for output.
In order to solve the above problem, an embodiment of the present specification further provides an electronic device, comprising: a memory for storing a computer program; and a processor for executing the computer program to implement: analyzing a preset number of bill images to obtain feature information of the target elements in the bill images; removing the target elements from the bill images to obtain bill images with the target elements removed; filling simulated target elements into the bill images from which the target elements have been removed, according to the feature information of the target elements, to generate a sample image set, where the sample images in the set comprise bill images filled with simulated target elements and the position labels of the target elements in those images; training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model; detecting a bill image to be recognized with the target element position detection model to obtain the position coordinates of the target elements in that image; and identifying the target elements in the bill image to be recognized according to the position coordinates and converting them into structured information for output.
To solve the above problem, an embodiment of the present specification further provides a computer-readable storage medium storing computer instructions which, when executed, implement: analyzing a preset number of bill images to obtain feature information of the target elements in the bill images; removing the target elements from the bill images to obtain bill images with the target elements removed; filling simulated target elements into the bill images from which the target elements have been removed, according to the feature information of the target elements, to generate a sample image set, where the sample images in the set comprise bill images filled with simulated target elements and the position labels of the target elements in those images; training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model; detecting a bill image to be recognized with the target element position detection model to obtain the position coordinates of the target elements in that image; and identifying the target elements in the bill image to be recognized according to the position coordinates and converting them into structured information for output.
As can be seen from the technical solutions provided by the embodiments of the present specification, a preset number of bill images can be analyzed to obtain feature information of the target elements in the bill images; the target elements can be removed from the bill images to obtain bill images with the target elements removed; simulated target elements can be filled into the bill images from which the target elements have been removed, according to the feature information of the target elements, to generate a sample image set, where the sample images in the set comprise bill images filled with simulated target elements and the position labels of the target elements in those images; a target detection algorithm can be trained on the sample images in the sample image set to obtain a target element position detection model; a bill image to be recognized can be detected with the target element position detection model to obtain the position coordinates of the target elements in that image; and the target elements in the bill image to be recognized can be identified according to the position coordinates and converted into structured information for output. According to the method provided by the embodiments of the specification, the sample data is expanded through an integrated data enhancement method, which solves the problem in the prior art that the requirement on real training data is high and improves the accuracy of bill element identification.
Drawings
In order to illustrate the embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments described in the specification, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of a scenario example of the present specification;
FIG. 2 is a flowchart of the model training data enhancement step in a scenario example of the present specification;
FIG. 3 is a flowchart of the service encapsulation step in a scenario example of the present specification;
FIG. 4 is a flowchart of a method for identifying the elements of a bill according to an embodiment of the present specification;
FIG. 5a is a handwritten check image of the present specification;
FIG. 5b is a printed check image of the present specification;
FIG. 6 is a schematic diagram of target elements according to an embodiment of the present specification;
FIG. 7 is a diagram of a specific example of target elements according to an embodiment of the present specification;
FIG. 8a is a schematic diagram illustrating the effect of removing the target elements from a bill image according to an embodiment of the present specification;
FIG. 8b is a schematic diagram illustrating the effect of removing the target elements from a bill image according to an embodiment of the present specification;
FIG. 9 is a schematic diagram of a scenario example of the present specification;
FIG. 10 is a fragment image of a handwritten target element according to an embodiment of the present specification;
FIG. 11 is a fragment image of a printed target element according to an embodiment of the present specification;
FIG. 12 is an example diagram of background noise according to an embodiment of the present specification;
FIG. 13 is a flowchart of the implementation of the enhancement combination algorithm according to an embodiment of the present specification;
FIG. 14 is a schematic diagram of a sample image according to an embodiment of the present specification;
FIG. 15 is a functional structure diagram of an electronic device according to an embodiment of the present specification;
FIG. 16 is a functional structure diagram of a task scheduling device according to an embodiment of the present specification.
Detailed Description
The technical solutions in the embodiments of the present specification will be described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present specification without creative effort shall fall within the protection scope of the present specification.
A paper bill is a paper document with a relatively fixed display style and definite data items, such as a value-added tax invoice, business license, financial bill, certificate, or check. Generally, bills of the same type have the same display style and data items, the position of a given data item is basically the same across different bills, and in informatization systems the output of paper bills is mostly realized by printing from a template. Because of the incompatibility between the management systems and informatization systems of different organizations, paper bills are widely used as carriers of business certificates and of data transmission between organizations. However, since paper bills are oriented toward manual reading, they lack structured information such as data items, and extracting data from a large number of paper bills necessarily depends on manual work. Manual processing is inefficient and cannot handle a large number of bills quickly, so for the receiver of paper bills, quickly structuring the bills through image recognition technology and avoiding repeated data entry has become key to organizational informatization.
The main difference between the identification of paper bills and traditional document identification is that not only the text on the paper bill needs to be recognized, but also the value of each data item of the bill, i.e., the elements of the bill, so that the paper bill can be structured. Most existing bill identification methods first establish, for a specific bill type, a bill template through experience, machine learning, or similar means. The bill template comprises the size of a standard image of the bill and the relative position of each data item in the standard image; the template is then applied to a new bill image to recognize the text at the position of each data item in the image. These methods usually require a large number of training samples for model training. In some scenarios, such as identifying the elements of overseas checks, it is impractical to ask the business to provide enough real data as training data, because the daily transaction volume of the overseas check business is relatively small; moreover, overseas checks come in complicated and varied formats, their element representation rules differ greatly from those of domestic checks, and they involve different currencies, so the solution for identifying the elements of domestic checks is no longer suitable for the overseas check scenario.
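As a toy sketch of how such a pre-built bill template might be applied (the mechanics here are an assumption for illustration, not this specification's method), the template stores the standard image size and each data item's box, and applying it to a new image of a different size rescales every box:

```python
# Hypothetical template-based approach: a bill template records the standard
# image size and each data item's (x, y, w, h) box; applying it to a new
# image rescales every box to the new image's dimensions.

def apply_template(template, new_w, new_h):
    sx = new_w / template["width"]
    sy = new_h / template["height"]
    return {
        name: (round(x * sx), round(y * sy), round(w * sx), round(h * sy))
        for name, (x, y, w, h) in template["items"].items()
    }

# A made-up template for an 800x400 standard check image.
template = {
    "width": 800, "height": 400,
    "items": {"amount": (600, 100, 150, 40), "date": (500, 40, 120, 30)},
}
boxes = apply_template(template, new_w=400, new_h=200)
```

This works only when the new bill matches the template's layout, which is exactly why varied overseas check formats defeat the approach.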
Considering this, if a small number of bill images can be used as training data, format analysis can be performed on the bills to grasp the font style, font size, character interference noise around the targets to be recognized, and the position areas where the target text may appear; data enhancement can then generate enough original pictures and fragments of the regions to be recognized; the detection model and the recognition model can then be trained; and finally the models can be encapsulated and published as a service. This promises to solve the problem in the prior art that the requirement on real training data is high, and to improve the accuracy of bill element identification. On this basis, the embodiments of this specification provide a method, an apparatus, and a storage medium for identifying the elements of a bill.
Referring to fig. 1, a scenario example of the present specification is presented. Fig. 1 is a flowchart of the method for identifying the elements of a bill in this scenario example. The method comprises the following steps: a real data preparation step 1, a model training data enhancement step 2, a model training step 3, and a service encapsulation step 4.
In this scenario example, the recognition of an element of an overseas check is taken as an example.
The real data preparation step 1 includes preparing a preset number of real overseas check images.
The model training data enhancement step 2 includes performing format analysis on the overseas checks to grasp the font style and font size of recognition elements such as the amount and the currency, the character interference noise around the targets to be recognized, and the position areas where the target text may appear. Data enhancement is then performed: the target elements are removed from the check images, and simulated target elements are filled into the removed regions based on the format analysis of the overseas checks, so as to generate a sufficient number of bill images as training samples.
Specifically, as shown in fig. 2, the model training data enhancement step 2 may include a real data analysis step 21, a background noise analysis step 22, a data enhancement combination method design step 23, and a generated data amount evaluation step 24. The real data analysis step 21 performs format analysis on the overseas checks to grasp the font style and font size of recognition elements such as the amount and the currency and the position areas where the target text may appear. The background noise analysis step 22 analyzes the character interference noise around the target elements; for example, next to the amount element there are often symbols such as "$", "USD", "S $", "", and "_" as background noise. Such character noise directly affects the accuracy of the recognition model, so the background noise that may occur must be collected and understood for the business and introduced during data enhancement. The data enhancement combination method design step 23 fills simulated target elements into the bill images from which the target elements have been removed, based on the format analysis of the overseas checks, to generate a sufficient number of bill images as training samples. The generated data amount evaluation step 24 evaluates the quantity of generated training samples so as to improve the accuracy of model training.
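The interaction of steps 22 and 23 can be illustrated with a minimal generator that decorates simulated amount strings with the observed noise symbols so the recognition model learns to tolerate them. The helper, its probabilities, and the exact noise set are assumptions for illustration, not the patent's implementation:

```python
import random

# Noise symbols of the kind the background-noise analysis step finds next to
# real amount elements (illustrative subset).
BACKGROUND_NOISE = ["$", "USD", "S$", "_"]

def simulate_amount_text(rng):
    """Generate a simulated amount string, sometimes prefixed with noise."""
    dollars = rng.randint(1, 99999)
    cents = rng.randint(0, 99)
    amount = f"{dollars:,}.{cents:02d}"
    # With 50% probability, prepend one of the observed noise symbols so the
    # recognition model is trained to ignore it.
    if rng.random() < 0.5:
        amount = rng.choice(BACKGROUND_NOISE) + amount
    return amount

rng = random.Random(7)  # fixed seed for reproducibility
samples = [simulate_amount_text(rng) for _ in range(5)]
```

In a full pipeline, each generated string would then be rendered into a cleaned bill image with the analyzed font style and size.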
The model training step 3 includes training a target element position detection model and a target element recognition model. The target element position detection model detects the position of a target element in the overseas check, and the target element recognition model recognizes the target element based on that position.
The service encapsulation step 4 includes packaging and publishing the trained target element position detection model and target element recognition model, so as to identify the target elements in a bill image to be recognized. Specifically, as shown in fig. 3, the service encapsulation step 4 may include the following steps.
Step 41: The service receives an image passed in from downstream.
Step 42: The image is sent to the target element position detection model to detect the position of the target element.
Step 43: The detection result output by step 42 is judged; if the detection result is empty, the position detection of the target element has failed.
Step 44: When step 43 returns empty, a message with return code 1 is assembled and returned to the caller; the message content includes at least a status field retry_code and an information field retry_msg, and the message must indicate that the detection model failed.
Step 45: When step 43 returns non-empty, the position detection of the target element has succeeded, and the region at the target element's position is expanded by a pixel margin.
Step 46: The image fragment obtained after the expansion in step 45 is sent to the target element recognition model for target element recognition.
Step 47: It is judged whether the target element recognition model returns empty, and the corresponding processing branch is entered.
Step 48: When the target element recognition model returns empty, the returned message content is modified and marked to indicate that the recognition model failed.
Step 49: If the target element recognition model succeeds, the structured information of the recognized elements is constructed according to the message format agreed with the downstream caller.
Step 410: The message is returned to the caller.
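The flow of steps 41 to 410 can be condensed into a minimal service sketch. The function and the injected model callables are hypothetical stand-ins; only the retry_code/retry_msg field names come from step 44:

```python
# Minimal sketch of the service-encapsulation flow in steps 41-410 above.
# detect() and recognize() stand in for the two trained models.

def serve(image, detect, recognize, expand=4):
    # Step 42: locate the target element with the position detection model.
    box = detect(image)
    # Steps 43-44: an empty detection result -> failure message to the caller.
    if box is None:
        return {"retry_code": 1, "retry_msg": "detection model failed"}
    # Step 45: expand the detected box by a pixel margin (clamped at 0).
    x, y, w, h = box
    x, y = max(0, x - expand), max(0, y - expand)
    w, h = w + 2 * expand, h + 2 * expand
    # Steps 46-48: run recognition on the expanded fragment.
    text = recognize(image, (x, y, w, h))
    if text is None:
        return {"retry_code": 1, "retry_msg": "recognition model failed"}
    # Steps 49-410: build the structured element message for the caller.
    return {"retry_code": 0, "element": {"box": (x, y, w, h), "text": text}}
```

Injecting the models as callables keeps the wrapper testable with stubs, which mirrors the extensibility goal the scheme claims.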
Recognition of the right-hand amount on Singapore checks was taken as a verification case. The business provided approximately 1000 real pictures, of which about 700 contained handwritten amounts and about 300 contained printed amounts. The training sample sizes for the handwritten and printed forms were 500 and 100 respectively, with the remaining pictures used for testing; at the same scale, the real business data volume required by the scheme is reduced by 80%. The overall end-to-end accuracy after service encapsulation is close to 94%, which meets the requirements of business use.
The method provided by this scenario example overcomes the drawbacks of existing check recognition schemes, namely the high requirement on real training data, the high GPU resource utilization, and the poor extensibility. A small number of bill images can be prepared, for example 500 handwritten and 100 printed bill images; format analysis is then performed on the checks in the scenario to grasp the font style and font size of recognition elements such as the amount and the currency, the character interference noise around the targets to be recognized, and the position areas where the target text may appear; data enhancement is then carried out to generate enough original pictures and fragments of the regions to be recognized; the detection model and the recognition model are then trained; and finally the models are encapsulated and published as a service. Actual measurement shows that the scheme has a high recognition rate, good universality, strong extensibility, and low GPU resource occupancy, and that the robustness of the detection model is enhanced, in particular the target detection success rate on poor-quality business images. In preparing the training data, the pictures to be annotated are produced by a data generation method: part of the training samples of the target element recognition model are generated automatically, and the other part comes from fragments located and output by the target element detection model with appropriate expansion. In addition, handwritten and printed elements are mixed as training samples of the data generation model, recognition model training is performed on the mixture, and the result is packaged into a single recognition model, which improves the overall recognition accuracy of the model.
Please refer to fig. 4. The embodiments of the specification provide a method for identifying the elements of a bill. In the embodiments of the present specification, the subject performing the method may be an electronic device having a logical operation function, and the electronic device may be a server. The server may be an electronic device with a certain arithmetic processing capability, which may have a network communication unit, a processor, a memory, and the like. Of course, the server is not limited to an electronic device as a physical entity; it may also be software running in an electronic device. The server may also be a distributed server, i.e., a system with multiple processors, memories, and network communication modules operating in coordination, or a server cluster formed by several servers. The method may include the following steps.
S410: analyzing a preset number of bill images to obtain the feature information of the target elements in the bill images.
In some embodiments, the bill is, for example, an overseas check. For instance, about 500 handwritten check images and 100 printed check images may be prepared. The amount, currency, and other elements in a handwritten check are in handwritten fonts, as shown in fig. 5a; the amount, currency, and other elements in a printed check are in printed fonts, such as Hack-Regular, huawenzhongsong, msyh, simsung, surson, weiruan yahei-1, and the like, as shown in fig. 5b.
In some embodiments, the target elements may be the elements to be recognized in the bill, specifically elements such as the currency, the magnetic code, and the date. As shown in fig. 6, the main element to be recognized on an overseas check is the right-hand amount; in addition, elements such as the left-hand amount, the currency, and the magnetic code are also recognized. The feature information of a target element includes the font style, font size, font color, text position, font weight, and so on, as shown in fig. 7. Analyzing the preset number of bill images may include analyzing the font style, font size, font color, text position, writing depth, and the like to obtain the feature information of the target elements. With this feature information, the target elements can be generated by simulation in subsequent steps, so that sufficient training samples can be produced to train the model and the accuracy of target element recognition can be improved.
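As a toy illustration of extracting such feature information from a grayscale crop of a target element (the nested-list representation and the ink threshold are assumptions, not the patent's procedure), one can estimate the text position as a bounding box and the writing depth as the mean ink intensity:

```python
# Toy feature extraction for a grayscale crop (0 = black ink, 255 = paper):
# the bounding box of ink pixels approximates the text position, and the mean
# ink value is a rough proxy for font weight / writing depth.

def element_features(gray, ink_threshold=128):
    ink = [(r, c) for r, row in enumerate(gray)
                  for c, v in enumerate(row) if v < ink_threshold]
    if not ink:
        return None  # no ink found in the crop
    rows = [r for r, _ in ink]
    cols = [c for _, c in ink]
    box = (min(cols), min(rows),
           max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)
    mean_ink = sum(gray[r][c] for r, c in ink) / len(ink)
    return {"box": box, "mean_ink": mean_ink}

# A 4x6 crop with a dark 3x2 stroke in the middle.
crop = [
    [255, 255, 255, 255, 255, 255],
    [255,  40,  60,  50, 255, 255],
    [255,  30,  20,  70, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
feats = element_features(crop)
```

A real implementation would also analyze font style and color, which requires richer image data than this sketch models.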
S420: removing the target elements from the bill images to obtain the bill images with the target elements removed.
In some embodiments, the target element in at least one bill image is removed to obtain a bill image with the target element removed. Of course, to improve the accuracy of target element recognition, the target elements in several images of different bill types can be selected and removed, yielding multiple bill images with the target elements removed.
In some embodiments, removing the target element from the bill image to obtain the bill image with the target element removed includes: obtaining the target element in a target area of the bill image; and randomly erasing the target area, or copying another background area of the bill image to replace the target area, to obtain the bill image with the target element removed. Specifically, the position of the target element in the bill image may be determined based on the feature information obtained in S410; the target area of a target element such as the currency or the magnetic code is then obtained, and the target element in that area is removed, for example by arbitrarily erasing the target object or by replacing the target area with a copy of another background area of the picture. The effect is shown in fig. 8a and 8b. Removing the target elements from the bill images in this way improves removal efficiency and yields a sufficient number of bill images with the target elements removed.
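The second removal strategy, replacing the target area with a copy of a clean background area of the same size, can be sketched on a toy image. Images are nested lists of pixel values here, and the function is an illustrative assumption rather than the exact procedure:

```python
# Sketch of background-copy removal: overwrite the target region with pixels
# copied from a clean background region of the same size in the same image.
# Coordinates are (x, y) with boxes as (x, y, w, h).

def remove_by_background_copy(img, target, background):
    tx, ty, w, h = target
    bx, by = background  # top-left corner of a clean w x h region
    out = [row[:] for row in img]  # work on a copy; keep the original intact
    for dy in range(h):
        for dx in range(w):
            out[ty + dy][tx + dx] = img[by + dy][bx + dx]
    return out

# A white 4x6 "bill" with a dark 2x2 element at (2, 1).
img = [
    [255, 255, 255, 255, 255, 255],
    [255, 255,   0,   0, 255, 255],
    [255, 255,   0,   0, 255, 255],
    [255, 255, 255, 255, 255, 255],
]
cleaned = remove_by_background_copy(img, target=(2, 1, 2, 2), background=(0, 0))
```

On real scans the copied patch carries over paper texture, which is what makes this preferable to flat erasing.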
In a specific scenario example, as shown in fig. 9, removing the target elements from the bill image to obtain the bill image with the target elements removed can be realized through steps 230 and 231.
Specifically, step 230: prepare template pictures. First, complete real service images are obtained, as shown in fig. 5a and 5b. It should be emphasized that, to ensure the generalization of the scheme, as many different styles of bill images from the service scene as possible should be provided.
Step 231: remove the target elements. The target elements that need to be identified, such as the currency, the amount, and the magnetic code, are removed, for example by arbitrarily erasing the target area or by copying another background area of the picture over it.
S430: filling simulated target elements into the bill images from which the target elements were removed, according to the characteristic information of the target elements, to generate a sample image set; the sample images in the sample image set include the bill images filled with the simulated target elements and the position labels of the target elements in the bill images.
The simulated target elements are filled into the bill images from which the target elements were removed, according to the characteristic information of the target elements, to obtain a sample image set formed by a plurality of sample images. Each sample image includes a bill image filled with simulated target elements and the position labels of the target elements in that bill image.
Filling the simulated target elements into the bill images from which the target elements were removed can be realized in various ways. For example, the filling may be accomplished by text-image writing, i.e., writing the text at the corresponding position of the bill image from which the target elements were removed; this is suitable for printed checks. The filling may also be accomplished by pasting text fragments, i.e., pasting text-fragment images at the corresponding positions of the bill image from which the target elements were removed, which is applicable to both printed and handwritten checks. The text-fragment images contain target elements generated by simulation based on the characteristic information of the target elements, and the size of each text-fragment image is adapted to the area where the target element is located in the bill image.
In some embodiments, when the filling of the simulated target elements into the bill images from which the target elements were removed is realized by text-image writing, generating the sample image set comprises: determining simulation parameters according to the characteristic information of the target elements, the simulation parameters representing generation parameters of the simulated target elements; and filling the simulated target elements into the bill images from which the target elements were removed according to the simulation parameters, to generate the sample image set. In this way, sufficient print-style training data of different styles can be generated, improving the accuracy and generalization of the subsequently trained model.
Specifically, the simulation parameters may be the font style, font color, font size, text position, and the like. The font styles may include Hack-Regular, huawenzhongsong, msyh, simsun, weiruan yahei-1, and the like. As for font color, printed checks are generally black or gray-black, while handwritten checks use a wider range of colors such as orange and blue; since text-image writing generally targets print, the writing color is chosen to be black or gray-black. As for font size, considering printed checks only, the size may be set between 13 and 25. As for text position, the position of the written text in the original image affects the robustness of the model, especially for target elements whose text may overlap background noise or extend beyond a specific area. Different writing-position offsets therefore need to be designed for different target elements: for the magnetic code and the left-hand amount, the width offset matters most, while for the right-hand amount and the currency the height offset must be strict, since in real bill images these two target elements frequently sit on the bounding frame or a solid line of the area.
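The parameter sampling described above can be sketched as follows. This is a hedged illustration only: the jitter ranges and the parameter table are assumptions in the spirit of the text, and `ImageFont.load_default()` stands in for `ImageFont.truetype()` with a real font file so the sketch runs without bundled fonts.

```python
import random
from PIL import Image, ImageDraw, ImageFont

SIM_PARAMS = {
    "font_sizes": list(range(13, 26)),    # print cheques: 13-25 pt (would feed ImageFont.truetype)
    "colors": [(0, 0, 0), (40, 40, 40)],  # black / gray-black for print
    "x_jitter": range(-6, 7),             # loose width offset (magnetic code, left amount)
    "y_jitter": range(-2, 3),             # strict height offset (right amount, currency)
}

def write_element(img, text, anchor_xy, rng):
    """Write one simulated element with randomly sampled parameters."""
    draw = ImageDraw.Draw(img)
    x = anchor_xy[0] + rng.choice(SIM_PARAMS["x_jitter"])
    y = anchor_xy[1] + rng.choice(SIM_PARAMS["y_jitter"])
    draw.text((x, y), text, fill=rng.choice(SIM_PARAMS["colors"]),
              font=ImageFont.load_default())
    return (x, y)  # kept as the position label for the sample set

rng = random.Random(0)
sample = Image.new("RGB", (400, 160), (245, 242, 230))  # cleaned template stand-in
label_xy = write_element(sample, "1,234.56", (250, 50), rng)
```

The returned offset position doubles as the position label required by the sample image set.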
In some embodiments, when the filling of the simulated target elements into the bill images from which the target elements were removed is realized by pasting text fragments, generating the sample image set comprises: generating a plurality of fragment images according to the characteristic information of the target elements, where each fragment image includes a simulated target element and its size is adapted to the area where the target element is located in the bill image; and pasting the fragment images to preset areas in the bill images from which the target elements were removed, to generate the sample image set. By pasting text fragments, sufficient print-style training data as well as sufficient handwriting training data can be generated, so that the subsequently trained model is highly general and scalable.
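The fragment-pasting route can be sketched as below; a minimal illustration with hypothetical sizes and coordinates, where the paste box doubles as the position label.

```python
from PIL import Image

def paste_fragment(ticket, fragment, region_box):
    """Resize the fragment to the cleared target area and paste it there."""
    w = region_box[2] - region_box[0]
    h = region_box[3] - region_box[1]
    fragment = fragment.resize((w, h))
    ticket.paste(fragment, (region_box[0], region_box[1]))
    return region_box  # position label for the sample image set

ticket = Image.new("RGB", (400, 160), (245, 242, 230))   # cleaned template stand-in
fragment = Image.new("RGB", (288, 72), (255, 255, 255))  # e.g. a generated text fragment
label = paste_fragment(ticket, fragment, (240, 40, 380, 90))
```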
In some embodiments, because an image obtained by pasting text fragments shows obvious color jitter at the pasting boundary, a repair operation is required so that the obtained image is closer to a real bill image and the quality of the sample picture is improved. Accordingly, the method may further comprise: repairing the pasting boundary between the fragment image and the bill image from which the target element was removed, using an average-color approach.
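One way to realise this average-color repair is to overwrite a thin ring at the paste boundary with the mean color of the fragment, which damps the jitter. The strip width and the hard overwrite (rather than a gradual blend) are illustrative choices.

```python
import numpy as np

def repair_boundary(img_arr, box, strip=2):
    """img_arr: HxWx3 uint8 array; box: (x0, y0, x1, y1) of the pasted fragment."""
    x0, y0, x1, y1 = box
    # mean colour of the pasted fragment region
    mean_color = img_arr[y0:y1, x0:x1].mean(axis=(0, 1))
    for s in range(strip):
        # overwrite the boundary ring at distance s from the edge
        img_arr[y0 + s, x0:x1] = mean_color
        img_arr[y1 - 1 - s, x0:x1] = mean_color
        img_arr[y0:y1, x0 + s] = mean_color
        img_arr[y0:y1, x1 - 1 - s] = mean_color
    return img_arr

canvas = np.full((160, 400, 3), 230, dtype=np.uint8)  # bill background stand-in
canvas[40:90, 240:380] = (10, 10, 10)                 # a dark pasted fragment
repaired = repair_boundary(canvas, (240, 40, 380, 90))
```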
In some embodiments, a plurality of fragment images may be generated according to the characteristic information of the target elements, specifically by using the WGAN algorithm; the simulated target elements contained in these images are handwritten target elements. As an example, some of the parameters used when generating handwritten target elements with the WGAN algorithm include: training sample size: 600 (500 handwritten fragments mixed with 100 print fragments); generated image size (width, height) = (288, 72); 20,000 iterations; and one round of 512 generated pictures every 1,000 training steps. Because the pictures generated by the model carry no labels, pictures that can be labeled need to be selected manually from the generated pictures. A generated fragment image is shown in fig. 10. Generating fragment images with the WGAN algorithm according to the characteristic information of the target elements makes it possible to generate handwritten target elements, obtain bill images containing handwritten target elements, and provide data for training the subsequent model.
In some embodiments, generating the plurality of fragment images according to the characteristic information of the target elements may also be implemented by text-image writing, similarly to the text-image writing implementation described above for filling the bill images: determining simulation parameters according to the characteristic information of the target elements, the simulation parameters representing generation parameters of the simulated target elements; and filling the simulated target elements into an image template of preset size according to the simulation parameters, to obtain a plurality of fragment images whose simulated target elements are print-style target elements. Because the generated fragment images will be used as training samples, it must be ensured that the written text does not exceed the image boundary; otherwise the mismatch with the label disturbs the training process and directly reduces the accuracy of the model. To solve this problem, a template of sufficiently large size may be prepared, the text, font, and other information to be written may be fixed, and the text written onto the template so that its actual extent can be measured. Finally, a fragment image is reconstructed based on the text size and the set position offsets, producing a fragment image that can be used directly for training the recognition model. A generated fragment image is shown in fig. 11.
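The oversized-template trick can be sketched as follows: draw the text on a deliberately large canvas, locate the actual ink with a bounding-box query, then crop with a chosen padding so the text can never spill past the fragment edge. The canvas size, padding, and default font are illustrative assumptions.

```python
from PIL import Image, ImageDraw, ImageFont, ImageOps

def render_fragment(text, pad=8):
    """Render text on an oversized template, then crop to the ink plus padding."""
    big = Image.new("L", (600, 200), 255)  # generously sized template
    ImageDraw.Draw(big).text((80, 80), text, fill=0,
                             font=ImageFont.load_default())
    # invert so the drawn (dark) text becomes the non-zero region getbbox() finds
    x0, y0, x1, y1 = ImageOps.invert(big).getbbox()
    return big.crop((x0 - pad, y0 - pad, x1 + pad, y1 + pad))

fragment = render_fragment("USD 1,234.56")
```

Because the crop is computed from the measured ink box, the written text is guaranteed to lie inside the fragment, keeping the image consistent with its label.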
Furthermore, character noise appears close to the target elements, and the type and quantity of this character noise need to match the actual scene. Asterisk noise, for example, typically appears in runs of 1-3 characters to the left, to the right, or at both ends of the amount to be recognized. Therefore, when constructing the written text, the diversity of character noise needs to be considered carefully to ensure the generalization of the recognition model.
In some embodiments, the method may further comprise: analyzing the preset number of bill images to obtain the background noise near the position of the target elements; and, correspondingly, when filling the simulated target elements into the bill images from which the target elements were removed according to the characteristic information of the target elements, introducing this background noise at positions close to the simulated target elements to generate the sample image set. Examples of background noise that may appear around a target element are amounts accompanied by symbols such as "$", "USD", "S $", "", and the boundary "_"; fig. 12 shows background noise next to the amount to be recognized. Character noise directly affects the accuracy of target element recognition. Treating the background character noise close to the target as part of the training target in the sample images improves the detection success rate of the model. Meanwhile, the recognition model treats character noise as blank background, so preparing sufficient background-noise samples at the training stage improves the recognition success rate of the target elements.
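Injecting such character noise next to a simulated amount can be sketched as below. The token list and placement offsets are illustrative assumptions, not an exhaustive inventory of the noise observed in real bills.

```python
import random
from PIL import Image, ImageDraw, ImageFont

NOISE_TOKENS = ["*", "**", "***", "$", "USD", "S$"]

def add_char_noise(img, amount_box, rng):
    """Write a randomly chosen noise token to the left or right of the amount."""
    draw = ImageDraw.Draw(img)
    token = rng.choice(NOISE_TOKENS)
    side = rng.choice(["left", "right"])
    x = amount_box[0] - 30 if side == "left" else amount_box[2] + 6
    draw.text((x, amount_box[1]), token, fill=(60, 60, 60),
              font=ImageFont.load_default())
    return token, side

rng = random.Random(1)
img = Image.new("RGB", (400, 160), (245, 242, 230))  # template stand-in
token, side = add_char_noise(img, (240, 40, 380, 90), rng)
```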
In some embodiments, the method may further comprise: performing effect-enhancement processing on the bill images filled with the simulated target elements, where the effect-enhancement processing includes at least one of pixel missing, handwriting fading and blurring, solid-line local interference, and salt-and-pepper noise. Real bill images suffer from such degradations, and since in the service pipeline the output of the target element detection model is the input of the target element recognition model, the detection success rate for target element positions directly affects the accuracy of target element recognition. Further enhancing the bill images filled with simulated target elements therefore strengthens the robustness of the target element detection model, in particular the detection success rate for target element positions in low-quality service images. In one example scenario, the enhancement methods used may include: pixel missing: the trajectory of the element text (e.g., a bold stroke) is overlaid with other background colors or filled with white; solid-line local interference: lines are drawn at random positions in the target element area, generally in black, with lengths, thicknesses, and directions designed by simulating service images; salt-and-pepper noise: added with noise ratios of 0.04 and 0.03 for this scene; handwriting fading and blurring: Gaussian blur with blur radii of [0.3, 0.4, 0.5, 0.6] for this scene.
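A hedged sketch of these enhancement functions follows. The noise ratio (0.03-0.04) and blur radii (0.3-0.6) match the values stated above; the line-drawing parameters are illustrative.

```python
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFilter

def salt_pepper(img, ratio=0.04, rng=None):
    """Flip a `ratio` share of pixels to pure black or pure white."""
    arr = np.array(img)
    rng = rng or np.random.default_rng(0)
    mask = rng.random(arr.shape[:2])
    arr[mask < ratio / 2] = 0          # pepper
    arr[mask > 1 - ratio / 2] = 255    # salt
    return Image.fromarray(arr)

def fade_blur(img, rng=None):
    """Gaussian blur with a radius sampled from the scene's setting."""
    radius = (rng or random.Random(0)).choice([0.3, 0.4, 0.5, 0.6])
    return img.filter(ImageFilter.GaussianBlur(radius))

def interference_line(img, box, rng=None):
    """Draw a short black line at a random position inside the target area."""
    rng = rng or random.Random(0)
    draw = ImageDraw.Draw(img)
    x0 = rng.randint(box[0], box[2] - 20)
    y0 = rng.randint(box[1], box[3] - 1)
    draw.line((x0, y0, x0 + rng.randint(10, 40), y0 + rng.randint(-3, 3)),
              fill=(0, 0, 0), width=rng.randint(1, 2))
    return img

img = Image.new("RGB", (400, 160), (245, 242, 230))  # sample image stand-in
img = salt_pepper(img, ratio=0.04)
img = fade_blur(interference_line(img, (240, 40, 380, 90)))
```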
In a specific scenario example, as shown in fig. 9, filling the simulated target elements into the bill images from which the target elements were removed according to the characteristic information of the target elements, and generating the sample image set, may be implemented through steps 232, 233, and 234.
Specifically, in step 232, the filling of the simulated target elements into the bill images from which the target elements were removed according to the characteristic information of the target elements may be realized by text-image writing or by text-fragment pasting; after the simulated target elements are filled in, background noise may also be introduced into the generated images.
In step 233, effect-enhancement post-processing may be performed on the images generated in step 232 to simulate pixel missing, handwriting fading and blurring, solid-line local interference, salt-and-pepper noise, and the like.
In step 234, the above processing can be implemented by designing an enhancement-combination algorithm. Specifically, as shown in fig. 13, implementing the data-enhancement functions mentioned in steps 232 and 233 by programming may include the following steps.
Step 234-0: the python image processing library is imported.
In the present scenario example, the data enhancement is built on the PIL library and mainly involves modules such as Image, ImageDraw, and ImageFont.
Step 234-1: and reading the original image with the target elements removed.
The original image with the target elements removed is a bill image from which the target elements were removed. If it is a single picture, the picture is passed directly to the Image.open() function to obtain the image data; if there are multiple pictures, i.e., an image list, an algorithm is required to randomly extract one picture and pass it to Image.open(). The algorithm can be implemented from scratch or built with the random module.
Step 234-2: decide whether to paste fragments.
This is the branch point of the image-processing flow; the branch is confirmed by a command parameter passed in at step 234-1. It determines whether the simulated target elements are filled into the bill images with the target elements removed by pasting text fragments or by text-image writing.
Step 234-3: fragment list.
When fragment pasting is selected to fill the simulated target elements into the bill image from which the target element was removed, the fragment images to be pasted into the original target area need to be organized into a list.
Step 234-4: random extraction by the algorithm.
As in step 234-1, a fragment image is extracted at random; the probability of extracting a specific fragment can be set according to the proportion of that image category in the service scene.
Step 234-5: image synthesis.
The fragment and the original image are composed by pasting, e.g., with the Image.paste() function, where the paste position is selected from the preset area.
Step 234-6: boundary repair.
Because the synthesized picture shows obvious color jitter at the boundary, it needs to be repaired; this scheme repairs the boundary using the average color of the fragment.
Step 234-7: parameter initialization.
When fragment pasting is not selected for filling the simulated target elements into the bill images with the target elements removed, the text-image writing must be implemented directly. First the parameters are initialized, such as the font style, font position, and font size; value ranges are chosen according to the actual conditions and then discretized into lists.
Step 234-8: the algorithm parameters are randomly combined.
The lists formed in step 234-7 are sampled randomly by the program to form parameter combinations.
Step 234-9: function loading.
The parameter combinations formed in step 234-8 are fed into functions such as those of the ImageFont module.
Step 234-10: and executing the function.
The text-image writing is executed.
Step 234-11: the effect is enhanced.
Functions such as pixel missing and interference-line drawing are implemented by programming, and one of the effect-enhancement functions is called at random to realize the enhancement.
Step 234-12: image saving.
A sample image set is generated, where the sample images in the set include the bill images filled with the simulated target elements and the position labels of the target elements in the bill images. A generated sample image is shown in fig. 14.
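The branching pipeline of steps 234-0 through 234-12 can be sketched as a single driver function: pick a cleaned template, then either paste a randomly chosen fragment or write text with randomly combined parameters, and return the image together with its position label. All helpers, sizes, and coordinates here are stand-ins for the functions described in the steps above; the boundary-repair and effect-enhancement calls are indicated by comments only.

```python
import random
from PIL import Image, ImageDraw, ImageFont

def generate_sample(templates, fragments, use_fragment, rng):
    img = rng.choice(templates).copy()           # steps 234-1 / 234-4: random extraction
    if use_fragment:                             # step 234-2: branch on pasting
        frag = rng.choice(fragments)
        box = (240, 40, 240 + frag.size[0], 40 + frag.size[1])
        img.paste(frag, (box[0], box[1]))        # step 234-5: image synthesis
        # step 234-6: boundary repair would run here
    else:
        # steps 234-7 / 234-8: randomly combined writing parameters
        xy = (240 + rng.randint(-6, 6), 50 + rng.randint(-2, 2))
        ImageDraw.Draw(img).text(xy, "1,234.56", fill=(0, 0, 0),
                                 font=ImageFont.load_default())
        box = (xy[0], xy[1], xy[0] + 80, xy[1] + 14)
    # step 234-11: a random effect-enhancement function would run here
    return img, box                              # step 234-12: image + position label

rng = random.Random(0)
templates = [Image.new("RGB", (400, 160), (245, 242, 230))]
fragments = [Image.new("RGB", (120, 40), (255, 255, 255))]
sample, label = generate_sample(templates, fragments, use_fragment=True, rng=rng)
```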
The process of generating the sample image set may further include a step of evaluating the number of sample images in the set. Generally, model accuracy increases with data volume; table 1 shows the relationship between data volume and model accuracy determined in the overseas-check scene of this scheme.
TABLE 1
(The content of Table 1 is provided as an image in the original filing.)
S440: training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model, so as to identify the target elements in the bill image to be identified based on the target element position detection model.
Object detection, also called object extraction, is image segmentation based on the geometric and statistical features of the target. In this embodiment, the sample images in the sample image set may be trained with algorithms such as CenterNet and FSAF to find the correspondence between the sample images and the positions of the target elements, yielding the target element position detection model.
After the target element position detection model is obtained, the to-be-recognized bill image can be input into the target element position detection model, the position coordinates of the target element in the to-be-recognized bill image are obtained based on the output result of the model, and then the target element in the to-be-recognized bill image is recognized according to the position coordinates.
In some embodiments, the identifying the target element in the bill image to be identified based on the target element position detection model includes: detecting the bill image to be recognized by using the target element position detection model to obtain the position coordinates of the target element in the bill image to be recognized; and identifying the target elements in the bill image to be identified according to the position coordinates, and converting the target elements into structured information to be output.
Specifically, the bill image to be recognized may be input into the target element position detection model, and the model's output taken as the localization result for the target elements, giving the position coordinates of the target elements in the bill image to be recognized. Once those position coordinates are obtained, the image within the coordinate area can be recognized, and that result taken as the recognition result of the target elements in the bill image to be recognized. Because the identification relies on the detection model's localization of the target elements, the positions of the target elements can be determined accurately, improving the accuracy of the final recognition result.
In some embodiments, the recognition may be performed by a preset target element recognition model. The step of identifying the target elements in the bill image to be identified according to the position coordinates and converting them into structured information for output comprises: acquiring the target image at the position coordinates in the bill image to be identified; performing pixel expansion on the target image; inputting the expanded target image into the preset target element recognition model to obtain the recognition result of the target elements in the bill image to be recognized; and converting the recognition result into structured information and outputting it. Specifically, the target image at the position coordinates contains a target element, and pixel expansion makes the target image clearer, improving the recognition accuracy of the target element recognition model.
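The pixel-expansion step can be sketched as padding and upscaling the crop at the detected coordinates before it is fed to the recognition model. The padding width and scale factor are illustrative assumptions.

```python
from PIL import Image, ImageOps

def expand_crop(ticket, box, pad=4, scale=2):
    """Crop the detected box, pad it with background, and upscale it."""
    crop = ticket.crop(box)
    crop = ImageOps.expand(crop, border=pad, fill=(255, 255, 255))
    return crop.resize((crop.size[0] * scale, crop.size[1] * scale))

ticket = Image.new("RGB", (400, 160), (245, 242, 230))  # bill image stand-in
target = expand_crop(ticket, (240, 40, 380, 90))        # detected coordinates
```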
In some embodiments, the target element recognition model is trained as follows: inputting the sample images in the sample image set into the target element position detection model to obtain the position coordinates of the target elements in the sample images; character-labeling the images at those position coordinates as well as the fragment images generated from the characteristic information of the target elements; and training a visual attention model with the character-labeled images and fragment images as training data to obtain the target element recognition model. Specifically, the data volume of the training samples of the target element recognition model is shown in table 1. The training data come from the fragment images and from the images at the position coordinates output by the target element position detection model; these images are character-labeled to form the training data, which is then used to train a visual attention recognition model (CNN + LSTM + Attention) to find the association between the image pixels and the characters, yielding the target element recognition model. Furthermore, the images at the position coordinates and the fragment images can be pixel-expanded and the expanded images used as training data, improving the recognition accuracy of the target element recognition model.
Under this training approach, the training data include both handwriting and print-style data, drawn from the fragment images as well as from the images obtained via the output of the target element position detection model. This diversity further improves the scalability and generality of the target element recognition model, and GPU resource usage can be reduced during training.
In some embodiments, to further improve the training effect of the target element recognition model, the ratio of the character-labeled images at the position coordinates to the fragment images in the training data may be set to 4:1, which benefits the generalization of the trained target element recognition model.
The method provided by the embodiments of this specification can analyze a preset number of bill images to obtain the characteristic information of the target elements in the bill images; remove the target elements in the bill images to obtain bill images with the target elements removed; fill simulated target elements into the bill images from which the target elements were removed, according to the characteristic information of the target elements, to generate a sample image set, where the sample images include the bill images filled with simulated target elements and the position labels of the target elements in the bill images; train on the sample images with a target detection algorithm to obtain a target element position detection model; detect the bill image to be recognized with the target element position detection model to obtain the position coordinates of the target elements in it; and recognize the target elements in the bill image to be recognized according to the position coordinates, converting them into structured information for output. By expanding the sample data through this integrated data-enhancement approach, the method relieves the heavy demand for real training data in the prior art and improves the accuracy of bill element identification.
Fig. 15 is a functional structure diagram of an electronic device according to an embodiment of the present disclosure, where the electronic device may include a memory and a processor.
In some embodiments, the memory may be used to store the computer program and/or module, and the processor implements the various functions of the element identification method of the bill by executing the computer program and/or module stored in the memory and calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application program required by at least one function, and the data storage area may store data created according to the use of the user terminal. In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card (Flash Card), at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor may execute the computer instructions to perform the steps of: analyzing a preset number of bill images to obtain characteristic information of target elements in the bill images; removing the target elements in the bill images to obtain bill images with the target elements removed; filling simulated target elements into the bill images from which the target elements were removed according to the characteristic information of the target elements to generate a sample image set, the sample images including the bill images filled with simulated target elements and the position labels of the target elements in the bill images; and training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model, so as to identify the target elements in the bill image to be identified based on the target element position detection model.
In the embodiments of the present description, the functions and effects specifically realized by the electronic device may be explained in comparison with other embodiments, and are not described herein again.
Fig. 16 is a functional configuration diagram of a device for identifying an element of a bill according to an embodiment of the present disclosure, and the device may specifically include the following structural modules.
The analysis module 1610 is configured to analyze a preset number of bill images to obtain feature information of target elements in the bill images;
a removing module 1620, configured to remove the target element in the ticket image, so as to obtain a ticket image with the target element removed;
a filling module 1630, configured to fill the simulated target element in the ticket image with the removed target element according to the feature information of the target element, so as to generate a sample image set; the sample images in the sample image set comprise bill images filled with simulated target elements and position labels of the target elements in the bill images;
the identification module 1640 is configured to train the sample images in the sample image set by using a target detection algorithm to obtain a target element position detection model, so as to identify the target elements in the to-be-identified bill image based on the target element position detection model.
The embodiments of the present specification further provide a computer-readable storage medium for the element identification method of a bill, the computer-readable storage medium storing computer program instructions that, when executed, implement: analyzing a preset number of bill images to obtain characteristic information of target elements in the bill images; removing the target elements in the bill images to obtain bill images with the target elements removed; filling simulated target elements into the bill images from which the target elements were removed according to the characteristic information of the target elements to generate a sample image set, the sample images including the bill images filled with simulated target elements and the position labels of the target elements in the bill images; and training on the sample images in the sample image set with a target detection algorithm to obtain a target element position detection model, so as to identify the target elements in the bill image to be identified based on the target element position detection model.
In the embodiments of the present specification, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used for storing the computer programs and/or modules, and the memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the user terminal, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory. In the embodiments of the present description, the functions and effects specifically realized by the program instructions stored in the computer-readable storage medium may be explained in contrast to other embodiments, and are not described herein again.
The method, apparatus, and storage medium for identifying elements of a bill provided by the embodiments of the present disclosure can be applied in the field of artificial intelligence, and of course also in the financial field or any field other than the financial field; the embodiments of this specification do not limit the fields in which the method, apparatus, and storage medium may be applied.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and the same or similar parts in each embodiment may be referred to each other, and each embodiment focuses on differences from other embodiments. In particular, as for the apparatus embodiment and the apparatus embodiment, since they are substantially similar to the method embodiment, the description is relatively simple, and reference may be made to some descriptions of the method embodiment for relevant points.
After reading this specification, persons skilled in the art will appreciate that any combination of some or all of the embodiments set forth herein, without inventive faculty, is within the scope of the disclosure and protection of this specification.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). With the development of technology, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, this programming is now mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development, and the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language); VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing a given logical method flow can be readily obtained merely by briefly programming the method flow into an integrated circuit using one of the hardware description languages described above.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be essentially or partially implemented in the form of software products, which may be stored in a storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments of the present specification.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
While the specification has been described with reference to examples, those skilled in the art will appreciate that numerous variations and modifications may be made without departing from the spirit of the specification, and it is intended that the appended claims cover such variations and modifications.

Claims (17)

1. A method for identifying an element of a bill, the method comprising:
analyzing a preset number of bill images to obtain characteristic information of target elements in the bill images;
removing the target elements in the bill images to obtain the bill images with the target elements removed;
filling simulated target elements into the bill images with the target elements removed according to the characteristic information of the target elements to generate a sample image set; the sample images in the sample image set comprise bill images filled with simulated target elements and position labels of the target elements in the bill images;
and training the sample images in the sample image set by using a target detection algorithm to obtain a target element position detection model so as to identify the target elements in the bill images to be identified based on the target element position detection model.
2. The method of claim 1, wherein the target elements include at least one of a currency, a magnetic code, and a date; the characteristic information of the target elements comprises at least one of a font style, a font size, a font color, a text position, and a writing depth.
3. The method of claim 1, wherein the removing the target elements in the bill images to obtain the bill images with the target elements removed comprises:
acquiring a target element in a target area of the bill image;
and randomly erasing the target area, or copying other background areas of the bill image to replace the target area to obtain the bill image with the target elements removed.
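The two removal strategies of claim 3 (randomly erasing the target area, or overwriting it with a background patch copied from elsewhere in the bill) can be sketched as below. This is an illustrative sketch only, not the patent's implementation; the function name `remove_target_region`, the `(x, y, w, h)` box format, and the single-channel (grayscale) image assumption are the author's own.

```python
import numpy as np

def remove_target_region(image, box, rng=None):
    """Remove the target element at box (x, y, w, h) from a grayscale
    bill image, either by random erasing or by copying a same-sized
    patch from another area of the image over it."""
    rng = rng or np.random.default_rng()
    x, y, w, h = box
    out = image.copy()
    if rng.random() < 0.5:
        # Random erase: fill the region with light, background-like noise.
        out[y:y+h, x:x+w] = rng.integers(200, 256, size=(h, w),
                                         dtype=image.dtype)
    else:
        # Copy a same-sized patch from a random location in the image.
        sy = rng.integers(0, image.shape[0] - h + 1)
        sx = rng.integers(0, image.shape[1] - w + 1)
        out[y:y+h, x:x+w] = image[sy:sy+h, sx:sx+w]
    return out
```

Either branch leaves the rest of the bill untouched, so the removed-element image keeps its original layout for the later filling step.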
4. The method of claim 1, wherein the filling simulated target elements into the bill images with the target elements removed according to the characteristic information of the target elements to generate a sample image set comprises:
determining simulation parameters according to the characteristic information of the target elements; the simulation parameters represent generation parameters of simulated target elements;
filling the simulated target elements into the bill images with the removed target elements according to the simulation parameters to generate a sample image set.
5. The method of claim 1, wherein the filling simulated target elements into the bill images with the target elements removed according to the characteristic information of the target elements to generate a sample image set comprises:
generating a plurality of fragment images according to the characteristic information of the target element; the patch image includes a simulated target element; the size of the fragment image is adapted to the area where the target element in the bill image is located;
and pasting the fragment image to a preset area in the bill image with the target elements removed to generate a sample image set.
6. The method of claim 5, further comprising: repairing the pasting boundary between the fragment image and the bill image with the target elements removed by using an average-color method.
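One plausible reading of the "average color" boundary repair in claim 6 is to blend each boundary pixel of the pasted fragment with the underlying bill background. A minimal sketch under that assumption, for single-channel images; `blend_paste_boundary` and its parameters are hypothetical names, not from the patent:

```python
import numpy as np

def blend_paste_boundary(canvas, patch, x, y, border=1):
    """Paste `patch` onto `canvas` at (x, y), then soften the paste
    boundary by averaging each border pixel of the patch with the
    canvas pixel it covers."""
    h, w = patch.shape[:2]
    out = canvas.astype(np.float32).copy()
    out[y:y+h, x:x+w] = patch
    for b in range(border):
        # Average the top/bottom boundary rows with the canvas.
        out[y+b, x:x+w] = (out[y+b, x:x+w] + canvas[y+b, x:x+w]) / 2
        out[y+h-1-b, x:x+w] = (out[y+h-1-b, x:x+w] + canvas[y+h-1-b, x:x+w]) / 2
        # Average the left/right boundary columns with the canvas.
        out[y:y+h, x+b] = (out[y:y+h, x+b] + canvas[y:y+h, x+b]) / 2
        out[y:y+h, x+w-1-b] = (out[y:y+h, x+w-1-b] + canvas[y:y+h, x+w-1-b]) / 2
    return out.astype(canvas.dtype)
```

Interior patch pixels are left at their original values; only the boundary ring is averaged, which reduces the visible seam between fragment and bill.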
7. The method of claim 5, wherein the generating a plurality of fragment images according to the characteristic information of the target elements comprises: generating a plurality of fragment images according to the characteristic information of the target elements by using a WGAN algorithm; the simulated target elements comprised by the fragment images are handwritten target elements.
8. The method of claim 5, wherein the generating a plurality of patch images according to the feature information of the target element comprises:
determining simulation parameters according to the characteristic information of the target elements; the simulation parameters represent generation parameters of simulated target elements;
filling simulated target elements into an image template of a preset size according to the simulation parameters to obtain a plurality of fragment images; the simulated target elements comprised by the fragment images are print-style target elements.
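The template-filling step of claim 8 can be illustrated as placing a print-style glyph at a random offset inside a blank patch of the preset size. A toy sketch: the glyph is represented here as a small ink-mask array, and `fill_template` with its `paper`/`size` parameters are assumptions of this sketch, not the patent's simulation parameters.

```python
import numpy as np

def fill_template(glyph, size=(32, 96), paper=245, rng=None):
    """Place a simulated print-style glyph (a small 2-D ink array,
    dark values = ink) at a random offset inside a blank grayscale
    template of the preset `size`, producing one fragment image."""
    rng = rng or np.random.default_rng()
    h, w = size
    patch = np.full((h, w), paper, dtype=np.uint8)
    gh, gw = glyph.shape
    oy = rng.integers(0, h - gh + 1)
    ox = rng.integers(0, w - gw + 1)
    # Ink darkens paper; take the darker of glyph and background.
    patch[oy:oy+gh, ox:ox+gw] = np.minimum(patch[oy:oy+gh, ox:ox+gw], glyph)
    return patch
```

Repeating this with varied glyphs, offsets, and paper tones yields the "plurality of fragment images" the claim refers to.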
9. The method of claim 1, further comprising:
analyzing the bill images in a preset number to obtain background noise close to the position of the target element;
correspondingly, filling simulated target elements into the bill images with the target elements removed according to the characteristic information of the target elements, and introducing the background noise at positions close to the simulated target elements, to generate the sample image set.
10. The method of claim 1, further comprising: carrying out effect enhancement processing on the bill images filled with the simulated target elements; the effect enhancement processing includes at least one of pixel missing, handwriting lightening and blurring, local solid-line disturbance, and salt-and-pepper noise.
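Of the effect-enhancement options listed in claim 10, salt-and-pepper noise is the simplest to illustrate. A minimal sketch; the function name `add_salt_pepper` and the default `amount` are the author's choices:

```python
import numpy as np

def add_salt_pepper(image, amount=0.02, rng=None):
    """Flip roughly a fraction `amount` of pixels to pure black
    (pepper) or pure white (salt) to simulate scanner noise."""
    rng = rng or np.random.default_rng()
    out = image.copy()
    mask = rng.random(image.shape[:2])
    out[mask < amount / 2] = 0        # pepper
    out[mask > 1 - amount / 2] = 255  # salt
    return out
```

Applied to the bill images filled with simulated target elements, this makes the detection model less sensitive to scanning artifacts in real bills.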
11. The method according to claim 1, wherein the identifying the target element in the bill image to be identified based on the target element position detection model comprises:
detecting the bill image to be recognized by using the target element position detection model to obtain the position coordinates of the target element in the bill image to be recognized;
and identifying the target elements in the bill image to be identified according to the position coordinates, and converting the target elements into structured information to be output.
12. The method according to claim 11, wherein the identifying the target elements in the bill image to be identified according to the position coordinates, and converting the target elements into structured information to be output, comprises:
acquiring a target image under the position coordinate in the bill image to be identified;
performing pixel expansion on the target image;
inputting the expanded target image into a preset target element recognition model to obtain a recognition result of the target element in the bill image to be recognized;
and converting the recognition result into structured information and outputting the structured information.
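The pixel-expansion step in claim 12 is commonly a fixed margin added around the detected box before cropping, clamped to the image bounds so nearby strokes are not cut off. A sketch under that assumption; the `(x1, y1, x2, y2)` box format and the name `expand_box` are the author's own:

```python
def expand_box(box, margin, width, height):
    """Expand a detected box (x1, y1, x2, y2) by `margin` pixels on
    each side, clamped to an image of the given width and height."""
    x1, y1, x2, y2 = box
    return (max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin))
```

The expanded box is then used to crop the target image that is fed to the target element recognition model.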
13. The method of claim 12, wherein the target element recognition model is trained according to:
inputting the sample images in the sample image set into the target element position detection model to obtain the position coordinates of the target elements in the sample images;
performing character marking on the image under the position coordinate, and performing character marking on the fragment image generated according to the characteristic information of the target element;
and training a visual attention model by taking the image under the position coordinate subjected to character marking and the fragment image as training data to obtain the target element recognition model.
14. The method of claim 13, wherein the ratio of the character-marked images under the position coordinates to the fragment images used as training data is 4:1.
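The 4:1 training mix of claim 14 could be assembled as follows. Only the ratio comes from the claim; the sampling policy shown (truncate each pool to the ratio, then shuffle) and the name `mix_training_data` are assumptions of this sketch:

```python
import random

def mix_training_data(real_crops, patch_images, ratio=4, rng=None):
    """Build a shuffled training list containing `ratio` real
    position-coordinate crops for every one synthetic fragment
    (patch) image."""
    rng = rng or random.Random()
    n_patches = min(len(patch_images), len(real_crops) // ratio)
    sample = real_crops[:n_patches * ratio] + patch_images[:n_patches]
    rng.shuffle(sample)
    return sample
```

Mixing real crops with synthetic fragments at a fixed ratio lets the recognition model see realistic context while still benefiting from the cheaply generated samples.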
15. An apparatus for identifying an element of a bill, the apparatus comprising:
the analysis module is used for analyzing a preset number of bill images to obtain the characteristic information of the target elements in the bill images;
the removing module is used for removing the target elements in the bill images to obtain the bill images with the target elements removed;
the filling module is used for filling simulated target elements into the bill images with the target elements removed according to the characteristic information of the target elements to generate a sample image set; the sample images in the sample image set comprise bill images filled with simulated target elements and position labels of the target elements in the bill images;
and the identification module is used for training the sample images in the sample image set by using a target detection algorithm to obtain a target element position detection model so as to identify the target elements in the bill images to be identified based on the target element position detection model.
16. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement: analyzing a preset number of bill images to obtain characteristic information of target elements in the bill images; removing the target elements in the bill images to obtain the bill images with the target elements removed; filling simulated target elements into the bill images with the target elements removed according to the characteristic information of the target elements to generate a sample image set; the sample images in the sample image set comprise bill images filled with simulated target elements and position labels of the target elements in the bill images; and training the sample images in the sample image set by using a target detection algorithm to obtain a target element position detection model, so as to identify the target elements in the bill image to be identified based on the target element position detection model.
17. A computer-readable storage medium having computer instructions stored thereon that, when executed, perform: analyzing a preset number of bill images to obtain characteristic information of target elements in the bill images; removing the target elements in the bill images to obtain the bill images with the target elements removed; filling simulated target elements into the bill images with the target elements removed according to the characteristic information of the target elements to generate a sample image set; the sample images in the sample image set comprise bill images filled with simulated target elements and position labels of the target elements in the bill images; and training the sample images in the sample image set by using a target detection algorithm to obtain a target element position detection model, so as to identify the target elements in the bill image to be identified based on the target element position detection model.
CN202110312790.7A 2021-03-24 2021-03-24 Element identification method and device of bill and storage medium Pending CN113011349A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110312790.7A CN113011349A (en) 2021-03-24 2021-03-24 Element identification method and device of bill and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110312790.7A CN113011349A (en) 2021-03-24 2021-03-24 Element identification method and device of bill and storage medium

Publications (1)

Publication Number Publication Date
CN113011349A true CN113011349A (en) 2021-06-22

Family

ID=76405951

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110312790.7A Pending CN113011349A (en) 2021-03-24 2021-03-24 Element identification method and device of bill and storage medium

Country Status (1)

Country Link
CN (1) CN113011349A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392295A (en) * 2021-06-24 2021-09-14 上海商汤科技开发有限公司 Data annotation method, platform, electronic equipment and computer storage medium
CN113516125A (en) * 2021-06-24 2021-10-19 北京世纪好未来教育科技有限公司 Model training method, using method, device, equipment and storage medium
CN113537229A (en) * 2021-08-27 2021-10-22 广州广电运通金融电子股份有限公司 Bill image generation method and device, computer equipment and storage medium
CN113688834A (en) * 2021-07-27 2021-11-23 深圳中兴网信科技有限公司 Ticket recognition method, ticket recognition system and computer readable storage medium
TWI830105B (en) * 2021-12-21 2024-01-21 台北富邦商業銀行股份有限公司 Smart comparison system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598686A (en) * 2019-09-17 2019-12-20 携程计算机技术(上海)有限公司 Invoice identification method, system, electronic equipment and medium
CN110766014A (en) * 2018-09-06 2020-02-07 邬国锐 Bill information positioning method, system and computer readable storage medium
CN110956739A (en) * 2019-05-09 2020-04-03 杭州睿琪软件有限公司 Bill identification method and device
WO2020155763A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Ocr recognition method and electronic device thereof
CN111914835A (en) * 2020-07-04 2020-11-10 中信银行股份有限公司 Bill element extraction method and device, electronic equipment and readable storage medium
CN112037077A (en) * 2020-09-03 2020-12-04 平安健康保险股份有限公司 Seal identification method, device, equipment and storage medium based on artificial intelligence
CN112052858A (en) * 2020-09-02 2020-12-08 中国银行股份有限公司 Method for extracting target field in bill image and related device
CN112183296A (en) * 2020-09-23 2021-01-05 北京文思海辉金信软件有限公司 Simulated bill image generation and bill image recognition method and device
WO2021017261A1 (en) * 2019-08-01 2021-02-04 平安科技(深圳)有限公司 Recognition model training method and apparatus, image recognition method and apparatus, and device and medium
CN112464845A (en) * 2020-12-04 2021-03-09 山东产研鲲云人工智能研究院有限公司 Bill recognition method, equipment and computer storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766014A (en) * 2018-09-06 2020-02-07 邬国锐 Bill information positioning method, system and computer readable storage medium
WO2020155763A1 (en) * 2019-01-28 2020-08-06 平安科技(深圳)有限公司 Ocr recognition method and electronic device thereof
CN110956739A (en) * 2019-05-09 2020-04-03 杭州睿琪软件有限公司 Bill identification method and device
WO2021017261A1 (en) * 2019-08-01 2021-02-04 平安科技(深圳)有限公司 Recognition model training method and apparatus, image recognition method and apparatus, and device and medium
CN110598686A (en) * 2019-09-17 2019-12-20 携程计算机技术(上海)有限公司 Invoice identification method, system, electronic equipment and medium
CN111914835A (en) * 2020-07-04 2020-11-10 中信银行股份有限公司 Bill element extraction method and device, electronic equipment and readable storage medium
CN112052858A (en) * 2020-09-02 2020-12-08 中国银行股份有限公司 Method for extracting target field in bill image and related device
CN112037077A (en) * 2020-09-03 2020-12-04 平安健康保险股份有限公司 Seal identification method, device, equipment and storage medium based on artificial intelligence
CN112183296A (en) * 2020-09-23 2021-01-05 北京文思海辉金信软件有限公司 Simulated bill image generation and bill image recognition method and device
CN112464845A (en) * 2020-12-04 2021-03-09 山东产研鲲云人工智能研究院有限公司 Bill recognition method, equipment and computer storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHENG Zubing et al., "Intelligent medical bill recognition method based on a dual-network model", Computer Engineering and Applications, no. 12, 31 December 2020 (2020-12-31), pages 147-154 *
CHEN Xiang et al., "Character type recognition method for bills against complex backgrounds", Modern Electronics Technique, no. 8, 15 April 2020 (2020-04-15), pages 52-56 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113392295A (en) * 2021-06-24 2021-09-14 上海商汤科技开发有限公司 Data annotation method, platform, electronic equipment and computer storage medium
CN113516125A (en) * 2021-06-24 2021-10-19 北京世纪好未来教育科技有限公司 Model training method, using method, device, equipment and storage medium
CN113688834A (en) * 2021-07-27 2021-11-23 深圳中兴网信科技有限公司 Ticket recognition method, ticket recognition system and computer readable storage medium
CN113537229A (en) * 2021-08-27 2021-10-22 广州广电运通金融电子股份有限公司 Bill image generation method and device, computer equipment and storage medium
CN113537229B (en) * 2021-08-27 2024-08-20 广州广电运通金融电子股份有限公司 Bill image generation method, device, computer equipment and storage medium
TWI830105B (en) * 2021-12-21 2024-01-21 台北富邦商業銀行股份有限公司 Smart comparison system

Similar Documents

Publication Publication Date Title
CN113011349A (en) Element identification method and device of bill and storage medium
US11176443B1 (en) Application control and text detection from application screen images
US10467464B2 (en) Document field detection and parsing
CN109800698B (en) Icon detection method based on deep learning, icon detection system and storage medium
US20170109610A1 (en) Building classification and extraction models based on electronic forms
US10896357B1 (en) Automatic key/value pair extraction from document images using deep learning
CN111615702B (en) Method, device and equipment for extracting structured data from image
CN107016387A (en) A kind of method and device for recognizing label
CN112949455B (en) Value-added tax invoice recognition system and method
CN109344904B (en) Method, system and storage medium for generating training samples
CN112990205B (en) Method and device for generating handwritten character sample, electronic equipment and storage medium
CN110532855A (en) Natural scene certificate image character recognition method based on deep learning
WO2021169529A1 (en) Method, apparatus and device for identifying risk in code image
CN109934218A (en) A kind of recognition methods and device for logistics single image
CN110647956A (en) Invoice information extraction method combined with two-dimensional code recognition
CN112528620A (en) Financial certificate generation method, device, equipment and storage medium
CN111462388A (en) Bill inspection method and device, terminal equipment and storage medium
Arslan End to end invoice processing application based on key fields extraction
US8593697B2 (en) Document processing
CN114782957A (en) Method, device, electronic equipment and medium for determining text information in stamp image
CN114495146A (en) Image text detection method and device, computer equipment and storage medium
JP2022067086A (en) Digitalized writing processing
CN115841531A (en) Seal image generation method, device, equipment and storage medium
US11210545B2 (en) System and method for character recognition model and recursive training from end user input
Khan et al. A novel multi-scale deep neural framework for script invariant text detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination