CN112733726A - Bill sample capacity expansion method and device, electronic equipment and storage medium - Google Patents

Publication number
CN112733726A
Authority
CN
China
Prior art keywords
bill
picture
pictures
blank
bill picture
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110036737.9A
Other languages
Chinese (zh)
Inventor
陈录城
王庆刚
王忠诚
盛国军
沈圣远
徐鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haier Digital Technology Qingdao Co Ltd
Haier Caos IoT Ecological Technology Co Ltd
Qingdao Haier Industrial Intelligence Research Institute Co Ltd
Original Assignee
Haier Digital Technology Qingdao Co Ltd
Haier Caos IoT Ecological Technology Co Ltd
Qingdao Haier Industrial Intelligence Research Institute Co Ltd
Application filed by Haier Digital Technology Qingdao Co Ltd, Haier Caos IoT Ecological Technology Co Ltd, Qingdao Haier Industrial Intelligence Research Institute Co Ltd
Priority to CN202110036737.9A
Publication of CN112733726A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T11/60 Editing figures and text; Combining figures or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/04 Context-preserving transformations, e.g. by using an importance map

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the invention relate to a method and a device for expanding a bill sample, an electronic device, and a storage medium. The method comprises: performing data enhancement processing on an original bill picture to obtain a plurality of bill pictures, and gathering the obtained bill pictures and the original bill picture into a bill picture set; performing character erasing processing on each bill picture in the bill picture set to obtain a blank bill picture set; performing table line detection on each bill picture in the bill picture set to obtain the table line areas of each bill picture; and, for any blank bill picture in the blank bill picture set, sequentially inserting bill content items into the table line areas of the bill picture corresponding to that blank bill picture to obtain at least one synthesized bill picture, and gathering the synthesized bill pictures and the bill picture set into a bill sample set. Expanding the original bill samples in this way yields a large number of realistic bill samples and reduces the workload of sample expansion.

Description

Bill sample capacity expansion method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of machine learning, in particular to a method and a device for expanding a bill sample, electronic equipment and a storage medium.
Background
Value-added tax invoices are private data that customers are mostly unwilling to provide, so little sample data can usually be collected. When value-added tax invoice documents are analyzed, the small number of data samples means that the number of sample features, the feature dimensions, and the feature semantic information are all limited. A model trained on such data is prone to overfitting or underfitting, generalizes poorly, and is difficult to apply in an industrial production environment.
To address these problems, one existing approach uses data enhancement techniques that apply various image transformations to the original picture, such as random cropping, adding filters, changing brightness, and splicing pictures, to increase sample richness. However, when the data set is small, data enhancement only increases feature richness and does not change the feature dimensions. For example, if a certain scene or type of picture is absent from the original data set, no amount of image enhancement will introduce semantic information that never appeared.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for expanding a volume of a bill sample, an electronic device, and a storage medium, so as to obtain a large number of bill samples close to reality through expansion.
Additional features and advantages of embodiments of the invention will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of embodiments of the invention.
In a first aspect of the present disclosure, an embodiment of the present invention provides a method for expanding a volume of a bill sample, including:
carrying out data enhancement processing on an original bill picture to obtain a plurality of bill pictures, and converging the obtained plurality of bill pictures and the original bill picture together to serve as a bill picture set;
respectively carrying out character erasing processing on the bill pictures in the bill picture set to obtain a blank bill picture set;
respectively carrying out form line detection on the bill pictures in the bill picture set to obtain each form line area of each bill picture;
and for any blank bill picture in the blank bill picture set, sequentially inserting bill content items into each table line area of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture, and gathering the obtained synthesized bill picture and the bill picture set together to serve as a bill sample set.
In an embodiment, for any one of the blank bill pictures in the set of blank bill pictures, sequentially inserting bill content items into each table line area of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture includes:
randomly generating a set of bill contents which accord with rules for each table line area, wherein the bill contents comprise a plurality of bill content items which respectively correspond to each table line area;
and for any blank bill picture in the blank bill picture set, sequentially inserting bill content items of the generated bill contents into each table line area of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture.
In one embodiment, the randomly generating a set of ticket contents meeting the rules for each table line region further comprises: and randomly setting a format and/or a style for the bill content.
In one embodiment, randomly formatting and/or styling the ticket content includes: carrying out format statistics on the bill contents of a plurality of original bill pictures to determine the occurrence probability of each format, and randomly setting a format for the bill contents according to the occurrence probability of each format; and/or carrying out pattern statistics on the bill contents of the original bill pictures to determine the occurrence probability of each pattern, and randomly setting a pattern for the bill contents according to the occurrence probability of each pattern.
In one embodiment, the format includes at least one of font, color, and text size; the style includes at least one of a number of words, a content of the words, and a sequence of the words.
In an embodiment, after obtaining the bill sample set, the method further includes sampling the bill sample set, performing data enhancement processing on a sampling result, and aggregating a bill picture obtained through the data enhancement processing and the bill sample set.
In one embodiment, the data enhancement process includes at least one of: text distortion, center word magnification, rotation, cropping, adding black borders, brightness variation, and sharpness variation.
In a second aspect of the present disclosure, an embodiment of the present invention further provides a device for expanding a bill sample, including:
the enhancement processing unit is used for carrying out data enhancement processing on the original bill pictures to obtain a plurality of bill pictures and converging the obtained plurality of bill pictures and the original bill pictures together to form a bill picture set;
the character erasing unit is used for respectively erasing characters of the bill pictures in the bill picture set to obtain a blank bill picture set;
the form area detection unit is used for respectively carrying out form line detection on the bill pictures in the bill picture set to obtain each form line area of each bill picture;
and the picture synthesis unit is used for sequentially inserting bill content items into each table line area of the bill pictures corresponding to the blank bill pictures to obtain at least one synthesized bill picture, and converging the obtained synthesized bill pictures and the bill picture set together to serve as a bill sample set.
In one embodiment, the picture composition unit includes a ticket content generation subunit and a ticket content insertion subunit:
the bill content generating subunit is used for randomly generating a set of bill contents which accord with rules for each table line area, and each bill content comprises a plurality of bill content items which respectively correspond to each table line area;
the bill content insertion subunit is used for sequentially inserting the bill content items of the generated bill contents into each table line area of the bill pictures corresponding to the blank bill pictures to obtain at least one synthesized bill picture for any blank bill picture in the blank bill picture set.
In an embodiment, the picture composition unit further includes a bill content style setting subunit, where the bill content style setting subunit is configured to randomly set a format and/or a style for the bill content after randomly generating a set of bill content meeting a rule for each table line region.
In an embodiment, the ticket content style setting subunit is further configured to: carrying out format statistics on the bill contents of a plurality of original bill pictures to determine the occurrence probability of each format, and randomly setting a format for the bill contents according to the occurrence probability of each format; and/or carrying out pattern statistics on the bill contents of the original bill pictures to determine the occurrence probability of each pattern, and randomly setting a pattern for the bill contents according to the occurrence probability of each pattern.
In one embodiment, the format includes at least one of font, color, and text size; the style includes at least one of a number of words, a content of the words, and a sequence of the words.
In an embodiment, the apparatus further includes a secondary enhancement processing unit, configured to, after the ticket sample set is obtained, sample the ticket sample set, perform data enhancement processing on a sampling result, and aggregate a ticket picture obtained through the data enhancement processing and the ticket sample set.
In one embodiment, the data enhancement process includes at least one of: text distortion, center word magnification, rotation, cropping, adding black borders, brightness variation, and sharpness variation.
In a third aspect of the disclosure, an electronic device is provided. The electronic device includes: a processor; and a memory for storing executable instructions that, when executed by the processor, cause the electronic device to perform the method of the first aspect.
In a fourth aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the method in the first aspect.
The technical scheme provided by the embodiment of the invention has the beneficial technical effects that:
By performing data enhancement processing on original bill pictures, a plurality of bill pictures is obtained and gathered together with the original bill pictures into a bill picture set; character erasing processing is performed on each bill picture in the set to obtain a blank bill picture set, and table line detection is performed on each bill picture to obtain its table line areas. For any blank bill picture in the blank bill picture set, bill content items are sequentially inserted into the table line areas of the corresponding bill picture to obtain at least one synthesized bill picture, and the synthesized bill pictures are gathered with the bill picture set into a bill sample set. Expanding the original bill samples in this way yields a large number of realistic bill samples: each newly generated bill shares its background with a real or enhanced bill, and its text content approximates real data.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly described below, and it is obvious that the drawings in the following description are only a part of the embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the contents of the embodiments of the present invention and the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for expanding a volume of a bill sample according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of another method for expanding a volume of a bill sample according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a capacity expansion device for a bill sample according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another apparatus for expanding a volume of a bill sample according to an embodiment of the present invention;
FIG. 5 shows a schematic diagram of an electronic device suitable for use in implementing embodiments of the present invention.
Detailed Description
In order to make the technical problems solved, the technical solutions adopted and the technical effects achieved by the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be described in further detail below with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments, but not all embodiments, of the embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, belong to the scope of protection of the embodiments of the present invention.
It should be noted that the terms "system" and "network" are often used interchangeably herein in embodiments of the present invention. Reference to "and/or" in embodiments of the invention is intended to include any and all combinations of one or more of the associated listed items. The terms "first", "second", and the like in the description and claims of the present disclosure and in the drawings are used for distinguishing between different objects and not for limiting a particular order.
It should be further noted that, in the embodiments of the present invention, each of the following embodiments may be executed alone, or may be executed in combination with each other, and the embodiments of the present invention are not limited in this respect.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
The technical solutions of the embodiments of the present invention are further described by the following detailed description with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a method for expanding a bill sample according to an embodiment of the present invention. This embodiment is applicable to the case where a plurality of bill samples are obtained from a bill picture, and the method may be executed by a bill sample expansion device configured in an electronic device. As shown in Fig. 1, the method for expanding a bill sample according to this embodiment includes:
in step S110, data enhancement processing is performed on the original bill picture to obtain a plurality of bill pictures, and the obtained plurality of bill pictures and the original bill picture are gathered together to be a bill picture set.
The data enhancement processing can be performed on the original bill picture by adopting any one or more methods, such as character distortion, central character amplification, rotation, cutting, black edge addition, brightness change, definition change and the like.
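The enhancement step can be illustrated with a minimal sketch. It assumes a bill picture is represented as a grayscale pixel grid (a list of rows, 0 for black and 255 for white) and implements only three of the listed operations: rotation, brightness change, and black edge addition. The function names and parameter ranges are hypothetical and are not taken from the patent; a practical implementation would more likely use an imaging library.

```python
import random

def rotate90(img):
    """Rotate the pixel grid clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]

def change_brightness(img, delta):
    """Shift every pixel by delta, clamped to the 0..255 range."""
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

def add_black_border(img, width):
    """Surround the picture with a black frame of the given width."""
    full_width = len(img[0]) + 2 * width
    top = [[0] * full_width for _ in range(width)]
    body = [[0] * width + list(row) + [0] * width for row in img]
    bottom = [[0] * full_width for _ in range(width)]
    return top + body + bottom

def build_bill_picture_set(original, n_variants, rng):
    """Return the original picture plus n_variants randomly enhanced copies."""
    ops = [
        lambda im: rotate90(im),
        lambda im: change_brightness(im, rng.randint(-40, 40)),
        lambda im: add_black_border(im, rng.randint(1, 3)),
    ]
    variants = [rng.choice(ops)(original) for _ in range(n_variants)]
    return [original] + variants
```

Calling `build_bill_picture_set(picture, 5, random.Random(0))` would return six pictures: the original first, followed by five enhanced variants.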
In step S120, the text erasing processing is performed on the bill pictures in the bill picture set respectively to obtain a blank bill picture set.
For example, the characters in the bill picture, or the characters and the table lines can be erased by a character erasing algorithm, so as to obtain a blank bill picture without the characters.
In step S130, table line detection is performed on each bill picture in the bill picture set to obtain the table line areas of each bill picture.
A table line area is a table area of the bill in which characters are filled in.
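The patent does not name a specific table line detection algorithm. As one possible illustration, the sketch below uses a simple projection-profile heuristic on a binarized picture (1 for ink, 0 for background): a row or column that is almost entirely ink is treated as a table line, and the rectangles enclosed between consecutive lines are reported as table line areas. The function names, the 0.9 threshold, and the `(top, left, bottom, right)` box convention are assumptions made for this example.

```python
def line_positions(profile, span, min_fraction=0.9):
    """Indices whose ink count covers at least min_fraction of the span."""
    return [i for i, count in enumerate(profile) if count >= min_fraction * span]

def detect_table_line_areas(binary):
    """Return inclusive (top, left, bottom, right) boxes between table lines."""
    height, width = len(binary), len(binary[0])
    row_profile = [sum(row) for row in binary]        # ink per row
    col_profile = [sum(col) for col in zip(*binary)]  # ink per column
    h_lines = line_positions(row_profile, width)
    v_lines = line_positions(col_profile, height)
    areas = []
    for top, bottom in zip(h_lines, h_lines[1:]):
        for left, right in zip(v_lines, v_lines[1:]):
            # Skip adjacent indices, which belong to one thick line.
            if bottom - top > 1 and right - left > 1:
                areas.append((top + 1, left + 1, bottom - 1, right - 1))
    return areas
```

This heuristic only works for axis-aligned, unbroken lines, so it would have to run before rotation-style enhancement or be replaced by a more robust detector.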
In step S140, for any blank bill picture in the blank bill picture set, sequentially inserting a bill content item into each table line region of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture, and aggregating the obtained synthesized bill picture and the bill picture set together to serve as a bill sample set.
For example, a set of ticket contents meeting the rules may be randomly generated for each table line area, where the ticket contents include a plurality of ticket content items corresponding to each table line area, and for any blank ticket picture in the blank ticket picture set, the ticket content items of the generated ticket contents are sequentially inserted into each table line area of the ticket picture corresponding to the blank ticket picture to obtain at least one composite ticket picture.
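The pairing of generated content items with table line areas can be sketched as follows. The example does not render pixels; it only records which text would be drawn into which area of which blank picture. The field names `date` and `amount`, the `GENERATORS` table, and the `CompositeBill` structure are hypothetical stand-ins for whatever rules apply to a given bill type.

```python
import random
from dataclasses import dataclass, field

# Hypothetical rule-conforming generators, one per kind of table line area.
GENERATORS = {
    "date": lambda rng: (
        f"{rng.randint(2015, 2021)}-{rng.randint(1, 12):02d}-{rng.randint(1, 28):02d}"
    ),
    "amount": lambda rng: f"{rng.randint(1, 99999)}.{rng.randint(0, 99):02d}",
}

@dataclass
class CompositeBill:
    blank_id: str                                    # which blank picture is the background
    placements: list = field(default_factory=list)   # (box, text) pairs, in area order

def synthesize(blank_id, areas, rng, copies=1):
    """areas: (field_name, box) pairs detected on the picture matching blank_id."""
    results = []
    for _ in range(copies):
        # Generate one content item per table line area, in order.
        placements = [(box, GENERATORS[name](rng)) for name, box in areas]
        results.append(CompositeBill(blank_id, placements))
    return results
```

Because each call draws fresh content, a single blank picture can yield as many distinct synthesized bills as needed.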
Further, after a set of bill contents meeting the rules are randomly generated for each table line area, the format and/or the style of the bill contents can be randomly set.
For example, format statistics are performed on the bill contents of a plurality of original bill pictures to determine the occurrence probability of each format, and a format is randomly set for the bill contents according to those probabilities. As another example, style statistics may be performed on the bill contents of the plurality of original bill pictures to determine the occurrence probability of each style, and a style is randomly set for the bill contents according to those probabilities.
The format comprises one or more of font, color, character size and the like, and the style comprises one or more of character number, character content, character sequence and the like.
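The statistics-then-sample idea can be sketched as below, assuming the formats observed in the original bills are already available as a plain list of labels. The function names are hypothetical; the weighted draw uses the standard library's `random.choices`.

```python
import random
from collections import Counter

def occurrence_probabilities(observed):
    """Turn a list of observed format labels into occurrence probabilities."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

def pick_by_probability(probabilities, rng):
    """Draw one label at random, weighted by its occurrence probability."""
    values = list(probabilities)
    weights = [probabilities[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]
```

The same two functions apply unchanged to fonts, colors, character sizes, or any of the style attributes, since each is just a list of observed labels.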
According to one or more embodiments of the present disclosure, after the bill sample set is obtained, the bill sample set may be further sampled, and the sampling result is further subjected to data enhancement processing, such as character distortion, central character enlargement, rotation, clipping, black edge addition, brightness change, and sharpness change, so as to converge the bill picture obtained by the data enhancement processing and the bill sample set.
In the embodiment, a plurality of bill pictures are obtained by performing data enhancement processing on original bill pictures, the obtained plurality of bill pictures and the original bill pictures are gathered together to be used as a bill picture set, character erasing processing is performed on the bill pictures in the bill picture set respectively to obtain a blank bill picture set, and form line detection is performed on the bill pictures in the bill picture set respectively to obtain each form line area of each bill picture; and sequentially inserting bill content items into each table line area of the bill pictures corresponding to the blank bill pictures to obtain at least one synthesized bill picture, converging the obtained synthesized bill pictures and the bill picture set to form a bill sample set, expanding the sample volume to obtain a large number of bill samples close to reality, enabling the newly generated bill and the real or enhanced bill to have the same background, and enabling the text content in the newly generated bill to approach to real data.
Fig. 2 is a schematic flow chart illustrating another method for expanding a volume of a bill sample according to an embodiment of the present invention, where the embodiment is based on the foregoing embodiment and is implemented with improvement and optimization. As shown in fig. 2, the method for expanding the volume of the bill sample according to this embodiment includes:
in step S210, an original bill picture is input and subjected to data enhancement processing. The enhancement methods include text distortion, center word magnification, rotation, cropping, adding black borders, brightness variation, sharpness variation, and the like.
In step S220, the text (foreground) is erased by a text erasure algorithm.
In step S230, the non-text area, i.e. the blank background, is obtained after erasing, so as to prepare for generating the specific text in the non-text area.
In step S240, each area of the value-added tax invoice is obtained through the table line detection algorithm.
In step S250, each table line area of each bill picture is obtained, and each area is marked.
In step S260, the blank background is combined with the table area to obtain a complete blank background map.
In step S270, randomly setting a format and/or a style for the ticket content, including:
in step S271, a font is set for the bill content, such as Song (宋体), Hei (黑体), or FangSong (仿宋). Each font is weighted according to how often it appears in the sample data.
In step S272, a color is set for the bill content. Each color is weighted according to how often it is used in the sample text; for example, light blue is used most often.
In step S273, font settings of different sizes are made for the ticket content. For example, the font size combination of Aa, AA and Aa has larger weight for AA and Aa types and smaller weight for Aa.
In step S274, the number of characters is set for the bill content. The number is set according to the text length of each area, yielding long or short text for the corresponding area.
In step S275, the text content is set for the bill content. In a designated area such as the amount column, only Arabic numerals, the decimal point, the ¥ and $ symbols, capitalized amount numerals, parentheses, and the characters 元, 角, 分, 整, and 圆 may appear; no other characters should appear in that area.
In step S276, a text order is set for the bill content. In a specific area, the characters must be generated according to certain rules; for example, in the amount column the $ symbol must accompany the digits and precede them.
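The content and ordering rules for the amount column can be sketched with a generator and a matching validator. This example assumes the simplest case: a currency symbol (¥ or $, treated here as the allowed symbols) followed by digits, a decimal point, and two decimal digits. The pattern, the value ranges, and the function names are assumptions for illustration, and capitalized amount numerals are not covered.

```python
import random
import re

# Assumed rule: one currency symbol first, then digits with two decimals.
AMOUNT_PATTERN = re.compile(r"[¥$][0-9]+\.[0-9]{2}")

def generate_amount(rng, symbol="¥"):
    """Generate amount text with the symbol placed in front of the digits."""
    whole = rng.randint(0, 999999)
    cents = rng.randint(0, 99)
    return f"{symbol}{whole}.{cents:02d}"

def is_valid_amount(text):
    """Check both the allowed characters and their order."""
    return AMOUNT_PATTERN.fullmatch(text) is not None
```

Keeping a validator next to the generator makes it easy to reject any randomly combined content that breaks the area's rules before it is drawn onto a blank bill.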
In step S280, data synthesis is performed. The settings from steps S271 to S276 may be randomly combined to synthesize a target sample, and the resulting synthesized sample is very close to real data.
In step S290, data enhancement is performed: the synthesized samples are sampled, secondary data enhancement is applied to the sampled subset, and the remaining samples are mixed with the secondarily enhanced samples to form the output.
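Step S290 can be sketched as follows, assuming the synthesized samples are held in a list and the enhancement is passed in as a function. The sampling fraction and the names `secondary_enhancement`, `fraction`, and `enhance` are hypothetical; the patent does not specify how many samples are drawn.

```python
import random

def secondary_enhancement(samples, fraction, enhance, rng):
    """Enhance a random subset and mix it with the untouched remainder."""
    k = round(len(samples) * fraction)
    chosen = set(rng.sample(range(len(samples)), k))  # indices to enhance
    remaining = [s for i, s in enumerate(samples) if i not in chosen]
    enhanced = [enhance(samples[i]) for i in sorted(chosen)]
    return remaining + enhanced
```

In this variant the output has the same size as the input, because each sampled item is replaced by its enhanced version. Appending the enhanced items to the full sample set instead, as in the embodiment of Fig. 1, would be a one-line change.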
The technical scheme of this embodiment gives the generated bill picture the same background as a real bill, and the text content in the bill is close to real data, so the generated picture closely resembles a real bill.
As an implementation of the methods shown in the above drawings, the present application provides an embodiment of a capacity expansion device for a bill sample, and fig. 3 illustrates a schematic structural diagram of the capacity expansion device for a bill sample provided in this embodiment, where the embodiment of the capacity expansion device corresponds to the embodiment of the methods shown in fig. 1 and fig. 2, and the capacity expansion device may be specifically applied to various electronic devices. As shown in fig. 3, the capacity expansion device for a bill sample according to this embodiment includes an enhancement processing unit 310, a text erasing unit 320, a table area detecting unit 330, and a picture synthesizing unit 340.
The enhancement processing unit 310 is configured to perform data enhancement processing on an original bill picture to obtain a plurality of bill pictures, and aggregate the obtained plurality of bill pictures and the original bill picture together to serve as a bill picture set;
the character erasing unit 320 is configured to erase the characters of the bill pictures in the bill picture set to obtain a blank bill picture set;
the table area detecting unit 330 is configured to perform table line detection on the bill pictures in the bill picture set respectively to obtain each table line area of each bill picture;
the picture synthesizing unit 340 is configured to, for any blank bill picture in the blank bill picture set, sequentially insert a bill content item in each table line area of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture, and aggregate the obtained synthesized bill picture and the bill picture set together to serve as a bill sample set.
According to one or more embodiments of the present disclosure, the apparatus further includes a secondary enhancement processing unit (not shown in fig. 3), and the secondary enhancement processing unit is configured to, after obtaining the ticket sample set, further perform sampling on the ticket sample set, perform data enhancement processing on a sampling result, and aggregate a ticket picture obtained by the data enhancement processing and the ticket sample set.
According to one or more embodiments of the present disclosure, the data enhancement process includes at least one of: text distortion, center word magnification, rotation, cropping, adding black borders, brightness variation, and sharpness variation.
The capacity expansion device for the bill sample provided by the embodiment of the present disclosure can execute the capacity expansion method for the bill sample provided by the embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 4 is a schematic structural diagram of another expansion device for a bill sample according to an embodiment of the present invention, and as shown in fig. 4, the expansion device for a bill sample according to this embodiment includes: an enhancement processing unit 410, a text erasure unit 420, a table area detection unit 430, a picture composition unit 440, and a secondary enhancement processing unit 450.
The enhancement processing unit 410 is configured to perform data enhancement processing on an original bill picture to obtain a plurality of bill pictures, and aggregate the obtained plurality of bill pictures and the original bill picture together to form a bill picture set.
The text erasing unit 420 is configured to perform text erasing processing on the bill pictures in the bill picture set respectively to obtain a blank bill picture set.
The table area detecting unit 430 is configured to perform table line detection on the bill pictures in the bill picture set to obtain each table line area of each bill picture.
The picture composition unit 440 includes a ticket content generation sub-unit 441 and a ticket content insertion sub-unit 442. The ticket content generating subunit 441 is configured to randomly generate a set of ticket contents meeting rules for each table line area, where the ticket contents include a plurality of ticket content items corresponding to each table line area respectively; the bill content insertion subunit 442 is configured to, for any one of the blank bill pictures in the set of blank bill pictures, sequentially insert the bill content item of the generated bill content in each table line area of the bill picture corresponding to the blank bill picture to obtain at least one composite bill picture.
The secondary enhancement processing unit 450 is configured to, after obtaining the bill sample set, further perform sampling on the bill sample set, perform data enhancement processing on the sampling result, and aggregate the bill picture obtained by the data enhancement processing and the bill sample set.
According to one or more embodiments of the present disclosure, the picture composition unit 440 is configured to further include a ticket content style setting subunit (not shown in fig. 4) configured to randomly set a format and/or a style for the ticket content after randomly generating a set of ticket content meeting the rules for each table line region.
According to one or more embodiments of the present disclosure, the ticket content style setting subunit is configured to further perform format statistics on the ticket contents of the plurality of original ticket pictures to determine an occurrence probability of each format, and randomly set a format for the ticket contents according to the occurrence probability of each format; and/or carrying out pattern statistics on the bill contents of the original bill pictures to determine the occurrence probability of each pattern, and randomly setting a pattern for the bill contents according to the occurrence probability of each pattern.
According to one or more embodiments of the present disclosure, the format includes at least one of a font, a color, and a text size; the style includes at least one of a number of words, a content of the words, and a sequence of the words.
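The statistics-driven random setting of a format can be sketched as follows, using fonts as the example attribute. The helper names and the font labels are hypothetical; the same two functions would apply equally to colors, text sizes, or style attributes.

```python
import random
from collections import Counter


def occurrence_probabilities(observed):
    """Turn a list of observed attribute values into value -> probability."""
    counts = Counter(observed)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def pick_by_probability(probabilities, rng):
    """Draw one value at random, weighted by its occurrence probability."""
    values = list(probabilities)
    weights = [probabilities[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


# Hypothetical font labels read off four original bill pictures.
observed_fonts = ["SimSun", "SimSun", "SimHei", "SimSun"]
font_probs = occurrence_probabilities(observed_fonts)
chosen_font = pick_by_probability(font_probs, random.Random(0))
```

Weighting the draw by observed frequency keeps the synthesized bills close to the distribution of formats seen in the original bills.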
According to one or more embodiments of the present disclosure, the data enhancement process includes at least one of: text distortion, center word magnification, rotation, cropping, adding black borders, brightness variation, and sharpness variation.
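A few of these operations can be illustrated on a toy grayscale image stored as a list of pixel rows (values 0 to 255). This representation and the function names are assumptions made to keep the sketch dependency-free; a real implementation would more likely use an imaging library, and rotation is simplified here to a 90-degree turn.

```python
def add_black_border(img, width):
    """Surround the image with a black (0-valued) border of the given width."""
    new_width = len(img[0]) + 2 * width
    top = [[0] * new_width for _ in range(width)]
    bottom = [[0] * new_width for _ in range(width)]
    middle = [[0] * width + list(row) + [0] * width for row in img]
    return top + middle + bottom


def change_brightness(img, factor):
    """Scale every pixel by factor, clamping the result to 0..255."""
    return [[max(0, min(255, round(p * factor))) for p in row] for row in img]


def crop(img, left, top, right, bottom):
    """Keep rows top..bottom-1 and columns left..right-1."""
    return [row[left:right] for row in img[top:bottom]]


def rotate_90_clockwise(img):
    """Rotate by 90 degrees clockwise (arbitrary angles need interpolation)."""
    return [list(column) for column in zip(*img[::-1])]
```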
The capacity expansion device for the bill sample provided by the embodiment of the present disclosure can execute the capacity expansion method for the bill sample provided by the embodiment of the present disclosure, and has the corresponding functional modules and beneficial effects of the execution method.
Referring now to FIG. 5, a block diagram of an electronic device 500 suitable for use in implementing embodiments of the present invention is shown. The terminal device in the embodiment of the present invention is, for example, a mobile device, a computer, or a vehicle-mounted device built in a floating car, or any combination thereof. In some embodiments, the mobile device may include, for example, a cell phone, a smart home device, a wearable device, a smart mobile device, a virtual reality device, and the like, or any combination thereof. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 5, electronic device 500 may include a processing means (e.g., central processing unit, graphics processor, etc.) 501 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)502 or a program loaded from a storage means 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the electronic apparatus 500 are also stored. The processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Generally, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 507 including, for example, a Liquid Crystal Display (LCD), speakers, vibrators, and the like; storage devices 508 including, for example, magnetic tape, hard disk, etc.; and a communication device 509. The communication means 509 may allow the electronic device 500 to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 illustrates an electronic device 500 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as a computer software program. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 509, or installed from the storage means 508, or installed from the ROM 502. The computer program performs the above-described functions defined in the method of the embodiment of the present invention when executed by the processing apparatus 501.
It should be noted that the computer readable medium mentioned above can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In yet another embodiment of the invention, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: carrying out data enhancement processing on an original bill picture to obtain a plurality of bill pictures, and converging the obtained plurality of bill pictures and the original bill picture together to serve as a bill picture set; respectively carrying out character erasing processing on the bill pictures in the bill picture set to obtain a blank bill picture set; respectively carrying out form line detection on the bill pictures in the bill picture set to obtain each form line area of each bill picture; and for any blank bill picture in the blank bill picture set, sequentially inserting bill content items into each table line area of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture, and gathering the obtained synthesized bill picture and the bill picture set together to serve as a bill sample set.
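The overall flow of these four steps can be sketched as a single function that takes each step as a callable. The function name `expand_bill_samples` and its parameters are hypothetical; the sketch only shows how the sets are combined, not how any individual step is implemented.

```python
def expand_bill_samples(original, augment, erase_text, detect_regions, synthesize):
    """Combine the four steps into one bill sample set.

    augment(picture)            -> list of enhanced pictures
    erase_text(picture)         -> blank picture
    detect_regions(picture)     -> table line areas of that picture
    synthesize(blank, regions)  -> list of composite pictures
    """
    # Step 1: enhanced pictures plus the original form the bill picture set.
    picture_set = augment(original) + [original]

    sample_set = list(picture_set)
    for picture in picture_set:
        blank = erase_text(picture)           # step 2: character erasing
        regions = detect_regions(picture)     # step 3: detected on the bill picture
        sample_set.extend(synthesize(blank, regions))  # step 4: composites
    return sample_set
```

Note that the table line areas are detected on each bill picture rather than on its blank counterpart, and are then used to place content on the corresponding blank picture.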
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present invention may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The foregoing description is only a preferred embodiment of the invention and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure in the embodiments of the present invention is not limited to the specific combinations of the above-described features, but also encompasses other embodiments in which any combination of the above-described features or their equivalents is possible without departing from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with (but not limited to) features having similar functions disclosed in the embodiments of the present invention.

Claims (10)

1. A method for expanding a bill sample is characterized by comprising the following steps:
carrying out data enhancement processing on an original bill picture to obtain a plurality of bill pictures, and converging the obtained plurality of bill pictures and the original bill picture together to serve as a bill picture set;
respectively carrying out character erasing processing on the bill pictures in the bill picture set to obtain a blank bill picture set;
respectively carrying out form line detection on the bill pictures in the bill picture set to obtain each form line area of each bill picture;
and for any blank bill picture in the blank bill picture set, sequentially inserting bill content items into each table line area of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture, and gathering the obtained synthesized bill picture and the bill picture set together to serve as a bill sample set.
2. The method of claim 1, wherein for any one of the blank bill pictures in the set of blank bill pictures, sequentially inserting a bill content item into each table line area of the corresponding bill picture of the blank bill picture to obtain at least one composite bill picture comprises:
randomly generating a set of bill contents which accord with rules for each table line area, wherein the bill contents comprise a plurality of bill content items which respectively correspond to each table line area;
and for any blank bill picture in the blank bill picture set, sequentially inserting bill content items of the generated bill contents into each table line area of the bill picture corresponding to the blank bill picture to obtain at least one synthesized bill picture.
3. The method of claim 2, further comprising, after randomly generating a set of ticket contents meeting the rules for each table line region:
and randomly setting a format and/or a style for the bill content.
4. The method of claim 3, wherein randomly formatting and/or styling the ticket content comprises:
carrying out format statistics on the bill contents of a plurality of original bill pictures to determine the occurrence probability of each format, and randomly setting a format for the bill contents according to the occurrence probability of each format; and/or
and carrying out style statistics on the bill contents of the original bill pictures to determine the occurrence probability of each style, and randomly setting a style for the bill contents according to the occurrence probability of each style.
5. The method of claim 3, wherein the format comprises at least one of font, color, and text size;
the style includes at least one of a number of words, a content of the words, and a sequence of the words.
6. The method of claim 1, further comprising, after obtaining the ticket sample set, sampling the ticket sample set, performing data enhancement processing on the sampling result, and aggregating a ticket picture obtained by the data enhancement processing with the ticket sample set.
7. The method of claim 1 or 6, wherein the data enhancement process comprises at least one of:
text distortion, center word magnification, rotation, cropping, adding black borders, brightness variation, and sharpness variation.
8. A bill sample capacity expansion device, comprising:
the enhancement processing unit is used for carrying out data enhancement processing on the original bill pictures to obtain a plurality of bill pictures and converging the obtained plurality of bill pictures and the original bill pictures together to form a bill picture set;
the character erasing unit is used for respectively erasing characters of the bill pictures in the bill picture set to obtain a blank bill picture set;
the form area detection unit is used for respectively carrying out form line detection on the bill pictures in the bill picture set to obtain each form line area of each bill picture;
and the picture synthesis unit is used for sequentially inserting bill content items into each table line area of the bill pictures corresponding to the blank bill pictures to obtain at least one synthesized bill picture, and converging the obtained synthesized bill pictures and the bill picture set together to serve as a bill sample set.
9. An electronic device, comprising:
one or more processors; and
a memory to store executable instructions that, when executed by the one or more processors, cause the electronic device to perform the method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202110036737.9A 2021-01-12 2021-01-12 Bill sample capacity expansion method and device, electronic equipment and storage medium Pending CN112733726A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110036737.9A CN112733726A (en) 2021-01-12 2021-01-12 Bill sample capacity expansion method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110036737.9A CN112733726A (en) 2021-01-12 2021-01-12 Bill sample capacity expansion method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112733726A true CN112733726A (en) 2021-04-30

Family

ID=75590488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110036737.9A Pending CN112733726A (en) 2021-01-12 2021-01-12 Bill sample capacity expansion method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112733726A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0588074A2 (en) * 1992-08-18 1994-03-23 Eastman Kodak Company Method and apparatus for character recognition with supervised training
US20140066767A1 (en) * 2012-08-31 2014-03-06 Clearview Diagnostics, Inc. System and method for noise reduction and signal enhancement of coherent imaging systems
CN110490193A (en) * 2019-07-24 2019-11-22 西安网算数据科技有限公司 Single Text RegionDetection method and ticket contents recognition methods
CN110619312A (en) * 2019-09-20 2019-12-27 百度在线网络技术(北京)有限公司 Method, device and equipment for enhancing positioning element data and storage medium
CN111414906A (en) * 2020-03-05 2020-07-14 北京交通大学 Data synthesis and text recognition method for paper bill picture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210430