CN112199330A - Mixed document filing method, filing device and storage medium - Google Patents

Mixed document filing method, filing device and storage medium

Info

Publication number
CN112199330A
Authority
CN
China
Prior art keywords
document
folder
mixed
data page
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011055808.1A
Other languages
Chinese (zh)
Inventor
彭健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaoguan Power Supply Bureau Guangdong Power Grid Co Ltd
Original Assignee
Shaoguan Power Supply Bureau Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shaoguan Power Supply Bureau Guangdong Power Grid Co Ltd
Priority to CN202011055808.1A
Publication of CN112199330A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiment of the invention discloses a mixed document filing method, a filing device and a storage medium. The mixed document filing method comprises the following steps: acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is greater than or equal to 2; carrying out document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of one document; and sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder. The method addresses the heavy manual involvement, cumbersome operation, low working efficiency and high labor cost of the prior art, and achieves automatic segmentation, type identification and archiving of the mixed document.

Description

Mixed document filing method, filing device and storage medium
Technical Field
The embodiment of the invention relates to a document automatic classification technology, in particular to a mixed document filing method, a filing device and a storage medium.
Background
With the establishment of enterprise-level systems, the information projects of local and municipal offices mainly comprise information maintenance and information repair projects. Electronic archiving and classified storage management of the associated documents, especially monthly paper settlement documents, is becoming an important task.
At present, paper documents are generally archived electronically as follows: first, the paper documents to be filed are scanned automatically, page by page; after scanning, the whole scan is split manually and mechanically using splitting software; finally, the split documents are stored in a pre-established electronic directory, where they serve as the main attachments for electronic archiving or system workflows.
However, this electronic archiving workflow requires manual participation throughout, which makes the operation cumbersome, the working efficiency low, and the labor cost high.
Disclosure of Invention
The invention provides a mixed document filing method, a filing device and a storage medium, which are used for realizing automatic segmentation, type identification and filing of a mixed document.
In a first aspect, an embodiment of the present invention provides a hybrid document archiving method, where the hybrid document archiving method includes:
acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2;
carrying out document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of a document;
and sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder.
Optionally, the N documents are arranged in sequence, and a first mark is provided on a first data page of each document.
Optionally, the performing document separation on the mixed document to obtain N folders includes:
calling an image recognition interface API (application program interface) through a Python script to recognize the mixed document and obtain all data pages provided with the first marks;
and putting data pages from a first data page to a previous page of a second data page into a folder, wherein the first data page and the second data page are two adjacent data pages provided with the first mark, and the first data page is positioned before the second data page.
Optionally, the N documents are arranged in a disordered manner or in a sequential manner, and all data pages of each document are provided with a second mark and a third mark; the second mark is used to indicate to which document the data page belongs, and the third mark is used to indicate the position of the data page in the document.
Optionally, the performing document separation on the mixed document to obtain N folders includes:
calling an image recognition interface API through a Python script to recognize the mixed document, putting the data pages with the same second mark into a folder, and sequencing the data pages in the folder according to the third mark.
Optionally, the performing document separation on the mixed document to obtain N folders includes:
learning the historical mixed document to obtain a training model;
and carrying out document separation on the mixed document according to the training model to obtain N folders.
Optionally, the sequentially identifying the type of the document stored in each folder includes:
the type of the document stored in each folder is identified in turn based on the elements of all the data pages of a document stored in each folder.
Optionally, the elements include: at least one of a title element, a contract name element, and a settlement month element.
In a second aspect, an embodiment of the present invention further provides a mixed document filing device, where the mixed document filing device comprises:
the mixed document acquisition module is used for acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2;
the document separation module is used for carrying out document separation on the mixed document to obtain N folders, and each folder stores all data pages of one document;
and the document identification and storage module is used for sequentially identifying the type of the document stored in each folder and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the hybrid document archiving method according to the first aspect.
The invention provides a mixed document filing method, which comprises the following steps: acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2; carrying out document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of one document; and sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder. The method can solve the problems of much manual participation, complex operation, low working efficiency and high labor cost in the prior art, and realizes the effects of automatically segmenting, identifying types and archiving the mixed document.
Drawings
FIG. 1 is a flowchart of a hybrid document archiving method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a hybrid document archiving method according to a second embodiment of the present invention;
FIG. 3 is a flowchart of a hybrid document archiving method according to a third embodiment of the present invention;
FIG. 4 is a flowchart of a hybrid document archiving method according to a fourth embodiment of the present invention;
FIG. 5 is a flowchart of a hybrid document archiving method according to a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a hybrid document filing apparatus according to a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
FIG. 1 is a flowchart of the mixed document filing method according to the first embodiment of the present invention. Referring to FIG. 1, this embodiment applies to carrying out the mixed document filing method; the method may be executed by a mixed document filing device and specifically includes the following steps:
Step 100, acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2.
The mixed document scanned by the printer may be an electronic document, such as a PDF document or an electronic image. Before the mixed document is acquired, the method further includes: an operator loads the paper files of the various documents to be filed into the printer, which scans them automatically to generate the electronic version of the mixed document.
The mixed document includes N documents, where the N documents may be N documents of the same type, N documents of different types, multiple documents of multiple types, and the like, and are not limited herein.
Step 200, performing document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of a document.
Each document in the mixed document is separated, each separated document is independently stored in one folder, and N folders are obtained, and all data pages of one document are stored in each folder.
Step 300, sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder.
An archive directory folder is established for each document type and is used to store the folders whose documents have been identified as that type.
The mixed document filing method works as follows. First, an operator loads the paper files of the various documents to be filed into a printer, which scans them automatically to generate an electronic mixed document, and the mixed document is then acquired; the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2. Next, document separation is carried out on the mixed document to obtain N folders, each of which stores all data pages of one document. Finally, the type of the document stored in each folder is identified in turn, and the folder is stored into the archive directory folder corresponding to that type. In this way, each document in the mixed document can be separated accurately, its type can be identified, and the document can be stored and archived by type.
The technical solution of the present embodiment provides a method for filing a hybrid document, where the method for filing a hybrid document includes: acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2; carrying out document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of one document; and sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder. Therefore, each document in the mixed document can be accurately separated, the type of each document can be identified, and the documents are stored and archived according to the types after the type of each document is identified. The method can solve the problems of much manual participation, complex operation, low working efficiency and high labor cost in the prior art, and realizes the effects of automatically segmenting, identifying types and archiving the mixed document.
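As a non-limiting illustration, steps 200 and 300 can be sketched in Python as follows. The function name `archive_mixed_document`, the folder naming scheme `document_001`, and the two callables `separate` and `identify_type` are assumptions introduced for this sketch; the embodiment does not prescribe them. The pages are assumed to have already been exported from the scanned mixed document as one file per data page.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def archive_mixed_document(
    pages: List[Path],
    work_dir: Path,
    archive_root: Path,
    separate: Callable[[List[Path]], List[List[Path]]],
    identify_type: Callable[[Path], str],
) -> List[Path]:
    """Separate the scanned pages into folders and archive each folder by type."""
    archived: List[Path] = []
    # Step 200: one folder per separated document, holding all of its data pages.
    for index, doc_pages in enumerate(separate(pages), start=1):
        folder = work_dir / f"document_{index:03d}"
        folder.mkdir(parents=True, exist_ok=True)
        for page in doc_pages:
            shutil.copy2(page, folder / page.name)
        # Step 300: identify the document type, then move the folder into the
        # archive directory folder that corresponds to that type.
        doc_type = identify_type(folder)
        type_dir = archive_root / doc_type
        type_dir.mkdir(parents=True, exist_ok=True)
        archived.append(Path(shutil.move(str(folder), str(type_dir / folder.name))))
    return archived
```

Passing the separation and identification logic in as callables keeps the sketch independent of which separation strategy (marks or a trained model) is used.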
Example two
Fig. 2 is a flowchart of a hybrid document filing method provided in the second embodiment of the present invention, based on the above embodiment, optionally, N documents are arranged in sequence, and the first data page of each document is provided with a first mark.
In the mixed document scanned by the printer, neither the order of the data pages within each document nor the order of the documents themselves is disturbed, so the N documents in the mixed document are arranged in sequence. A first mark can therefore be placed on the first data page of each document so that the mixed document can be separated in a later step. The first mark may be a stamp, a marking pattern, a letter or symbol, or the like. Referring to FIG. 2, the mixed document filing method includes the following steps:
Step 100, acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2.
Step 110, calling an image recognition API through a Python script to recognize the mixed document, and acquiring all data pages provided with the first mark.
A Python script can call an image recognition API, such as those provided by various open source websites, to recognize the mixed document and thereby identify all data pages provided with the first mark.
Step 120, putting the data pages from the first data page to the previous page of the second data page into a folder, wherein the first data page and the second data page are two adjacent data pages provided with the first mark, and the first data page is positioned in front of the second data page, so that the mixed document can be subjected to document separation to obtain N folders.
The first data page may be the first page of the first document in the sequence of N documents, the second data page may be the first page of the second document, and so on, up to the Nth data page, which may be the first page of the Nth document. Any two consecutive data pages in this series (the first and the second, the second and the third, and so on up to the (N-1)th and the Nth) are two adjacent data pages provided with the first mark, with the earlier one positioned before the later one. Therefore, once the first page of each document has been identified, the mixed document can be subjected to document separation to obtain N folders.
Step 300, sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder.
In the technical solution of this embodiment, the mixed document filing method works as follows. First, an operator loads the paper files of the various documents to be filed into a printer, which scans them automatically to generate an electronic mixed document, and the mixed document is then acquired; the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2. Next, a Python script calls an image recognition API to recognize the mixed document and acquire all data pages provided with the first mark, and the data pages from a first data page up to the page preceding a second data page are put into one folder, where the first data page and the second data page are two adjacent data pages provided with the first mark and the first data page is positioned before the second data page; the mixed document is thereby separated into N folders. Finally, the type of the document stored in each folder is identified in turn, and the folder is stored into the archive directory folder corresponding to that type. In this way, each document in the mixed document can be separated accurately, its type can be identified, and the document can be stored and archived by type.
It should be noted that, in the technical solution of this embodiment, the first marks on the first pages of the documents may be the same symbol or different symbols. Because the N documents are arranged in sequence, a single shared symbol is sufficient: each document can be delimited simply by recognizing that one symbol, which speeds up separation and improves the overall efficiency of mixed document filing.
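The grouping rule of step 120 can be sketched as follows. This is a minimal illustration that assumes the image recognition step has already produced one boolean per scanned page stating whether the first mark was found; the function name `split_on_first_mark` is hypothetical, and the sketch also assumes that the last document runs to the end of the scan.

```python
from typing import List, Sequence


def split_on_first_mark(has_first_mark: Sequence[bool]) -> List[List[int]]:
    """Group page indices into documents; each marked page opens a new document."""
    documents: List[List[int]] = []
    for page_index, marked in enumerate(has_first_mark):
        if marked:
            # A page carrying the first mark is the first data page of a document.
            documents.append([page_index])
        elif documents:
            # An unmarked page belongs to the most recently opened document.
            documents[-1].append(page_index)
        else:
            raise ValueError("the first scanned page carries no first mark")
    return documents


flags = [True, False, False, True, True, False]
print(split_on_first_mark(flags))  # [[0, 1, 2], [3], [4, 5]]
```

Because only a yes/no answer per page is needed, the same symbol can be used on every first page, which matches the efficiency remark above.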
Example three
Fig. 3 is a flowchart of a hybrid document archiving method provided in the third embodiment of the present invention. On the basis of the above embodiment, optionally, N documents are arranged in a disordered manner or in a sequential manner, and all data pages of each document are provided with a second mark and a third mark; the second mark is used to indicate to which document the data page belongs and the third mark is used to indicate the position of the data page in the document.
In the mixed document scanned by the printer, the data pages within each document may be out of order or in order, and the documents themselves may likewise be in order or out of order, so the N documents in the mixed document may be arranged in a disordered manner or in a sequential manner. All data pages of each document are therefore provided with a second mark, which indicates the document to which the data page belongs, and a third mark, which indicates the position of the data page in that document.
Optionally, referring to fig. 3, the specific steps of the hybrid document archiving method are as follows:
Step 100, acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2.
Step 210, calling an image recognition interface API through a Python script to recognize the mixed document, putting the data pages with the same second mark into a folder, and sequencing the data pages in the folder according to the third mark, so that the mixed document can be subjected to document separation to obtain N folders.
The second mark indicates the document to which a data page belongs, and the third mark indicates the position of the data page within that document. For example, the second marks may be A1, A2, A3, …, An, where A1 denotes the first document, A2 denotes the second document, …, and An denotes the Nth document. The third marks may be a1, a2, …, ap; b1, b2, …, bm; …; t1, t2, …, tn, where a1, a2, …, ap indicate the positions of the data pages in the first document, each of which carries the A1 mark; b1, b2, …, bm indicate the positions of the data pages in the second document, each of which carries the A2 mark; …; and t1, t2, …, tn indicate the positions of the data pages in the Nth document, each of which carries the An mark. Specifically, the Python script calls an image recognition API to recognize the mixed document, puts the data pages with the same second mark into one folder, and sorts the data pages in that folder according to the third mark. For example, all data pages carrying the A1 mark are put into the first folder and sorted according to the marks a1, a2, …, ap; all data pages carrying the A2 mark are put into the second folder and sorted according to the marks b1, b2, …, bm; …; and all data pages carrying the An mark are put into the Nth folder and sorted according to the marks t1, t2, …, tn. In this way, the data pages belonging to each document can be identified and put in order, so that the mixed document can be subjected to document separation to obtain N folders.
Step 300, sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder.
In the technical solution of this embodiment, the mixed document filing method works as follows. First, an operator loads the paper files of the various documents to be filed into a printer, which scans them automatically to generate an electronic mixed document, and the mixed document is then acquired; the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2. Next, a Python script calls an image recognition API to recognize the mixed document, the data pages with the same second mark are put into one folder, and the data pages in that folder are sorted according to the third mark; the mixed document is thereby separated into N folders. Finally, the type of the document stored in each folder is identified in turn, and the folder is stored into the archive directory folder corresponding to that type. In this way, each document in the mixed document can be separated accurately, its type can be identified, and the document can be stored and archived by type.
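Step 210 can be sketched as a group-then-sort operation. The sketch below assumes the image recognition step has already read both marks from every page; for simplicity the third mark is represented as an integer position rather than a label such as a1 or b2, and the names `Page` and `group_and_sort` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (file name, second mark, third mark); the third mark is simplified to an integer.
Page = Tuple[str, str, int]


def group_and_sort(pages: Iterable[Page]) -> Dict[str, List[str]]:
    """Put pages with the same second mark together, ordered by the third mark."""
    folders: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for file_name, document_mark, position in pages:
        folders[document_mark].append((position, file_name))
    # Sorting the (position, file name) pairs orders each folder by the third mark.
    return {
        mark: [file_name for _, file_name in sorted(items)]
        for mark, items in folders.items()
    }


scanned = [
    ("p3.png", "A2", 2),
    ("p1.png", "A1", 2),
    ("p2.png", "A2", 1),
    ("p4.png", "A1", 1),
]
print(group_and_sort(scanned))
```

Because every page carries both marks, the result does not depend on the order in which the pages were scanned.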
Example four
Fig. 4 is a flowchart of a hybrid document archiving method provided in the fourth embodiment of the present invention, and referring to fig. 4, on the basis of the foregoing embodiment, the hybrid document archiving method includes the following specific steps:
Step 100, acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2.
Step 310, learning the historical mixed document to obtain a training model.
The training model can be obtained by learning from historical electronic documents such as notices, issued documents, requests, applications, inspection data, contracts, integrity agreements, technical agreements and the like.
Step 320, performing document separation on the mixed document according to the training model to obtain N folders.
The mixed document to be separated is input into the training model, and the training model performs document separation on the mixed document to obtain N folders.
Step 300, sequentially identifying the type of the document stored in each folder, and storing the folder into an archive directory folder corresponding to the type of the document stored in the folder.
In the technical solution of this embodiment, the mixed document filing method works as follows. First, an operator loads the paper files of the various documents to be filed into a printer, which scans them automatically to generate an electronic mixed document, and the mixed document is then acquired; the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2. Next, the historical mixed document is learned to obtain a training model, and document separation is performed on the mixed document according to the training model to obtain N folders. Finally, the type of the document stored in each folder is identified in turn, and the folder is stored into the archive directory folder corresponding to that type. In this way, each document in the mixed document can be separated accurately, its type can be identified, and the document can be stored and archived by type.
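The embodiment does not specify what kind of model is trained. Purely as an illustration, the sketch below swaps in a small naive Bayes text classifier that learns, from the recognized text of historical pages, whether a page is the first page of a document, and then separates a new scan at every predicted first page. The names `train`, `is_first_page` and `separate`, the whitespace tokenization, and the toy training data are all assumptions; the history must contain both first pages and non-first pages.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Model = Tuple[Dict[bool, Counter], Counter]


def train(history: Sequence[Tuple[str, bool]]) -> Model:
    """Learn token counts from (page text, is_first_page) pairs of past scans."""
    token_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    page_counts: Counter = Counter()
    for text, is_first in history:
        token_counts[is_first].update(text.lower().split())
        page_counts[is_first] += 1
    return token_counts, page_counts


def is_first_page(model: Model, text: str) -> bool:
    """Naive Bayes log-odds with add-one smoothing; a positive score means first page."""
    token_counts, page_counts = model
    vocabulary = len(set(token_counts[True]) | set(token_counts[False]))
    totals = {label: sum(token_counts[label].values()) for label in (True, False)}
    score = math.log(page_counts[True] / page_counts[False])
    for token in text.lower().split():
        score += math.log((token_counts[True][token] + 1) / (totals[True] + vocabulary))
        score -= math.log((token_counts[False][token] + 1) / (totals[False] + vocabulary))
    return score > 0


def separate(model: Model, page_texts: Sequence[str]) -> List[List[int]]:
    """Open a new document at the first page and at every predicted first page."""
    documents: List[List[int]] = []
    for index, text in enumerate(page_texts):
        if not documents or is_first_page(model, text):
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents


history = [
    ("contract no party a party b", True),
    ("clause one payment terms", False),
    ("clause two liability", False),
    ("notice to all departments", True),
    ("signature page seal", False),
]
model = train(history)
print(separate(model, ["contract no party a", "clause one payment",
                       "notice to departments", "signature seal"]))
```

A production system would more likely use image features or a stronger classifier; the point of the sketch is only the two-stage shape of steps 310 and 320, learning from historical documents and then applying the learned model to a new mixed document.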
Example five
Fig. 5 is a flowchart of a hybrid document archiving method provided in the fifth embodiment of the present invention, and referring to fig. 5, on the basis of the foregoing embodiment, the hybrid document archiving method includes the following specific steps:
Step 100, acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2.
Step 200, performing document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of a document.
Step 301, sequentially identifying the type of the document stored in each folder according to the elements of all data pages of a document stored in each folder.
Each folder stores one complete document, that is, all data pages of that document. Elements such as text are recorded on the data pages, and the type of the document stored in each folder can be identified by recognizing the elements on the data pages in that folder. The document types may include notices, issued documents, requests, applications, inspection data, workload and settlement amount confirmation tables, contracts, technical agreements and the like.
Optionally, the elements include: at least one of a title element, a contract name element, and a settlement month element.
The title element may be the title of each document, for example, a Project A contract, a Project B contract, a Project A request, a Project B notice, and the like. The contract name element may be the contract name of a contract-type document.
It should be noted that a document generally includes a title. Thus, while the elements include at least one of a title element, a contract name element, and a settlement month element, they include at least the title element. Therefore, when a document has no contract name element and/or no settlement month element, it can still be identified and classified through its title element, so that classification errors can be avoided.
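Step 301 can be illustrated with a simple rule-based sketch that works on the recognized text of a document's data pages. The keyword table `TITLE_KEYWORDS`, the `YYYY-MM` settlement month format, the fallback type `unclassified`, and the assumption that the title is the first non-empty line of the first page are all hypothetical choices made for this example; the contract name element is omitted for brevity.

```python
import re
from typing import Dict, List, Optional

# Hypothetical mapping from a keyword in the title to a document type.
TITLE_KEYWORDS: Dict[str, str] = {
    "contract": "contracts",
    "notice": "notices",
    "request": "requests",
    "settlement": "settlements",
}

# Assumed settlement month format, e.g. 2020-09.
MONTH_PATTERN = re.compile(r"\b20\d{2}-(?:0[1-9]|1[0-2])\b")


def extract_elements(page_texts: List[str]) -> Dict[str, Optional[str]]:
    """Extract the title element and the settlement month element, if present."""
    first_page = page_texts[0] if page_texts else ""
    title = next((line.strip() for line in first_page.splitlines() if line.strip()), None)
    month = MONTH_PATTERN.search("\n".join(page_texts))
    return {"title": title, "settlement_month": month.group(0) if month else None}


def identify_type(page_texts: List[str]) -> str:
    """Classify by the title element, which a document is expected to have."""
    title = (extract_elements(page_texts)["title"] or "").lower()
    for keyword, document_type in TITLE_KEYWORDS.items():
        if keyword in title:
            return document_type
    return "unclassified"


document = ["Project A Contract\nParty A: ...", "Settlement month: 2020-09"]
print(identify_type(document), extract_elements(document)["settlement_month"])
```

Classifying on the title element first reflects the remark above that a title is normally present even when the other elements are missing.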
Example six
Fig. 6 is a schematic structural diagram of a hybrid document filing apparatus according to a sixth embodiment of the present invention, and referring to fig. 6, the hybrid document filing apparatus 10 includes:
the mixed document acquisition module 11 is used for acquiring a mixed document scanned by the printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is more than or equal to 2;
the document separation module 12 is configured to perform document separation on the mixed document to obtain N folders, and each folder stores all data pages of a document;
and a document identification and storage module 13, configured to sequentially identify the type of the document stored in each folder, and store the folder in an archive directory folder corresponding to the type of the document stored in the folder.
Optionally, the N documents are arranged in sequence, and the first data page of each document is provided with a first mark.
Optionally, the document separation module 12 is configured to perform document separation on the mixed document to obtain N folders, and includes:
calling an image recognition interface API through a Python script to recognize the mixed document and obtain all data pages provided with first marks;
and putting data pages from the first data page to a previous page of the second data page into a folder, wherein the first data page and the second data page are two adjacent data pages provided with first marks, and the first data page is positioned before the second data page.
Optionally, N documents are arranged in a disordered manner or in a sequential manner, and all data pages of each document are provided with a second mark and a third mark; the second mark is used to indicate to which document the data page belongs and the third mark is used to indicate the position of the data page in the document.
Optionally, the document separation module 12 is configured to perform document separation on the mixed document to obtain N folders, and includes:
and calling an image recognition interface API through a Python script to recognize the mixed document, putting the data pages with the same second mark into a folder, and sequencing the data pages in the folder according to the third mark.
Optionally, the document separation module 12 being configured to perform document separation on the mixed document to obtain N folders includes:
learning from historical mixed documents to obtain a trained model;
and performing document separation on the mixed document according to the trained model to obtain N folders.
Optionally, the document identification and storage module 13 being configured to sequentially identify the type of the document stored in each folder includes:
identifying the type of the document stored in each folder in turn, based on elements of all the data pages of the document stored in that folder.
Optionally, the elements include at least one of a title element, a contract name element, and a settlement month element.
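Identifying a type from page elements and then archiving the folder can be sketched as follows. This is a hedged illustration, not the method of the patent: the text does not say how elements map to types, so the keyword table `TYPE_KEYWORDS`, the fallback type `unclassified`, and the functions `identify_type` and `archive_folder` are all assumptions made for this example. The elements are assumed to have been extracted as text beforehand.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable

# Hypothetical mapping from a keyword found in an element to a document type.
TYPE_KEYWORDS: Dict[str, str] = {
    "contract": "contracts",
    "settlement": "settlements",
}


def identify_type(elements: Iterable[str], default: str = "unclassified") -> str:
    """Return the first document type whose keyword occurs in any element."""
    for element in elements:
        lowered = element.lower()
        for keyword, doc_type in TYPE_KEYWORDS.items():
            if keyword in lowered:
                return doc_type
    return default


def archive_folder(folder: Path, elements: Iterable[str], archive_root: Path) -> Path:
    """Move a document folder into the archive directory folder for its type."""
    target_dir = archive_root / identify_type(elements)
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / folder.name
    shutil.move(str(folder), str(destination))
    return destination
```

A call such as `archive_folder(Path("doc_001"), ["Contract name: ..."], Path("archive"))` would move the folder to `archive/contracts/doc_001` under these assumed rules. A real system would likely need a richer rule set or a classifier.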
This embodiment provides a mixed document filing apparatus including: a mixed document acquisition module configured to acquire a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is greater than or equal to 2; a document separation module configured to perform document separation on the mixed document to obtain N folders, each folder storing all data pages of one document; and a document identification and storage module configured to sequentially identify the type of the document stored in each folder and to store each folder in the archive directory folder corresponding to the type of the document stored in that folder. In this way, each document in the mixed document can be accurately separated and its type identified, after which the documents are stored and archived by type. The apparatus thereby addresses the problems of the prior art, namely heavy manual involvement, cumbersome operation, low work efficiency, and high labor cost, and achieves automatic separation, type identification, and archiving of mixed documents.
EXAMPLE seven
An embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a mixed document filing method, the method including:
acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is greater than or equal to 2;
performing document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of one document;
and sequentially identifying the type of the document stored in each folder, and storing each folder in the archive directory folder corresponding to the type of the document stored in that folder.
Of course, in the storage medium containing computer-executable instructions provided by this embodiment of the present invention, the computer-executable instructions are not limited to the method operations described above, and may also perform related operations in the mixed document filing method provided by any embodiment of the present invention.
From the above description of the embodiments, it will be clear to those skilled in the art that the present invention can be implemented by software together with the necessary general-purpose hardware, and can of course also be implemented by hardware alone, although the former is the better implementation in many cases. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk, or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the above embodiment of the mixed document filing apparatus, the units and modules are divided only according to functional logic; the division is not limited to the one described above, as long as the corresponding functions can be implemented. In addition, the specific names of the functional units serve only to distinguish them from one another and do not limit the protection scope of the present invention.
It is to be noted that the foregoing describes only the preferred embodiments of the present invention and the technical principles employed. Those skilled in the art will understand that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements, and substitutions can be made without departing from the scope of protection of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to those embodiments and may include other equivalent embodiments without departing from its spirit; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A mixed document filing method, comprising:
acquiring a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is greater than or equal to 2;
performing document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of one document;
and sequentially identifying the type of the document stored in each folder, and storing each folder in the archive directory folder corresponding to the type of the document stored in that folder.
2. The mixed document filing method according to claim 1, wherein the N documents are arranged in order, and the first data page of each document is provided with a first mark.
3. The mixed document filing method according to claim 2, wherein performing document separation on the mixed document to obtain N folders comprises:
calling an image recognition API (application programming interface) through a Python script to recognize the mixed document and obtain all data pages provided with the first mark;
and putting the data pages from a first data page up to the page preceding a second data page into one folder, wherein the first data page and the second data page are two consecutive data pages provided with the first mark, and the first data page precedes the second data page.
4. The mixed document filing method according to claim 1, wherein the N documents are arranged either out of order or in order, and every data page of each document is provided with a second mark and a third mark; the second mark indicates which document the data page belongs to, and the third mark indicates the position of the data page within that document.
5. The mixed document filing method according to claim 4, wherein performing document separation on the mixed document to obtain N folders comprises:
calling an image recognition API through a Python script to recognize the mixed document, putting the data pages that carry the same second mark into one folder, and sorting the data pages in each folder according to the third mark.
6. The mixed document filing method according to claim 1, wherein performing document separation on the mixed document to obtain N folders comprises:
learning from historical mixed documents to obtain a trained model;
and performing document separation on the mixed document according to the trained model to obtain N folders.
7. The mixed document filing method according to claim 1, wherein sequentially identifying the type of the document stored in each folder comprises:
identifying the type of the document stored in each folder in turn, based on elements of all the data pages of the document stored in that folder.
8. The mixed document filing method according to claim 7, wherein the elements include at least one of a title element, a contract name element, and a settlement month element.
9. A mixed document filing apparatus, comprising:
a mixed document acquisition module configured to acquire a mixed document scanned by a printer, wherein the mixed document comprises N documents, each document comprises at least one data page, and N is greater than or equal to 2;
a document separation module configured to perform document separation on the mixed document to obtain N folders, wherein each folder stores all data pages of one document;
and a document identification and storage module configured to sequentially identify the type of the document stored in each folder, and to store each folder in the archive directory folder corresponding to the type of the document stored in that folder.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the mixed document filing method according to any one of claims 1 to 8.
CN202011055808.1A 2020-09-29 2020-09-29 Mixed document filing method, filing device and storage medium Pending CN112199330A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011055808.1A CN112199330A (en) 2020-09-29 2020-09-29 Mixed document filing method, filing device and storage medium

Publications (1)

Publication Number Publication Date
CN112199330A true CN112199330A (en) 2021-01-08

Family

ID=74008214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011055808.1A Pending CN112199330A (en) 2020-09-29 2020-09-29 Mixed document filing method, filing device and storage medium

Country Status (1)

Country Link
CN (1) CN112199330A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104836931A (en) * 2014-02-10 2015-08-12 富士施乐株式会社 Image processing apparatus, image reading apparatus, and image processing method
CN105512197A (en) * 2015-11-27 2016-04-20 广州宝钢南方贸易有限公司 Digitized archiving device of documents and archiving and searching device thereof
CN106326348A (en) * 2016-08-08 2017-01-11 成都四威高科技产业园有限公司 Electronic scanning processing system and electronic scanning processing method for papery documents
CN107908745A (en) * 2017-11-16 2018-04-13 理光图像技术(上海)有限公司 Masses of Document scanning collating unit, method, medium and equipment
CN108509542A (en) * 2018-03-19 2018-09-07 合肥泓泉档案信息科技有限公司 A kind of quick filing system of archives and its archiving method
CN109977073A (en) * 2019-03-11 2019-07-05 厦门纵横集团科技股份有限公司 A kind of law court's electronics folder automation filing system and its method
CN110162764A (en) * 2018-02-12 2019-08-23 北京庖丁科技有限公司 Method for splitting, device, equipment and the medium of electronic document
CN110245112A (en) * 2019-06-21 2019-09-17 同略科技有限公司 Intelligent archive management method, system, terminal and storage medium based on AI
CN110532448A (en) * 2019-07-04 2019-12-03 平安科技(深圳)有限公司 Document Classification Method, device, equipment and storage medium neural network based
CN111079677A (en) * 2019-12-23 2020-04-28 深圳市金政软件技术有限公司 Method and system for identifying and binding electronic scanning piece

Similar Documents

Publication Publication Date Title
Papadopoulos et al. The IMPACT dataset of historical document images
CN101673256B (en) Method and system for automatically extracting article metadata information based on word flow
WO2022057707A1 (en) Text recognition method, image recognition classification method, and document recognition processing method
US9736331B2 (en) Device, system and method for identifying sections of documents
CN111737503B (en) Bill information filing method and device, computer equipment and storage medium
CN112052749A (en) Archive filing method and device, electronic equipment and computer readable storage medium
CN105335453B (en) Image file dividing method
CN109190594A (en) Optical Character Recognition system and information extracting method
CN112132710B (en) Legal element processing method and device, electronic equipment and storage medium
CN114117171A (en) Intelligent project file collecting method and system based on energized thinking
CN110909123A (en) Data extraction method and device, terminal equipment and storage medium
CN105760554A (en) Automatic filing system and method for lawsuit electronic files
CN112800949A (en) Artificial intelligence-based paper archive digital processing method, system and equipment
CN110737630A (en) Method and device for processing electronic archive file, computer equipment and storage medium
US9805258B2 (en) Systems and methods for separating documents
CN110851630A (en) Management system and method for deep learning labeled samples
CN112508000B (en) Method and equipment for generating OCR image recognition model training data
CN112199330A (en) Mixed document filing method, filing device and storage medium
CN115599885A (en) Document full-text retrieval method and device, computer equipment, storage medium and product
CN114116605A (en) Method and device for sequencing image documents based on semantic features and electronic equipment
US20050018252A1 (en) Imaging system and business methodology
CN115311671A (en) Method and system for batch electronization of paper official documents
TWM530429U (en) System capable of establishing relationship between electronic files and data
TW201810091A (en) System for establishing correlation between electronic file and data and method thereof including an operating interface, a tag identifier, an electronic file generator and a server
CN110309384B (en) Management method for classifying patent files by using dates

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination