CN112800949A - Artificial intelligence-based paper archive digital processing method, system and equipment


Info

Publication number
CN112800949A
Authority
CN
China
Prior art keywords
target
artificial intelligence
electronic file
electronic
picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110109233.5A
Other languages
Chinese (zh)
Inventor
刘培育
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110109233.5A
Publication of CN112800949A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/137 Hierarchical processing, e.g. outlines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an artificial intelligence-based method, system and equipment for digitally processing paper archives. The method comprises the following steps: scanning the paper files to be processed with a scanner to obtain a picture electronic file set; scanning and monitoring the picture electronic file set based on an artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed; processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set; splitting the target electronic file set by document based on an artificial intelligence visual recognition label data model to obtain a plurality of target folders; packaging the target electronic files in each target folder into a formatted electronic file in a target format; and extracting data from each formatted electronic file based on an artificial intelligence directory extraction model and outputting an electronic directory. The paper files to be processed are thereby digitized, and the efficiency of paper archive digitization is effectively improved.

Description

Artificial intelligence-based paper archive digital processing method, system and equipment
Technical Field
The invention belongs to the technical field of archive digitization, and particularly relates to an artificial intelligence-based method, system and equipment for digitally processing paper archives.
Background
Archive management is very important in every industry. As time goes on, paper archives are gradually being phased out, and electronic archives, being convenient, easy to store and easy to query, have gradually replaced them. Existing paper archives therefore need to be digitized to ensure the timeliness of the archives. At present, most paper archives are digitized manually, which is relatively inefficient and consumes a large amount of labor cost.
Therefore, how to improve the efficiency of digitally processing paper archives has become a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
To solve at least the problems in the prior art, the invention provides an artificial intelligence-based method, system and equipment for digitally processing paper archives.
The technical solution provided by the invention is as follows:
In one aspect, an artificial intelligence-based method for digitally processing paper archives comprises the following steps:
scanning the paper files to be processed with a scanner to obtain a picture electronic file set;
scanning and monitoring the picture electronic file set based on an artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed;
processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set;
splitting the target electronic file set by document based on an artificial intelligence visual recognition label data model to obtain a plurality of target folders, wherein each target folder contains the target electronic files of one and the same document;
packaging the target electronic files in each target folder into a formatted electronic file in a target format;
and extracting data from each formatted electronic file based on an artificial intelligence directory extraction model and outputting an electronic directory, so as to digitize the paper files to be processed.
Optionally, before the scanning of the paper files to be processed with the scanner to obtain the picture electronic file set, the method further includes:
classifying and sorting a first preset number of picture electronic files by feature class to produce a second preset number of feature training samples, wherein the features comprise a black edge feature, a folding feature, a light display feature, a blank feature, an inclined feature and a reversed feature;
training on the second preset number of feature training samples based on an artificial intelligence visual recognition algorithm to construct the artificial intelligence visual recognition picture feature model, wherein the model is further used for recognizing different types of errors and repairing each error feature through the corresponding program algorithm.
Optionally, the picture electronic file set is obtained by scanning the paper files in sequence;
before the scanning of the paper files to be processed with the scanner to obtain the picture electronic file set, the method further includes:
classifying and sorting a third preset number of picture electronic files by label class to produce a fourth preset number of label training samples, wherein the labels comprise a first page label, a middle page label and a last page label;
training on the fourth preset number of label training samples based on an artificial intelligence visual recognition algorithm to construct the artificial intelligence visual recognition label data model.
Optionally, the splitting of the target electronic file set by document based on the artificial intelligence visual recognition label data model to obtain a plurality of target folders includes:
performing first page retrieval and last page retrieval on the target electronic file set based on the artificial intelligence visual recognition label data model;
and establishing the plurality of target folders according to preset rules based on the results of the first page retrieval and the last page retrieval.
Optionally, after the establishing of the plurality of target folders according to the preset rules, the method further includes:
reading the text content of each picture of the target electronic files in each target folder through optical character recognition;
and analyzing the text content of each picture through artificial intelligence semantic recognition to check whether the target electronic files in the target folder belong to the same document.
Optionally, after the artificial intelligence visual recognition label data model is constructed, the method further includes:
batch-processing, through the artificial intelligence visual recognition label data model, the picture electronic files carrying the first page label and placing them into a specific sample folder;
performing model labeling on the picture electronic files carrying the first page label in the specific sample folder according to requirement rules to obtain a fifth preset number of labeled samples, wherein the requirement rules comprise a fonds number rule, a filing unit rule, a directory name rule, a text number rule, a responsible person rule, a part number rule, a page number rule, a year rule and a date rule;
and training on the fifth preset number of labeled samples based on an artificial intelligence structural algorithm to obtain the artificial intelligence directory extraction model.
Optionally, after the artificial intelligence directory extraction model is obtained, the method further includes:
newly labeling the unlabeled picture electronic files in the specific sample folder to obtain verification labeled samples;
and performing verification training on the verification labeled samples based on the artificial intelligence structural algorithm to obtain a verified model, which serves as the artificial intelligence directory extraction model.
Optionally, the outputting of the electronic directory includes:
outputting electronic directories in different formats according to preset filing rules or archive management software rules.
In another aspect, an artificial intelligence-based system for digitally processing paper archives comprises:
a scanning module, configured to scan the paper files to be processed with a scanner to obtain a picture electronic file set;
a processing module, configured to scan and monitor the picture electronic file set based on an artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed, and to process the electronic file set to be processed through a logic algorithm to obtain a target electronic file set;
a splitting module, configured to split the target electronic file set by document based on an artificial intelligence visual recognition label data model to obtain a plurality of target folders, wherein each target folder contains the target electronic files of one and the same document;
a digital generation module, configured to package the target electronic files in each target folder into a formatted electronic file in a target format, and to extract data from each formatted electronic file based on an artificial intelligence directory extraction model and output an electronic directory, so as to digitize the paper files to be processed.
In yet another aspect, artificial intelligence-based equipment for digitally processing paper archives comprises: a processor, and a memory coupled to the processor;
the memory is configured to store a computer program, the computer program being at least used for executing the artificial intelligence-based method for digitally processing paper archives described above;
the processor is configured to call and execute the computer program in the memory.
The invention has the beneficial effects that:
The invention provides an artificial intelligence-based method, system and equipment for digitally processing paper archives. The method comprises the following steps: scanning the paper files to be processed with a scanner to obtain a picture electronic file set; scanning and monitoring the picture electronic file set based on an artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed; processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set; splitting the target electronic file set by document based on an artificial intelligence visual recognition label data model to obtain a plurality of target folders, wherein each target folder contains the target electronic files of one and the same document; packaging the target electronic files in each target folder into a formatted electronic file in a target format; and extracting data from each formatted electronic file based on an artificial intelligence directory extraction model and outputting an electronic directory, so as to digitize the paper files to be processed. All processes after scanning are completed automatically by machine, and only a small number of personnel are needed for monitoring, which greatly saves labor cost and effectively improves the efficiency of paper archive digitization.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an artificial intelligence-based method for digitally processing paper archives according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an artificial intelligence-based system for digitally processing paper archives according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of artificial intelligence-based equipment for digitally processing paper archives according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Fig. 1 is a flowchart of a method for digitally processing a paper archive based on artificial intelligence according to an embodiment of the present invention.
As shown in fig. 1, the method for digitally processing a paper archive based on artificial intelligence provided in this embodiment includes the following steps:
S11: scanning the paper files to be processed with a scanner to obtain a picture electronic file set.
This embodiment takes paper archives as an example. In practical applications, however, the method is not limited to paper archive files and may be applied to other paper files; any paper file that needs to be digitally processed may be referred to as a paper file to be processed. Because the paper files to be processed are scanned one by one, every scanning result is a picture electronic file, and together these files may be referred to as a picture electronic file set.
S12: scanning and monitoring the picture electronic file set based on the artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed.
S13: processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set.
Specifically, before the paper files to be processed are scanned with the scanner to obtain the picture electronic file set, the method further includes: classifying and sorting a first preset number of picture electronic files by feature class to produce a second preset number of feature training samples, wherein the features comprise a black edge feature, a folding feature, a light display feature, a blank feature, an inclined feature, a reversed feature and the like; and training on the second preset number of feature training samples based on an artificial intelligence visual recognition algorithm to construct the artificial intelligence visual recognition picture feature model. The model recognizes different types of errors, and each error feature is repaired by the corresponding program algorithm. In other words, a picture feature model is first obtained by training on different picture features. The picture electronic file set is then scanned and monitored with this model: artificial intelligence visual recognition detects which files exhibit the model features and therefore need to be processed, and those files are processed through the corresponding logic and algorithms to obtain the target electronic file set, in which every file has the quality of a normal electronic file.
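The detect-then-repair flow of S12 and S13 can be sketched as a dispatch from recognized feature labels to repair routines. This is only an illustrative sketch under stated assumptions: the label strings, the grayscale list-of-rows image representation, the function names, and the policy of dropping blank pages are all hypothetical and are not specified in the embodiment, and the feature labels are assumed to come from the trained picture feature model.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumed representation: rows of grayscale pixels, 0 = black, 255 = white.
Image = List[List[int]]


def _all_black(values: Iterable[int], threshold: int) -> bool:
    return all(v <= threshold for v in values)


def crop_black_edges(img: Image, threshold: int = 30) -> Image:
    """Remove border rows and columns that are entirely near-black."""
    rows = [list(r) for r in img]
    while rows and _all_black(rows[0], threshold):
        rows.pop(0)
    while rows and _all_black(rows[-1], threshold):
        rows.pop()
    if not rows:
        return []
    left, right = 0, len(rows[0])
    while left < right and _all_black((r[left] for r in rows), threshold):
        left += 1
    while right > left and _all_black((r[right - 1] for r in rows), threshold):
        right -= 1
    return [r[left:right] for r in rows]


def rotate_180(img: Image) -> Image:
    """Repair an upside-down (reversed) page."""
    return [row[::-1] for row in img[::-1]]


# Hypothetical mapping from a recognized feature label to its repair routine.
REPAIRS: Dict[str, Callable[[Image], Image]] = {
    "black_edge": crop_black_edges,
    "reversed": rotate_180,
}


def process_page(img: Image, labels: List[str]) -> Optional[Image]:
    """Apply the repair for every recognized feature, in label order.

    Returns None for pages labeled blank (an assumed policy).
    """
    if "blank" in labels:
        return None
    for label in labels:
        repair = REPAIRS.get(label)
        if repair is not None:
            img = repair(img)
    return img
```

Keeping the repairs in a lookup table means a new feature class, such as the folding or inclined feature, could be supported by registering one more routine without changing the dispatch loop.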
S14: splitting the target electronic file set by document based on the artificial intelligence visual recognition label data model to obtain a plurality of target folders, wherein each target folder contains the target electronic files of one and the same document.
Specifically, the paper archive files are scanned in the order in which the original paper files are placed in the archive box, so the order of the resulting picture electronic files corresponds to the order of the original paper files. Before the paper files to be processed are scanned with the scanner to obtain the picture electronic file set, the method further includes: classifying and sorting a third preset number of picture electronic files by label class to produce a fourth preset number of label training samples, wherein the labels comprise a first page label, a middle page label and a last page label; and training on the fourth preset number of label training samples based on an artificial intelligence visual recognition algorithm to construct the artificial intelligence visual recognition label data model.
Specifically, the splitting of the target electronic file set by document based on the artificial intelligence visual recognition label data model to obtain a plurality of target folders includes: performing first page retrieval and last page retrieval on the target electronic file set based on the artificial intelligence visual recognition label data model. The main function of the model is to label the picture electronic files so that each picture can be identified as a first page, a last page or a middle page. A plurality of target folders are then established according to preset rules based on the results of the first page retrieval and the last page retrieval; that is, the pictures belonging to the same document are placed into the same folder according to the preset rules, so that each folder corresponds to one document, which ensures the quality of the electronic files.
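The embodiment does not spell out the preset rules, so the following is only a minimal sketch of one plausible rule: pages arrive in scan order, a first page label opens a new folder, and a last page label closes it. The label strings and the function name are assumptions made for illustration.

```python
from typing import List, Sequence


def split_into_folders(labels: Sequence[str]) -> List[List[int]]:
    """Group page indices (in scan order) into one list per document.

    `labels` holds the predicted label of each page: "first", "middle" or "last".
    """
    folders: List[List[int]] = []
    current: List[int] = []
    for index, label in enumerate(labels):
        if label == "first" and current:
            # A new first page closes a document whose last page was not detected.
            folders.append(current)
            current = []
        current.append(index)
        if label == "last":
            folders.append(current)
            current = []
    if current:
        folders.append(current)
    return folders
```

For example, the label sequence first, middle, last, first, last yields two folders holding pages 0 to 2 and pages 3 to 4. Under this assumed rule a single-page document is simply a first page followed directly by another first page.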
After the plurality of target folders are established according to the preset rules, the method further includes: reading the text content of each picture of the target electronic files in each target folder through optical character recognition; and analyzing the text content of each picture through artificial intelligence semantic recognition to check whether the target electronic files in the target folder belong to the same document. That is, semantic recognition is performed on the picture files in the same folder through the artificial intelligence semantic recognition function, the relevance among the pictures is determined through semantic analysis, and it is finally determined whether the files in the folder belong to the same document. This double check of the documents ensures their correctness.
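As a rough illustration of this second check, the sketch below flags pages whose recognized text shares almost no vocabulary with the preceding page in the same folder. It assumes the OCR text of each page is already available as a string, and it substitutes a simple word-overlap (Jaccard) score for the semantic recognition model, which the embodiment does not specify; the threshold value and function names are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased word tokens of a page's OCR text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity in [0, 1]; a stand-in for semantic relevance."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def suspicious_breaks(page_texts: List[str], threshold: float = 0.1) -> List[int]:
    """Indices of pages that look unrelated to the page before them."""
    return [
        i
        for i in range(1, len(page_texts))
        if jaccard(page_texts[i - 1], page_texts[i]) < threshold
    ]
```

A folder for which `suspicious_breaks` returns a non-empty list could be routed to the monitoring staff for review. Word overlap is a weak proxy, particularly for Chinese text without word segmentation, so a production system would presumably rely on a real semantic model here.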
S15: packaging the target electronic files in each target folder into a formatted electronic file in a target format.
The above steps complete the splitting of the electronic file set; that is, the electronic file set is split into a plurality of different folders, each containing the electronic files of one archive document. Each folder that has been split is automatically packaged, following the order of the picture electronic files in the folder, into a formatted electronic file in a target format, for example a PDF file.
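Because the packaging must follow the order of the pictures in each folder, the page order has to be fixed before any PDF is written. The sketch below only builds that packaging plan; it assumes hypothetical file names that carry the scan sequence number, and it leaves the actual PDF writing to whichever imaging library is used.

```python
import re
from typing import Dict, List


def natural_key(name: str) -> list:
    """Sort key that compares embedded numbers numerically (page2 before page10)."""
    return [int(p) if p.isdigit() else p.lower() for p in re.split(r"(\d+)", name)]


def packaging_plan(
    folders: Dict[str, List[str]], extension: str = ".pdf"
) -> Dict[str, List[str]]:
    """Map each output file name to its pages in scan order."""
    return {
        folder + extension: sorted(pages, key=natural_key)
        for folder, pages in sorted(folders.items())
    }
```

A plain alphabetical sort would place `page10.jpg` before `page2.jpg`, which would scramble any document longer than nine pages; the numeric-aware key avoids that.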
S16: extracting data from each formatted electronic file based on the artificial intelligence directory extraction model and outputting an electronic directory, so as to digitize the paper files to be processed.
Specifically, after the artificial intelligence visual recognition label data model is constructed, the method further includes: batch-processing, through the artificial intelligence visual recognition label data model, the picture electronic files carrying the first page label and placing them into a specific sample folder; performing model labeling on the picture electronic files carrying the first page label in the specific sample folder according to requirement rules to obtain a fifth preset number of labeled samples, wherein the requirement rules comprise a fonds number rule, a filing unit rule, a directory name rule, a text number rule, a responsible person rule, a part number rule, a page number rule, a year rule, a date rule and the like; and training on the fifth preset number of labeled samples based on an artificial intelligence structural algorithm to obtain the artificial intelligence directory extraction model. After the artificial intelligence directory extraction model is obtained, the method further includes: newly labeling the unlabeled picture electronic files in the specific sample folder to obtain verification labeled samples; and performing verification training on the verification labeled samples based on the artificial intelligence structural algorithm to obtain a verified model, which serves as the artificial intelligence directory extraction model.
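One way to picture the labeled samples is as a record with one field per requirement rule. The sketch below is an assumption-laden illustration: the field names are English renderings chosen here, not identifiers from the embodiment, and the two helpers merely check annotation completeness and pick out the first-page pictures that still need labeling for the verification round.

```python
from dataclasses import dataclass, fields
from typing import Iterable, List, Set


@dataclass
class CatalogLabel:
    """Annotation for one first-page picture; one field per requirement rule."""
    fonds_number: str = ""
    filing_unit: str = ""
    directory_name: str = ""
    text_number: str = ""
    responsible_person: str = ""
    part_number: str = ""
    page_number: str = ""
    year: str = ""
    date: str = ""


def missing_fields(label: CatalogLabel) -> List[str]:
    """Names of fields the annotator has left empty."""
    return [f.name for f in fields(label) if not getattr(label, f.name).strip()]


def unlabeled(sample_files: Iterable[str], labeled: Set[str]) -> List[str]:
    """Files in the sample folder not yet annotated: candidates for verification labels."""
    return [name for name in sample_files if name not in labeled]
```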
After the artificial intelligence directory extraction model is obtained, data is extracted from each formatted electronic file based on the model and stored in a database. Because each document has related attributes such as a directory name, a responsible person, a filing unit and a date, these attributes are extracted to obtain a preselected directory. The preselected directory then undergoes secondary processing, and electronic directories in different formats, including an Excel format, an XML format and the like, are output according to preset filing rules or archive management software rules.
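The final export step can be illustrated with the Python standard library alone. In this sketch the extracted attributes are assumed to arrive as dictionaries, the field names and the `catalog`/`entry` element names are hypothetical, and CSV (which spreadsheet tools such as Excel can open) stands in for the Excel format.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumed attribute names; the embodiment lists these attributes only by example.
FIELDS = ["directory_name", "responsible_person", "filing_unit", "date"]


def to_csv(records: List[Dict[str, str]]) -> str:
    """Render the directory as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({name: record.get(name, "") for name in FIELDS})
    return buffer.getvalue()


def to_xml(records: List[Dict[str, str]]) -> str:
    """Render the directory as an XML string, one <entry> per document."""
    root = ET.Element("catalog")
    for record in records:
        entry = ET.SubElement(root, "entry")
        for name in FIELDS:
            ET.SubElement(entry, name).text = record.get(name, "")
    return ET.tostring(root, encoding="unicode")
```

Keeping the records format-neutral and adding one small writer per output format mirrors the idea of producing different directories for different filing rules or archive management software.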
This embodiment provides an artificial intelligence-based method for digitally processing paper archives, which comprises the following steps: scanning the paper files to be processed with a scanner to obtain a picture electronic file set; scanning and monitoring the picture electronic file set based on an artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed; processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set; splitting the target electronic file set by document based on an artificial intelligence visual recognition label data model to obtain a plurality of target folders, wherein each target folder contains the target electronic files of one and the same document; packaging the target electronic files in each target folder into a formatted electronic file in a target format; and extracting data from each formatted electronic file based on an artificial intelligence directory extraction model and outputting an electronic directory, so as to digitize the paper files to be processed. All processes after scanning are completed automatically by machine, and only a small number of personnel are needed for monitoring, which greatly saves labor cost and effectively improves the efficiency of paper archive digitization. From the start of processing, the whole workflow is completed automatically by machine in batches and in stages, and one person can monitor several stages at the same time. The method therefore has the characteristics of low cost, high efficiency, short processing time, high precision and reusability.
The whole workflow of archive digitization is comprehensively integrated and improved by combining computer technology, artificial intelligence, program algorithms and the like. Different solutions can be output for different industries, and a model built once can be reused many times.
Based on the same general inventive concept, the application also protects an artificial intelligence-based system for digitally processing paper archives.
Fig. 2 is a schematic structural diagram of an artificial intelligence-based paper archive digitization processing system according to an embodiment of the present invention.
As shown in fig. 2, the present embodiment provides a system for digitally processing a paper archive based on artificial intelligence, which includes:
a scanning module 10, configured to scan the paper files to be processed with a scanner to obtain a picture electronic file set;
a processing module 20, configured to scan and monitor the picture electronic file set based on an artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed, and to process the electronic file set to be processed through a logic algorithm to obtain a target electronic file set;
a splitting module 30, configured to split the target electronic file set by document based on an artificial intelligence visual recognition label data model to obtain a plurality of target folders, wherein each target folder contains the target electronic files of one and the same document;
a digital generation module 40, configured to package the target electronic files in each target folder into a formatted electronic file in a target format, and to extract data from each formatted electronic file based on an artificial intelligence directory extraction model and output an electronic directory, so as to digitize the paper files to be processed.
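The four modules chain naturally into a pipeline in which each module consumes the output of the previous one. The sketch below shows only that wiring; the class name, the callable signatures and the data shapes (file names as strings, folders as a dictionary) are assumptions for illustration, and each callable would wrap the corresponding model or algorithm in a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ArchivePipeline:
    scan: Callable[[], List[str]]                            # scanning module 10
    process: Callable[[List[str]], List[str]]                # processing module 20
    split: Callable[[List[str]], Dict[str, List[str]]]       # splitting module 30
    generate: Callable[[Dict[str, List[str]]], List[dict]]   # digital generation module 40

    def run(self) -> List[dict]:
        pictures = self.scan()            # picture electronic file set
        targets = self.process(pictures)  # target electronic file set
        folders = self.split(targets)     # one folder per document
        return self.generate(folders)     # electronic directory entries
```

Injecting the modules as callables keeps each stage independently replaceable and testable with simple stubs.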
This embodiment provides an artificial intelligence-based system for digitally processing paper archives, which performs the following: scanning the paper files to be processed with a scanner to obtain a picture electronic file set; scanning and monitoring the picture electronic file set based on an artificial intelligence visual recognition picture feature model to obtain an electronic file set to be processed; processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set; splitting the target electronic file set by document based on an artificial intelligence visual recognition label data model to obtain a plurality of target folders, wherein each target folder contains the target electronic files of one and the same document; packaging the target electronic files in each target folder into a formatted electronic file in a target format; and extracting data from each formatted electronic file based on an artificial intelligence directory extraction model and outputting an electronic directory, so as to digitize the paper files to be processed. All processes after scanning are completed automatically by machine, and only a small number of personnel are needed for monitoring, which greatly saves labor cost and effectively improves the efficiency of paper archive digitization. From the start of processing, the whole workflow is completed automatically by machine in batches and in stages, and one person can monitor several stages at the same time. The system therefore has the characteristics of low cost, high efficiency, short processing time, high precision and reusability.
The whole workflow of archive digitization is comprehensively integrated and improved by combining computer technology, artificial intelligence, program algorithms and the like. Different solutions can be output for different industries, and a model built once can be reused many times.
The details of the system embodiment have been described in the corresponding method embodiment and are therefore not repeated here; the embodiments may be understood by reference to each other.
Based on the same general inventive concept, the application also protects artificial intelligence-based equipment for digitally processing paper archives.
Fig. 3 is a schematic structural diagram of an apparatus for digitally processing paper files based on artificial intelligence according to an embodiment of the present invention.
As shown in fig. 3, this embodiment provides artificial intelligence-based equipment for digitally processing paper archives, including: a processor, and a memory coupled to the processor;
the memory is configured to store a computer program, the computer program being at least used for executing the artificial intelligence-based method for digitally processing paper archives of any of the above embodiments;
the processor is configured to call and execute the computer program in the memory.
The above describes only specific embodiments of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be subject to the scope of protection of the appended claims.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program that instructs the relevant hardware. The program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A paper archive digitization processing method based on artificial intelligence is characterized by comprising the following steps:
scanning a paper file to be processed by a scanner to obtain a picture electronic file set;
based on an artificial intelligent visual recognition picture characteristic model, scanning and monitoring the picture electronic file set to obtain an electronic file set to be processed;
processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set;
based on an artificial intelligent visual identification label data model, performing piece splitting processing on the target electronic file set to obtain a plurality of target folders, wherein each target folder comprises the same target electronic file;
packaging the target electronic files in each target folder into format electronic files in a target format;
and based on an artificial intelligence directory extraction model, performing data extraction on each format electronic file, and outputting an electronic directory to realize digital processing of the paper files to be processed.
2. The method for digitally processing paper documents based on artificial intelligence as claimed in claim 1, wherein before scanning the paper documents to be processed by the scanner to obtain the electronic document set, the method further comprises:
classifying and sorting the first preset number of picture electronic files according to different feature classifications, and making a second preset number of feature training samples, wherein the features comprise a black edge feature, a folding feature, a light display feature, a blank feature, an inclined feature and a reversed feature;
training the second preset number of feature training samples based on an artificial intelligence visual recognition algorithm to construct an artificial intelligence visual recognition picture feature model; the artificial intelligence visual recognition image feature model is also used for recognizing different types of errors and repairing the error features through corresponding program algorithms according to the different error features.
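The idea of repairing each recognized error feature through a corresponding program algorithm can be illustrated with a small dispatch table. This is a sketch only: the trained recognition model is not shown, the feature names `"reversed"`, `"black_edge"` and `"blank"` and all function names are hypothetical, images are plain nested lists of grayscale values, and treating a reversed page as a 180 degree rotation is an assumption.

```python
from typing import Callable, Dict, List, Optional

Image = List[List[int]]  # grayscale pixels, 0 = black, 255 = white


def rotate_180(img: Image) -> Image:
    """Repair an upside-down page by rotating it 180 degrees."""
    return [row[::-1] for row in img[::-1]]


def crop_black_edges(img: Image, threshold: int = 10) -> Image:
    """Trim fully dark rows and columns from the outer border only."""
    def lit(line: List[int]) -> bool:
        return any(p > threshold for p in line)

    rows = [i for i, row in enumerate(img) if lit(row)]
    if not rows:
        return []  # the whole image is dark
    top, bottom = rows[0], rows[-1]
    body = img[top:bottom + 1]
    cols = [c for c in range(len(body[0])) if lit([row[c] for row in body])]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in body]


def drop_page(img: Image) -> Optional[Image]:
    """A blank page is removed rather than repaired."""
    return None


# One repair routine per recognized feature; unknown features are left untouched.
REPAIRS: Dict[str, Callable[[Image], Optional[Image]]] = {
    "reversed": rotate_180,
    "black_edge": crop_black_edges,
    "blank": drop_page,
}


def repair(img: Image, feature: str) -> Optional[Image]:
    """Apply the repair registered for the feature, if any."""
    fn = REPAIRS.get(feature)
    return img if fn is None else fn(img)
```

Features with no registered routine, such as a fold, pass through unchanged here and would be left for manual review.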
3. The method of claim 1, wherein the set of pictorial electronic documents is obtained by scanning paper documents sequentially;
before the paper document to be processed is scanned by the scanner and the picture electronic document set is obtained, the method further comprises the following steps:
classifying and sorting a third preset number of picture electronic files according to different label classifications, and manufacturing a fourth preset number of label training samples, wherein the labels comprise a first page label, a middle page label and a tail page label;
training the fourth preset number of label training samples based on an artificial intelligence visual recognition algorithm, and constructing an artificial intelligence visual recognition label data model.
4. The method of claim 3, wherein the step of splitting the target electronic document set to obtain a plurality of target folders based on the artificial intelligence visual identification tag data model comprises:
based on the artificial intelligent visual identification tag data model, performing first page retrieval and last page retrieval on the target electronic file set;
and establishing a plurality of target folders according to preset rules according to the results of the first page retrieval and the tail page retrieval.
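As an illustration of how first-page and tail-page results could drive folder creation, the sketch below groups sequentially scanned pages by their predicted labels. The preset rule is not spelled out in the text, so the rule used here (a first-page label opens a new folder, a tail-page label closes the current one) is an assumption, as are the function name and the label strings.

```python
from typing import List, Tuple


def split_into_folders(pages: List[Tuple[str, str]]) -> List[List[str]]:
    """Group (file_name, label) pairs, given in scan order, into folders.

    Assumed rule: a "first" label starts a new folder, a "tail" label
    ends the current folder, and "middle" pages join the open folder.
    """
    folders: List[List[str]] = []
    current: List[str] = []
    for name, label in pages:
        if label == "first" and current:
            # A new first page arrived before a tail page was seen.
            folders.append(current)
            current = []
        current.append(name)
        if label == "tail":
            folders.append(current)
            current = []
    if current:
        # Pages left over at the end of the scan form a final folder.
        folders.append(current)
    return folders


scanned = [
    ("001.jpg", "first"), ("002.jpg", "middle"), ("003.jpg", "tail"),
    ("004.jpg", "first"),
    ("005.jpg", "first"), ("006.jpg", "tail"),
]
folders = split_into_folders(scanned)
```

In this example `004.jpg` ends up alone in a folder because the next page is again labeled as a first page.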
5. The method for digitally processing paper archives based on artificial intelligence as claimed in claim 4, wherein after the creating of the plurality of target folders according to the preset rules, the method further comprises:
reading the text content of each picture of the target electronic file in each target folder through optical character recognition;
and analyzing the text content of each picture through artificial intelligence semantic recognition, and checking whether the target electronic files in the target folder are the same file.
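The checking step can be pictured with a much simpler stand-in for semantic recognition: comparing the word overlap of each page with the first page of the folder. This is not the semantic model referred to in the text; Jaccard similarity, the threshold value and the function names are all assumptions, the OCR text is taken as already available, and the tokenizer assumes whitespace-separated text.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lower-cased word tokens of a page's recognized text."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of tokens common to both sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def flag_suspect_pages(page_texts: List[str], threshold: float = 0.1) -> List[int]:
    """Return indices of pages that share too few words with the first page."""
    if not page_texts:
        return []
    reference = tokens(page_texts[0])
    return [
        i
        for i, text in enumerate(page_texts[1:], start=1)
        if jaccard(reference, tokens(text)) < threshold
    ]


folder_texts = [
    "annual budget report 2020 finance office",
    "budget report 2020 appendix tables",
    "wedding invitation dinner menu",
]
suspects = flag_suspect_pages(folder_texts)
```

A flagged page would then be passed to a person for review rather than moved automatically.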
6. The method of claim 3, wherein after constructing the artificial intelligence visual identification tag data model, the method further comprises:
carrying out batch processing on the picture electronic files carrying the home page labels and putting the picture electronic files into a specific sample folder through an artificial intelligent visual identification label data model;
performing model labeling on the picture electronic file carrying the home page label in the specific sample folder according to a requirement rule to obtain a fifth preset number of labeled samples, wherein the requirement rule comprises a full parcel number rule, a filing unit rule, a directory name rule, a text number rule, an accountant rule, a part number rule, a page number rule, a year rule and a date rule;
and training the fifth preset number of labeled samples based on an artificial intelligence structural algorithm to obtain an artificial intelligence directory extraction model.
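Before labeled samples are used for training, it can help to check that each annotation covers the fields named by the requirement rules. The sketch below assumes that every field is mandatory, which the text does not state; the field keys are hypothetical names derived from the rule names, and the training step itself is not shown.

```python
from typing import Dict, List, Optional

# Hypothetical keys, one per requirement rule listed in the text.
REQUIRED_FIELDS = (
    "full_parcel_number", "filing_unit", "directory_name", "text_number",
    "accountant", "part_number", "page_number", "year", "date",
)

Annotation = Dict[str, Optional[str]]


def missing_fields(annotation: Annotation) -> List[str]:
    """List the required fields that are absent or blank in one annotation."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = annotation.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def usable_samples(annotations: List[Annotation]) -> List[Annotation]:
    """Keep only annotations in which every required field is filled in."""
    return [a for a in annotations if not missing_fields(a)]


complete = {name: "x" for name in REQUIRED_FIELDS}
partial = dict(complete, year="  ", date=None)
kept = usable_samples([complete, partial])
```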
7. The method of claim 6, wherein after obtaining the artificial intelligence catalog extraction model, the method further comprises:
newly adding and labeling the unmarked picture electronic file in the specific sample folder to obtain a verification labeling sample;
and carrying out verification training on the verification marking sample based on an artificial intelligence structural algorithm to obtain a verification model serving as the artificial intelligence directory extraction model.
8. The method of claim 1, wherein outputting the electronic catalog comprises:
and outputting the electronic catalogues with different formats according to preset filing rules or file management software rules.
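Outputting the same catalog in different formats can be sketched with the Python standard library. The choice of CSV and JSON, the function name `render_catalog` and the entry keys are assumptions for illustration; the formats actually required by a filing rule or by archive management software are not named in the text.

```python
import csv
import io
import json
from typing import Dict, List


def render_catalog(entries: List[Dict[str, str]], fmt: str) -> str:
    """Serialize catalog entries as "csv" or "json" text."""
    if fmt == "json":
        # ensure_ascii=False keeps non-ASCII titles readable in the output.
        return json.dumps(entries, ensure_ascii=False, indent=2)
    if fmt == "csv":
        buffer = io.StringIO()
        if entries:
            writer = csv.DictWriter(
                buffer, fieldnames=list(entries[0].keys()), lineterminator="\n"
            )
            writer.writeheader()
            writer.writerows(entries)
        return buffer.getvalue()
    raise ValueError(f"unsupported catalog format: {fmt}")


catalog = [{"directory_name": "Budget report", "year": "2020"}]
as_csv = render_catalog(catalog, "csv")
as_json = render_catalog(catalog, "json")
```

A further format would be added as one more branch, leaving the extracted entries themselves unchanged.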
9. A system for digitally processing paper documents, comprising:
the scanning module is used for scanning the paper file to be processed through a scanner to obtain a picture electronic file set;
the processing module is used for scanning and monitoring the picture electronic file set based on an artificial intelligent visual recognition picture characteristic model to obtain an electronic file set to be processed; and processing the electronic file set to be processed through a logic algorithm to obtain a target electronic file set;
the splitting module is used for splitting the target electronic file set based on an artificial intelligence visual identification tag data model to obtain a plurality of target folders, and each target folder comprises the same target electronic file;
the digital generation module is used for packaging the target electronic files in each target folder into format electronic files in a target format; and based on an artificial intelligence directory extraction model, performing data extraction on each format electronic file, and outputting an electronic directory to realize digital processing of the paper files to be processed.
10. A device for digital processing of paper documents, comprising: a processor, and a memory coupled to the processor;
the memory is used for storing a computer program at least for executing the method for the digital processing of artificial intelligence based paper archives according to any one of claims 1 to 8;
the processor is used for calling and executing the computer program in the memory.
CN202110109233.5A 2021-01-27 2021-01-27 Artificial intelligence-based paper archive digital processing method, system and equipment Pending CN112800949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110109233.5A CN112800949A (en) 2021-01-27 2021-01-27 Artificial intelligence-based paper archive digital processing method, system and equipment

Publications (1)

Publication Number Publication Date
CN112800949A true CN112800949A (en) 2021-05-14

Family

ID=75812055

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110109233.5A Pending CN112800949A (en) 2021-01-27 2021-01-27 Artificial intelligence-based paper archive digital processing method, system and equipment

Country Status (1)

Country Link
CN (1) CN112800949A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113377952A (en) * 2021-05-26 2021-09-10 长江勘测规划设计研究有限责任公司 Automatic generation method for filing number of electronic file for quality test of water conservancy and hydropower engineering
CN114708127A (en) * 2022-04-15 2022-07-05 广东南粤科教研究院 Student point system comprehensive assessment method and system
CN116503889A (en) * 2023-01-18 2023-07-28 苏州工业园区航星信息技术服务有限公司 File and electronic file processing method, device, equipment and storage medium
CN116503889B (en) * 2023-01-18 2024-01-19 苏州工业园区航星信息技术服务有限公司 File and electronic file processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN112800949A (en) Artificial intelligence-based paper archive digital processing method, system and equipment
US7697757B2 (en) Computer assisted document modification
CN111144370B (en) Document element extraction method, device, equipment and storage medium
CN112052749A (en) Archive filing method and device, electronic equipment and computer readable storage medium
CN107291949B (en) Information searching method and device
CN110909123B (en) Data extraction method and device, terminal equipment and storage medium
CN112882947A (en) Interface test method, device, equipment and storage medium
CN114359533B (en) Page number identification method based on page text and computer equipment
CN110532449B (en) Method, device, equipment and storage medium for processing service document
CN115116068A (en) Archive intelligent filing system based on OCR
TWM590730U (en) Document management system base on AI
CN111652272B (en) Image processing method and device, computer equipment and storage medium
CN115809649A (en) eCTD conversion method, system and storage medium for NeeS electronic document
CN112364790B (en) Airport work order information identification method and system based on convolutional neural network
CN113609825A (en) Intelligent customer attribute tag identification method and device
CN112328246A (en) Page component generation method and device, computer equipment and storage medium
Ondrejcek et al. Information extraction from scanned engineering drawings
US20230385298A1 (en) Method and apparatus of extracting, storing, and querying structured data from documents and images using computer vision
CN114035726B (en) Method and system for robot flow automatic page element identification process
CN117493712A (en) PDF document navigable directory extraction method and device, electronic equipment and storage medium
US12008830B2 (en) System for template invariant information extraction
CN117115819A (en) Target field extraction method, system, terminal and medium
JP2011150436A (en) Method for substituting character data
Kurhekar et al. Automated text and tabular data extraction from scanned document images
Tigora A document image analysis system for educational purposes

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination