CN116612484A - File digital processing system - Google Patents


Publication number
CN116612484A
Authority
CN
China
Prior art keywords
image
file
module
archive
scanning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310602409.XA
Other languages
Chinese (zh)
Inventor
唐婷婷
曾伟华
张伟
迟钰沛
宁方刚
陈兆亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Co Ltd
Original Assignee
Inspur Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Co Ltd
Priority to CN202310602409.XA
Publication of CN116612484A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02W CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO WASTEWATER TREATMENT OR WASTE MANAGEMENT
    • Y02W90/00 Enabling technologies or technologies with a potential or indirect contribution to greenhouse gas [GHG] emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Multimedia (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Human Computer Interaction (AREA)
  • Economics (AREA)
  • Artificial Intelligence (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a file digital processing system in the technical field of file management. It addresses the technical problem of how to digitally process files so that information resources can be widely shared. The system comprises a task management module, an image scanning module, an image processing module, an image quality inspection module, a disassembling management module, a file writing module and a result management module. A system administrator creates a batch task, imports files in batches, and assigns processing personnel and a file quantity to the files. Once the original document is attached, it is split into multiple jpg pictures; alternatively, paper files are scanned with a scanner. After image processing and image quality inspection, OCR (optical character recognition) is performed on the pictures. When OCR is complete, the images and the recognized content are combined into a double-layer OFD file. For files that pass quality inspection, zip packages are generated as required and handed over to other service systems.

Description

File digital processing system
Technical Field
The invention relates to the technical field of file management, in particular to a file digital processing system.
Background
In traditional file management, files are kept as paper documents in a file room. The holdings can run to millions of pages, which makes querying and retrieval very inconvenient, and paper files are easily damaged by frequent handling. Long storage can also blur the content of a file, and the file data cannot be analyzed in real time. The most obvious difference from traditional file management is that a file digital processing system combines computer technology, scanning technology, image processing technology, OCR technology, database technology, storage technology, encryption technology and the like. It converts file resources on various carriers into digital file information, stores that information in digital form, and purposefully processes and manages the file information resources through the computer system. The result is an orderly structured database of file information resources that serves various data demands and allows information resources to be widely shared.
How to digitally process files so that information resources can be widely shared is the technical problem to be solved.
Disclosure of Invention
The technical task of the invention is to provide a file digital processing system that addresses the above defects, solving the technical problem of how to digitally process files so that information resources can be widely shared.
The invention relates to a file digital processing system, which comprises a task management module, an image scanning module, an image processing module, an image quality inspection module, a disassembling management module, a file writing module and a result management module;
the task management module interacts externally through a task management interface, and is used for acquiring a configured metadata writing template, importing file data, setting processing links, and assigning file processing personnel and a file processing quantity to each processing link;
the image scanning module interacts externally through an image scanning interface, and is used for calling a scanner to scan the file into scanned images and sending the scanned images to the image processing module as scanned originals;
the image processing module interacts externally through an image processing interface, and is used for processing the scanned originals to obtain modified images, recording the modification information of each scanned original, and sending the scanned originals and the modified images to the image quality inspection module;
the image quality inspection module interacts externally through an image quality inspection interface, and is used for inspecting image quality by comparing each scanned original with its modified image; for files that pass inspection, the modified images are sent to the disassembling management module as file images; for files that fail inspection, the problems are marked and the image scanning module or the image processing module is called;
the disassembling management module interacts externally through a disassembling management interface, and is used for calling the OCR client service, performing OCR on the received file images, extracting the text content, splitting the file according to the OCR results, and classifying the file content into corresponding folders, each folder containing the corresponding images and OCR results;
the file writing module interacts externally through a file writing interface, and is used for writing file metadata based on the metadata writing template, recognizing the images and OCR results in each folder and filling them into the corresponding metadata to obtain metadata information, and calling an OFD service to merge and convert the OCR results containing coordinates and the corresponding images into a double-layer OFD file;
the result management module interacts externally through a result management interface, and is used for checking the metadata information, file images and OFD files against configured detection rules, generating a detection report, packaging the metadata information, file images and OFD files into an archive information package, and sending the archive information package to the service system.
Preferably, the task management module is used for acquiring the configured metadata writing template according to the category, arrangement mode, carrier mode and metadata scheme selected by the user;
the task management module supports manual entry of file data, and batch import of file data by configuring an import metadata scheme template.
Preferably, the image scanning module is used for calling the scanner to scan the file page by page, in the following way: after the current file has been scanned, triggering the Next button in the image scanning interface submits the scanned images to the server and switches to scanning the next file.
Preferably, the image processing interface is configured with a one-key processing button; for each scanned original, triggering the one-key processing button processes the scanned original to obtain a modified image, and the modification information is recorded;
wherein the image processing provides the following services: image area selection, image cropping, image rotation, black edge removal, correction and super-resolution.
Preferably, the disassembling management module is configured to perform the following:
converting the received file images into jpg images, calling the OCR service to perform OCR on the jpg images, and extracting OCR results containing coordinates, wherein each OCR result is a txt file;
splitting the jpg images and the corresponding OCR results into corresponding folders according to the file structure, wherein the folders comprise texts, issuing sheets, drafts, attachments and covers;
manually rechecking the file after disassembly, and manually adjusting any part that was classified incorrectly.
Preferably, the disassembling management module supports disassembling files in two modes: part level and volume level.
Preferably, based on the images and OCR results in the folder, the file writing page supports filling metadata by manual entry, and supports framing an image area, whose content is recognized through the OCR service and filled into the corresponding metadata;
the file writing page displays the OCR text of each page of the file in a switchable file frame, and if the OCR text is unsatisfactory or incomplete, supports manually triggering OCR on a single page or on the whole file.
Preferably, the result management module is configured to perform the following:
setting detection rules, checking the metadata information, file images and OFD files against the file metadata constraints, the image standard requirements and whether the metadata match the images, and generating a detection report; if a file fails detection, returning it to the processing links;
if a file passes detection, generating an xml metadata information file for it according to the archiving requirements, packaging the file images, the xml metadata information file and the OFD file corresponding to each folder into an archive information package in zip format, and sending the archive information package to the service system;
the storage module is used for providing database storage and file storage, and for storing the data and files generated by the task management module, the image scanning module, the image processing module, the image quality inspection module, the disassembling management module, the file writing module and the result management module, wherein the data and files comprise the metadata information, file images, OFD files and detection reports.
The file digital processing system has the following advantages:
1. different kinds of files are supported, and a processing flow can be configured for each batch; processing links and processing quantities are assigned to each handler, each handler's processing records are logged and tracked, and the system arranges the distribution of processing tasks reasonably according to actual conditions;
2. after a paper file is scanned, blurred images are converted into high-definition images through image processing, which improves OCR accuracy;
3. OCR-assisted entry is supported: framing an image area recognizes the content in it, such as the title, responsible person or document, quickly and accurately;
4. the paper file is scanned, processed, inspected and recognized by OCR, and the processed images and the coordinate-bearing txt files generated by OCR are combined into a double-layer OFD file, which supports accurate content retrieval, makes the file content hard to tamper with, and protects the security of the file information;
5. the archive data package is checked against the metadata constraints and image standard requirements; files that pass are packaged into zip packages according to the configured package structure and handed over to different service systems, so that file information resources are shared;
6. the system can rebuild low-definition files as high-definition files, recognizes the text of digital files with OCR, and improves file data quality by manually supplementing missing data, correcting erroneous data and removing duplicate data, so that historical files are handed over to the service system in line with the latest archiving standard.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a block diagram of the file digital processing system according to an embodiment;
FIG. 2 is a flowchart of the file digital processing system of the embodiment cleaning digitized copies of historical files;
FIG. 3 is a flowchart of the file digital processing system of the embodiment processing paper files by volume.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific examples, so that those skilled in the art can better understand the invention and implement it, but the examples are not meant to limit the invention, and the technical features of the embodiments of the invention and the examples can be combined with each other without conflict.
The embodiment of the invention provides a file digital processing system which is used for solving the technical problem of how to realize file digital processing so as to realize high sharing of information resources.
Example 1:
the invention relates to a file digital processing system which comprises a task management module, an image scanning module, an image processing module, an image quality inspection module, a disassembling management module, a file writing module, a result management module and a storage module.
The task management module interacts externally through a task management interface, and is used for acquiring the configured metadata writing template, importing file data, setting processing links, and assigning file processing personnel and a file processing quantity to each processing link.
In this embodiment, the task management module acquires the configured metadata writing template according to the category, arrangement mode, carrier mode and metadata scheme selected by the user; file data can be entered manually, or imported in batches by configuring an import metadata scheme template.
A system user must create a new batch to start a task. The task management module in this embodiment then executes the following steps: it acquires the configured metadata writing template according to the category, arrangement mode, carrier mode and metadata scheme selected by the user; file information is entered manually or imported in batches by configuring an import metadata scheme template; once the file data has been imported, the file processing links are set, and file processing personnel and a file processing quantity are assigned to each link.
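The assignment of handlers and file quantities to a processing link could be modeled roughly as follows. This is a minimal sketch, not the patented implementation: the `Handler` class, the `allocate` function, the in-order quota policy and all identifiers are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Handler:
    """A processing person assigned to one processing link (hypothetical model)."""
    name: str
    quota: int                      # number of files this person should handle
    files: list = field(default_factory=list)


def allocate(file_ids, handlers):
    """Assign files to handlers in order until each quota is filled.

    Returns the list of files left unassigned when the quotas run out.
    """
    remaining = list(file_ids)
    for handler in handlers:
        take = remaining[:handler.quota]
        handler.files.extend(take)
        remaining = remaining[handler.quota:]
    return remaining


# One batch: a processing link mapped to its handlers (names are illustrative).
batch = {
    "image_scanning": [Handler("handler_a", 2), Handler("handler_b", 3)],
}
leftover = allocate(["F001", "F002", "F003", "F004", "F005", "F006"],
                    batch["image_scanning"])
```

Files that exceed the combined quotas are returned rather than dropped, so an administrator could assign them to another handler or a later batch.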
The image scanning module interacts externally through an image scanning interface, and is used for calling the scanner to scan the file into scanned images and sending the scanned images to the image processing module as scanned originals.
In this embodiment, the image scanning module calls the scanner to scan the file page by page, in the following way: after the current file has been scanned, triggering the Next button in the image scanning interface submits the scanned images to the server and switches to scanning the next file.
The image processing module interacts externally through the image processing interface, and is used for processing the scanned originals to obtain modified images, recording the modification information of each scanned original, and sending the scanned originals and the modified images to the image quality inspection module.
The image processing interface in this embodiment is configured with a one-key processing button; for each scanned original, triggering the one-key processing button processes the scanned original to obtain a modified image, and the modification information is recorded. The image processing provides the following services: image area selection, image cropping, image rotation, black edge removal, correction and super-resolution.
In implementation, the system calls the image processing client to process the images, provides one-key automatic processing according to image quality, records the modification information of each picture separately, and can roll the modifications back step by step to the original state. The image processing functions include image area selection, image cropping, image rotation, black edge removal, correction and super-resolution.
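Recording the modifications of each picture separately, with step-by-step rollback, can be pictured as a per-image history stack. The sketch below is an assumption about one possible data structure; the `ImageRecord` class and the operation names are hypothetical, and no actual pixel processing is performed.

```python
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Tracks one scanned original and the operations applied to it."""
    original: str                                # identifier of the scanned original
    history: list = field(default_factory=list)  # ordered (operation, params) pairs

    def apply(self, operation, **params):
        """Record one processing step such as 'rotate' or 'crop'."""
        self.history.append((operation, params))

    def undo(self):
        """Roll back the most recent step; return it, or None if nothing is left."""
        return self.history.pop() if self.history else None

    def is_original(self):
        """True when every recorded modification has been rolled back."""
        return not self.history


record = ImageRecord("page_001.jpg")
record.apply("deskew")
record.apply("remove_black_edges")
record.apply("rotate", degrees=90)
last = record.undo()   # rolls back only the rotation
```

Because the scanned original is never overwritten, the modified image can be regenerated at any point by replaying the remaining history against it.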
The image quality inspection module interacts externally through an image quality inspection interface, and is used for inspecting image quality by comparing each scanned original with its modified image; for files that pass inspection, the modified images are sent to the disassembling management module as file images; for files that fail inspection, the problems are marked and the image scanning module or the image processing module is called.
With the image quality inspection module, the handlers of the image quality inspection link check image quality manually in the image quality inspection interface. By comparing the scanned originals with the processed images, they can mark unqualified files and return them to the image scanning link or the image processing link according to the marked problem.
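The routing decision after manual inspection can be sketched as a small lookup. The problem labels, the `PROBLEM_TO_LINK` table and the rule that a scanning problem outranks a processing problem are all assumptions; the description only states that a failed file returns to scanning or processing according to the marked problem.

```python
from enum import Enum


class Link(Enum):
    SCANNING = "image scanning"
    PROCESSING = "image processing"
    DISASSEMBLY = "disassembling management"


# Hypothetical mapping from a marked problem to the link that should redo the work.
PROBLEM_TO_LINK = {
    "missing_page": Link.SCANNING,
    "blurred_scan": Link.SCANNING,
    "over_cropped": Link.PROCESSING,
    "skewed": Link.PROCESSING,
}


def route_after_inspection(passed, problems=()):
    """Return the next link for a file after manual quality inspection."""
    if passed:
        return Link.DISASSEMBLY
    # A scanning problem outranks a processing one, because rescanning
    # replaces the original image that processing works from.
    # Unknown or missing problem labels default to the processing link.
    targets = {PROBLEM_TO_LINK.get(p, Link.PROCESSING) for p in problems}
    return Link.SCANNING if Link.SCANNING in targets else Link.PROCESSING
```

For example, a file marked both `skewed` and `missing_page` would go back to scanning, since the missing page cannot be fixed by reprocessing.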
The disassembling management module interacts externally through a disassembling management interface, and is used for calling the OCR client service, performing OCR on the received file images, extracting the text content, splitting the file according to the OCR results, and classifying the file content into corresponding folders, each folder containing the corresponding images and OCR results.
The disassembling management module in this embodiment is configured to perform the following:
(1) Converting the received file images into jpg images, calling the OCR service to perform OCR on the jpg images, and extracting OCR results containing coordinates, wherein each OCR result is a txt file;
(2) Splitting the jpg images and the corresponding OCR results into corresponding folders according to the file structure, wherein the folders comprise texts, issuing sheets, drafts, attachments and covers;
(3) Manually rechecking the file after disassembly, and manually adjusting any part that was classified incorrectly.
The disassembly management module is used for supporting disassembly of files according to a part level and a volume level.
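Step (2) above, sorting pages into folders from their OCR text, might look like the following keyword-based sketch. The folder names follow the description, but the keyword lists and the first-match rule are illustrative guesses; the patent does not disclose how a page's type is judged, and step (3) still relies on manual recheck.

```python
from collections import defaultdict

# Folder names come from the description; the keywords are illustrative guesses.
FOLDER_KEYWORDS = {
    "cover": ["archive no", "retention period"],
    "issuing_sheet": ["issuing sheet", "signed by"],
    "draft": ["draft"],
    "attachment": ["attachment", "appendix"],
}
DEFAULT_FOLDER = "text"


def classify_page(ocr_text):
    """Pick the first folder whose keyword appears in the page's OCR text."""
    lowered = ocr_text.lower()
    for folder, keywords in FOLDER_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return folder
    return DEFAULT_FOLDER


def split_file(pages):
    """Group (image_name, ocr_text) pairs by folder, preserving page order."""
    folders = defaultdict(list)
    for image_name, ocr_text in pages:
        folders[classify_page(ocr_text)].append(image_name)
    return dict(folders)
```

A call such as `split_file([("001.jpg", "Archive No. 12"), ("002.jpg", "Notice on ...")])` would place the first page under `cover` and the second under `text`.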
The file writing module is used for writing file metadata based on the metadata writing template, recognizing the images and OCR results in each folder and filling them into the corresponding metadata to obtain metadata information, and calling an OFD service to merge and convert the OCR results containing coordinates and the corresponding images into a double-layer OFD file.
Based on the images and OCR results in the folder, the file writing page supports filling metadata by manual entry, and supports framing an image area, whose content is recognized through the OCR service and filled into the corresponding metadata. The file writing page displays the OCR text of each page of the file in a switchable file frame, and if the OCR text is unsatisfactory or incomplete, supports manually triggering OCR on a single page or on the whole file.
With the file writing module, a user can write file metadata in batches, either by manual entry in the metadata entry interface or by framing an image area so that the content of the area is recognized by OCR and filled into the corresponding metadata automatically. To make writing easier, the file frame in the center of the page can be switched to display the OCR text of each page. If the OCR result is poor or something was missed, OCR can be run again manually on a single page or on the whole file. Once file writing is complete, the OFD service is called to merge and convert the OCR results containing coordinates and the pictures into a double-layer OFD file.
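Since the OCR results carry coordinates, filling a metadata field from a framed image area can be approximated by selecting the recognized lines whose boxes fall inside the frame. The sketch below assumes that approach; the `OcrLine` structure, the pixel coordinate convention and the "entirely inside" rule are hypothetical, and the actual system may instead re-run OCR on the cropped region.

```python
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One recognized line with its bounding box in image pixel coordinates."""
    text: str
    x: int
    y: int
    width: int
    height: int


def lines_in_region(lines, left, top, right, bottom):
    """Return the lines whose boxes lie entirely inside the framed region."""
    return [
        ln for ln in lines
        if ln.x >= left and ln.y >= top
        and ln.x + ln.width <= right and ln.y + ln.height <= bottom
    ]


def fill_field(metadata, field_name, lines, region):
    """Fill one metadata field with the text found inside the framed region."""
    selected = sorted(lines_in_region(lines, *region), key=lambda ln: (ln.y, ln.x))
    metadata[field_name] = " ".join(ln.text for ln in selected)
    return metadata
```

Sorting by vertical and then horizontal position keeps a multi-line title in reading order before it is written into the field.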
The result management module is used for carrying out external interaction through a result management interface, detecting metadata information, archive images and OFD files based on configured detection rules, generating detection reports, packaging the metadata information, the archive images and the OFD files into archive information packages, and sending the archive information packages to the service system.
In this embodiment, the result management module is configured to perform the following:
(1) Setting detection rules, checking the metadata information, file images and OFD files against the file metadata constraints, the image standard requirements and whether the metadata match the images, and generating a detection report; if a file fails detection, returning it to the processing links;
(2) If a file passes detection, generating an xml metadata information file for it according to the archiving requirements, packaging the file images, the xml metadata information file and the OFD file corresponding to each folder into an archive information package in zip format, and sending the archive information package to the service system.
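Step (2) can be illustrated with Python's standard `zipfile` and `xml.etree.ElementTree` modules. This is only a sketch: the XML element names, the `metadata.xml` file name and the package layout are assumptions, since the description leaves them to the configured archiving requirements.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def build_metadata_xml(metadata):
    """Serialize a flat metadata dict to XML bytes (element names are assumed)."""
    root = ET.Element("archive")
    for name, value in metadata.items():
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_package(metadata, folders):
    """Pack metadata.xml plus each folder's files into an in-memory zip.

    `folders` maps a folder name to {file_name: file_bytes}.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as package:
        package.writestr("metadata.xml", build_metadata_xml(metadata))
        for folder, files in folders.items():
            for file_name, content in files.items():
                package.writestr(f"{folder}/{file_name}", content)
    return buffer.getvalue()
```

Building the package in memory keeps the sketch free of file-system side effects; a real deployment would more likely write the zip to the storage module before handing it over.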
The storage module is used for providing database storage and file storage, and for storing the data and files generated by the task management module, the image scanning module, the image processing module, the image quality inspection module, the disassembling management module, the file writing module and the result management module, wherein the data and files comprise the metadata information, file images, OFD files and detection reports.
As shown in FIG. 2, the system of this embodiment can clean digitized copies of historical files. The flow is as follows:
(1) The system administrator creates a processing batch, assigns processing links to the batch, imports file information into the new batch in batches, and assigns the files to the personnel of specific processing links;
(2) The file catalog metadata and the corresponding original image files are added; the files under the designated cleaning path are imported by attaching the originals; and the system's jpg splitting service splits the tif or pdf originals into jpg pictures;
(3) OCR is performed on the split jpg pictures to extract the text content; if a picture cannot be recognized, the system automatically converts it to a high-definition picture and performs OCR again;
(4) Based on the text content, the system automatically judges whether each split jpg picture belongs to the text, issuing sheet, draft, attachment or cover folder and places it in the corresponding folder; a handler rechecks the automatic result in the disassembling management module and adjusts any part that was classified incorrectly; the split jpg pictures and the txt files generated by OCR are then passed to the double-layer OFD synthesis service, which synthesizes each split folder into an OFD document structure;
(5) The result management module generates the OFD folder structure according to the configured package file structure, generates the xml file of basic file metadata according to the configured metadata scheme, and converts the format of the disassembled file pictures. File acceptance personnel check the result and confirm whether the acceptance conditions are met; after acceptance, the file data package in zip format is handed over to the service system.
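Step (3) of this flow, falling back to high-definition processing when OCR finds nothing, can be sketched as a retry loop. The `ocr` and `enhance` callables stand in for the OCR service and the super-resolution step, whose real interfaces are not given in the description; treating an empty result as "cannot be recognized" and the `max_retries` limit are assumptions.

```python
def recognize_with_retry(image, ocr, enhance, max_retries=1):
    """Run OCR; if it yields no text, enhance the image and try again.

    `ocr` and `enhance` are caller-supplied callables standing in for the
    OCR service and the high-definition processing step.
    Returns (text, number_of_enhancement_passes).
    """
    text = ocr(image)
    passes = 0
    while not text.strip() and passes < max_retries:
        image = enhance(image)
        passes += 1
        text = ocr(image)
    return text, passes


# Stand-ins for demonstration: "images" are dicts with a sharpness flag.
def fake_ocr(image):
    return image["content"] if image["sharp"] else ""


def fake_enhance(image):
    return {**image, "sharp": True}
```

Returning the number of enhancement passes alongside the text lets the caller record that the picture was modified, in keeping with the system's habit of logging image modifications.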
The flow for processing paper files by volume is shown in FIG. 3 and is as follows:
(1) A system administrator creates a new batch and imports the file information, assigns the image processing, image quality inspection, disassembling management, file writing and result management links to designated personnel, and submits the files to the image scanning link;
(2) The handler of the image scanning link enters image scanning and connects the scanner to scan the paper file; when scanning is complete, the file is submitted to the image processing link, where the system automatically performs operations such as correction, black edge removal and blank page deletion on the images;
(3) After image processing, the file is submitted to the image quality inspection link, where the scanned originals are compared with the processed images; unqualified images are returned to the image processing link or the image scanning link, and files that pass inspection are submitted to the disassembling management link, where OCR and automatic disassembly are performed on the processed images;
(4) In disassembling management, the handler checks the result of the automatic disassembly and can disassemble the file manually;
(5) After disassembly, the file is submitted to the file writing link, where the file information is supplemented; the user can frame the content of an image area, which is recognized by OCR and filled into the corresponding metadata automatically, reducing manual entry;
(6) When file writing is complete, the processed images and the txt files recognized by OCR are synthesized into a double-layer OFD file;
(7) For files that meet the detection requirements, the basic metadata xml and the folder structure are generated, and the archive zip package is handed over to the service system.
In the system of this embodiment, a system administrator creates a batch task, imports a batch of files, and allocates a processing person and the number of files to the files. After the original document is hung, splitting a plurality of jpg pictures, or scanning paper version images by using a scanner, and performing OCR (optical character recognition) on the pictures through image processing or image quality inspection. After OCR recognition is completed, the images and OCR recognition content are synthesized into a double-layer OFD file, zip compression packets are generated according to requirements aiming at files passing quality inspection, and other service systems are handed over.
The system of the embodiment uses an image processing technology to carry out a series of processing on the pictures, carries out image processing such as automatic image correction, blank page deletion, low-definition image to high-definition image conversion and the like on scanned pictures and split jpg pictures, records information of an image processing process, and improves the accuracy of OCR recognition.
The system of this embodiment uses OCR (optical character recognition) to automatically identify the file folder directories of scanned pictures or processed images and to classify them according to the different types of folder structure; in the file writing link, a designated image area is frame-selected and OCR automatically recognizes the designated file metadata fields, reducing manual input.
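Both uses of OCR described here can be sketched in a few lines. The example below is illustrative only: the keyword rules, the `OcrWord` structure and the function names are assumptions, and a real system would work on the coordinates returned by its OCR service. The folder names follow the folder structure listed in the claims.

```python
from dataclasses import dataclass

# Hypothetical keyword rules mapping OCR text to a folder.
FOLDER_KEYWORDS = {
    "covers": ("cover",),
    "attachments": ("attachment", "appendix"),
    "drafts": ("draft",),
}
DEFAULT_FOLDER = "texts"


def classify_page(ocr_text: str) -> str:
    """Pick a folder for a page from keywords found in its OCR text."""
    lowered = ocr_text.lower()
    for folder, keywords in FOLDER_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return folder
    return DEFAULT_FOLDER


@dataclass
class OcrWord:
    text: str
    x: int       # left edge of the word box
    y: int       # top edge of the word box
    width: int
    height: int


def text_in_region(words: list[OcrWord], left: int, top: int,
                   right: int, bottom: int) -> str:
    """Join the words whose boxes lie entirely inside the selected frame."""
    inside = [
        w for w in words
        if w.x >= left and w.y >= top
        and w.x + w.width <= right and w.y + w.height <= bottom
    ]
    inside.sort(key=lambda w: (w.y, w.x))  # reading order: top to bottom, left to right
    return " ".join(w.text for w in inside)
```

The string returned by `text_in_region` is what would be filled into the metadata field the user is currently editing.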
The system of this embodiment uses the OFD service to synthesize the processed image and the txt document generated by OCR into a double-layer OFD file whose content can be searched accurately.
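Why a double-layer file supports content search can be shown with a simplified model: each page keeps the image as the visible layer and the OCR text with its coordinates as a second layer. The sketch below does not read or write the OFD format and does not use any OFD library; `TextBox`, `DoubleLayerPage` and `search` are hypothetical names used only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass
class TextBox:
    text: str
    x: int  # position of the text on the page image
    y: int


@dataclass
class DoubleLayerPage:
    image_name: str                                           # visible layer: processed page image
    text_layer: list[TextBox] = field(default_factory=list)   # second layer: OCR text with coordinates


def search(pages: list[DoubleLayerPage], query: str) -> list[tuple[int, int, int]]:
    """Return (page number, x, y) for every text box containing `query`."""
    needle = query.lower()
    hits = []
    for number, page in enumerate(pages, start=1):
        for box in page.text_layer:
            if needle in box.text.lower():
                hits.append((number, box.x, box.y))
    return hits
```

Because every hit carries coordinates, a viewer can highlight the matching position on the page image rather than only reporting the page number.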
While the invention has been illustrated and described in detail in the drawings and in the preferred embodiments, the invention is not limited to the disclosed embodiments; those skilled in the art will appreciate that the technical means of the various embodiments described above may be combined to produce further embodiments of the invention, which are also within the scope of the invention.

Claims (8)

1. A file digital processing system, characterized by comprising a task management module, an image scanning module, an image processing module, an image quality inspection module, a disassembling management module, a file writing module, a result management module and a storage module;
the task management module interacts externally through a task management interface and is used for acquiring a configured metadata writing template, importing file data, setting processing links, and assigning file processing personnel and a file processing quantity to each processing link;
the image scanning module interacts externally through an image scanning interface and is used for calling a scanner to scan the file to form a scanned image, and sending the scanned image, as the scanned original image, to the image processing module;
the image processing module interacts externally through an image processing interface and is used for performing image processing on the scanned original images to obtain modified images, recording the modification information of each scanned original image, and sending the scanned original images and the modified images to the image quality inspection module;
the image quality inspection module is used for performing, through an image quality inspection interface, image quality inspection by comparing the scanned original image with the modified image; for files that pass the quality inspection, it sends the modified image, as the file image, to the disassembling management module; for files that do not pass the quality inspection, it marks the problem and calls the image scanning module or the image processing module;
the disassembling management module is used for calling the OCR client service through a disassembling management interface, performing OCR recognition on the received file image, extracting text content, splitting the file according to the OCR recognition result, and classifying the file content into corresponding folders, each folder containing the corresponding images and OCR results;
the file writing module interacts externally through a file writing interface and is used for writing file metadata based on the metadata writing template, recognizing the images and OCR results in the folders and filling them into the corresponding metadata to obtain metadata information, and calling an OFD service to combine and convert the OCR results containing coordinates and the corresponding images into a double-layer OFD file;
the result management module interacts externally through a result management interface and is used for detecting the metadata information, file images and OFD files based on configured detection rules, generating a detection report, packaging the metadata information, file images and OFD files into an archive information package, and sending the archive information package to the service system;
the storage module is used for providing database storage and file storage, and for storing the data and files generated by the task management module, the image scanning module, the image processing module, the image quality inspection module, the disassembling management module, the file writing module and the result management module, the data and files including the metadata information, file images, OFD files and detection reports.
2. The file digital processing system according to claim 1, wherein the task management module is used for acquiring the configured metadata writing template according to the category, classification, carrier form and metadata scheme selected by a user;
the task management module supports importing file data by manual entry and, with a configured metadata scheme template, in batches.
3. The file digital processing system according to claim 1, wherein the image scanning module is used for calling the scanner to scan the file page by page in the following manner: after the current file is scanned, triggering the next button in the image scanning interface submits the scanned images to the server and switches to scanning the next file.
4. The file digital processing system according to claim 1, wherein the image processing interface is provided with a one-touch processing button, and for each scanned original image, triggering the one-touch processing button performs image processing on the scanned original image to obtain the modified image and records the modification information;
wherein the image processing provides the following services: image selection, image cropping, image rotation, black edge removal, image correction and super-resolution.
5. The file digital processing system according to claim 1, wherein the disassembling management module is used for performing the following:
converting the received file image into a jpg image, calling the OCR service to perform OCR recognition on the jpg image, and extracting an OCR result containing coordinates, the OCR result being a txt format file;
splitting the jpg images and the corresponding OCR results into corresponding folders according to the file structure, the folders including texts, issuing sheets, drafts, attachments and covers;
after the file is disassembled, rechecking it manually, and manually adjusting any content judged to have been disassembled incorrectly.
6. The file digital processing system according to claim 5, wherein the disassembling management module supports disassembling at both the component level and the volume level.
7. The file digital processing system according to claim 1, wherein the file writing interface supports manually filling in metadata based on the images and OCR results in the folders, and supports frame-selecting an image area, recognizing the content of the image area through the OCR service and filling it into the corresponding metadata;
the file writing interface displays the OCR text of each page of the file, with page switching, in a file viewing frame, and supports manually re-running OCR recognition on a single page or on the whole file if the displayed OCR text does not meet expectations or has omissions.
8. The file digital processing system according to claim 1, wherein the result management module is used for performing the following:
setting detection rules, detecting the metadata information, file images and OFD files according to file metadata constraints, image standard requirements and whether the metadata matches the images, and generating a detection report; if the detection fails, returning the file to the processing link;
if the detection passes, generating a metadata information file in xml format for the detected file according to the archiving requirements, packaging the file images, the xml metadata information file and the OFD file corresponding to each folder into an archive information package in zip format, and sending the archive information package to the service system.
CN202310602409.XA 2023-05-26 2023-05-26 File digital processing system Pending CN116612484A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310602409.XA CN116612484A (en) 2023-05-26 2023-05-26 File digital processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310602409.XA CN116612484A (en) 2023-05-26 2023-05-26 File digital processing system

Publications (1)

Publication Number Publication Date
CN116612484A true CN116612484A (en) 2023-08-18

Family

ID=87674345

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310602409.XA Pending CN116612484A (en) 2023-05-26 2023-05-26 File digital processing system

Country Status (1)

Country Link
CN (1) CN116612484A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775566A (en) * 2023-08-23 2023-09-19 福建福清核电有限公司 Method, device and system for archiving electronic files and electronic equipment
CN116775566B (en) * 2023-08-23 2023-10-31 福建福清核电有限公司 Method, device and system for archiving electronic files and electronic equipment
CN117251526A (en) * 2023-09-06 2023-12-19 上海云思智慧信息技术有限公司 Conference file digital management system, method and electronic equipment

Similar Documents

Publication Publication Date Title
CN116612484A (en) File digital processing system
US5889896A (en) System for performing multiple processes on images of scanned documents
US8014039B2 (en) Document management system, a document management method, and a document management program
US7340112B2 (en) Labeling system and methodology
EP2668571B1 (en) Document workflow architecture
US7007231B2 (en) Document management system employing multi-zone parsing process
CN112052749A (en) Archive filing method and device, electronic equipment and computer readable storage medium
WO2008058871A1 (en) Automated generation of form definitions from hard-copy forms
US10970534B2 (en) Document processing system capture flow compiler
JPH0683879A (en) Method and device for labelling document for preservation, handling and introduction
US20030063326A1 (en) Document registration system, method threreof, program thereof and storage medium thereof
CN111126952A (en) Electronic file filing processing system and method
Sankar et al. Digitizing a million books: Challenges for document analysis
JP2022170175A (en) Information processing apparatus, information processing method, and program
CN110266906A (en) The intelligent digitalized processing flowing water method of archives, system, terminal and storage medium
US8873110B2 (en) Host apparatus to generate workform, workform management server to edit an image, workform management system, and method of editing an image using a workform
US6810136B2 (en) System and method for automatic preparation of data repositories from microfilm-type materials
CN112464907A (en) Document processing system and method
US20060204141A1 (en) Method and system of converting film images to digital format for viewing
CN113221886A (en) Character learning and proofreading system based on image-text recognition
KR20020054702A (en) IMT-2000 utilization a character cognition means
CN201570029U (en) Information resources collection and management system based on business rule repository
CN111046864A (en) Method and system for automatically extracting five elements of contract scanning piece
US20240323306A1 (en) Information processing apparatus, control method for information processing apparatus, and storage medium
US20030079184A1 (en) Dynamic image storage using domain-specific compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination