CN112052749A - Archive filing method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN112052749A
Authority
CN
China
Prior art keywords
archive
archived
file
image
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010840577.9A
Other languages
Chinese (zh)
Inventor
高明
丁诗璟
沈文俊
余刚
万聪
李亮
沈冰华
赵琴
胡德清
刘维安
欧阳明
李金灵
袁园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202010840577.9A
Publication of CN112052749A
Legal status: pending


Classifications

All under G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING:

    • G06V30/40 Document-oriented image-based pattern recognition (G06V Image or video recognition or understanding)
    • G06F16/51 Indexing; Data structures therefor; Storage structures (information retrieval of still image data)
    • G06F16/5846 Retrieval using metadata automatically derived from the content using extracted text (still image data)
    • G06F16/906 Clustering; Classification (details of database functions)
    • G06F16/93 Document management systems
    • G06F18/23 Clustering techniques (pattern recognition)
    • G06F18/24 Classification techniques (pattern recognition)
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide an archive filing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring an archive image of an archive to be archived; determining, based on the archive image, the archive type of the archive to be archived and the archive files it contains; extracting bibliographic information from a character recognition result of the archive image based on the archive type; determining directory information based on the archive files; and archiving the archive image based on the archive type, the bibliographic information, and the directory information. With this scheme, archives can be classified and catalogued automatically instead of manually, with higher accuracy and processing efficiency, laying a foundation for digitizing paper archives and for knowledge extraction from and higher-level use of electronic archives.

Description

Archive filing method and device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an archive filing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Financial archives come in many kinds: documents, accounting archives, audio-visual archives, physical-object archives, credit card archives, credit archives, capital construction archives, and so on. To manage archives digitally, archive administrators must carry out a large amount of bibliographic description and catalogue classification while organizing archives. The current archive digitization workflow generally comprises: sorting, unbinding, scanning, image processing, supplementary data entry, data checking, rebinding, quality inspection of the digitized results, numbering and cataloguing the files, and finally archiving the data. Although paper archives can now be digitized with an image scanning system, archives have complex and varied attributes and large data volumes, so their description and classification still require a great deal of manual work and are error-prone.
With the rapid development of financial services, archive data volumes keep growing. As carriers of valuable data, archives play an increasingly visible role in every line of business, and large financial institutions have an ever stronger demand to use them. Archive digitization has therefore become an urgent problem for large financial institutions: archive workers urgently need to raise the level of automation in archive description, reduce the workload of manual sorting and entry, digitize paper archives, and lay a foundation for knowledge extraction from and higher-level use of electronic archives.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks. The technical scheme adopted by the application is as follows:
In a first aspect, an embodiment of the present application provides an archive filing method, the method comprising:
acquiring an archive image of an archive to be archived;
determining, based on the archive image, the archive type of the archive to be archived and the archive files in it;
extracting bibliographic information from a character recognition result of the archive image based on the archive type;
determining directory information based on the archive files;
and archiving the archive image based on the archive type, the bibliographic information, and the directory information.
Optionally, determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the archive image comprises:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
Optionally, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification character and the archive format in the archive image comprises:
determining whether the character recognition result of the archival image contains a preset target identification character;
if yes, determining the archive type of the archive to be archived and the archive files in it based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image.
Optionally, determining an archive type of the archive to be archived and an archive file in the archive to be archived based on the archive format of the archive image comprises:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the processing result of the clustering process.
Optionally, extracting bibliographic information from the character recognition result of the archive image based on the archive type comprises:
determining the bibliographic fields corresponding to the archive type;
and extracting the contents of the bibliographic fields from the character recognition result of the archive image as the bibliographic information.
Optionally, before archiving the archive image based on the archive type, the bibliographic information, and the directory information, the method further comprises:
identifying a blank image corresponding to a blank page of an archive to be archived;
and deleting the blank image in the archival image.
Optionally, the method further includes:
recognizing the archive image based on optical character recognition (OCR) to obtain the character recognition result.
Optionally, recognizing the archive image based on OCR to obtain the character recognition result comprises:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
Optionally, the method further includes:
when an archive query request for a target archive is received, querying the archived archive images for the archive image of the target archive based on the bibliographic information and the directory information.
In a second aspect, an embodiment of the present application provides an archive filing apparatus, comprising:
an image acquisition module, configured to acquire an archive image of an archive to be archived;
an archive type and archive file identification module, configured to determine, based on the archive image, the archive type of the archive to be archived and the archive files in it;
a bibliographic information extraction module, configured to extract bibliographic information from a character recognition result of the archive image based on the archive type;
a directory information extraction module, configured to determine directory information based on the archive files;
and an archive filing module, configured to archive the archive image based on the archive type, the bibliographic information, and the directory information.
Optionally, the archive type and archive file identification module is specifically configured to:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
Optionally, when determining the archive type of the archive to be archived and the archive files in it based on the identification characters and the archive format in the archive image, the archive type and archive file identification module is specifically configured to:
determine whether the character recognition result of the archive image contains a preset target identification character;
if yes, determine the archive type of the archive to be archived and the archive files in it based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image.
Optionally, the archive type and archive file identification module is specifically configured to, when determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the processing result of the clustering process.
Optionally, the bibliographic information extraction module is specifically configured to:
determine the bibliographic fields corresponding to the archive type;
and extract the contents of the bibliographic fields from the character recognition result of the archive image as the bibliographic information.
Optionally, the apparatus further comprises:
a blank image deletion module, configured to identify a blank image corresponding to a blank page of the archive to be archived and to delete the blank image from the archive image before the archive image is archived based on the archive type, the bibliographic information, and the directory information.
Optionally, the apparatus further comprises:
a character recognition module, configured to recognize the archive image based on OCR to obtain the character recognition result.
Optionally, the character recognition module is specifically configured to:
recognize the archive image based on OCR to obtain an initial character recognition result;
perform semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjust the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
Optionally, the apparatus further comprises:
an archive query module, configured to, upon receiving an archive query request for a target archive, query the archived archive images for the archive image of the target archive based on the bibliographic information and the directory information.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory;
the memory is configured to store operation instructions;
the processor is configured to execute, by calling the operation instructions, the archive filing method shown in any implementation of the first aspect of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the archive filing method shown in any implementation of the first aspect of the present application.
The technical solutions provided by the embodiments of the present application have the following beneficial effects:
according to the solution provided by the embodiments of the present application, the archive type of the archive to be archived and the archive files in it are determined from its archive image, bibliographic information is extracted from the character recognition result of the archive image based on the archive type, directory information is determined based on the archive files, and the archive image is then archived based on the archive type, the bibliographic information, and the directory information. This scheme classifies and catalogues archives automatically in place of manual work, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for knowledge extraction from and higher-level use of electronic archives.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic flowchart of an archive filing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an archive digitization process according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an archive filing apparatus according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements, or to elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting it.
As used herein, the singular forms "a", "an", "said", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any one, and all combinations, of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The general process of archive bibliographic description and catalogue classification is as follows:
(1) Archive arrangement: the archive worker unbinds and sorts the archive materials and places them in the scanner in preparation for scanning.
(2) Scanning: the archives are scanned with the scanner, ensuring scan quality (no folded corners, no skew, and the like).
(3) Scanned image processing: the system preprocesses the scanned images, for example by removing black borders.
(4) Supplementary entry of archive information: the archive worker enters the archive metadata according to the actual situation.
(5) Catalogue classification: the archive worker additionally enters the archive directory information according to the actual situation.
(6) Image quality inspection: the archive worker rechecks whether the archive content is complete and the images are clear, and deletes blank pages.
(7) Archiving: the archives that pass quality inspection are archived.
(8) Archive consultation: after archiving, archive workers can consult and search the archives online through the online system.
In the existing process, archive workers must invest considerable manpower and material resources in arranging and describing archives, and identifying catalogues by eye has a high error rate.
In addition, scanning produces many blank images, which waste storage space.
The archive filing method, apparatus, electronic device and computer-readable storage medium provided in the embodiments of the present application aim to solve at least one of the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flowchart of an archive filing method provided in an embodiment of the present application, and as shown in fig. 1, the method mainly includes:
step S110: acquiring an archive image of an archive to be archived;
step S120: determining, based on the archive image, the archive type of the archive to be archived and the archive files in it;
step S130: extracting bibliographic information from a character recognition result of the archive image based on the archive type;
step S140: determining directory information based on the archive files;
step S150: archiving the archive image based on the archive type, the bibliographic information, and the directory information.
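Steps S110 to S150 can be sketched as a small orchestration function. This is a minimal illustration under stated assumptions, not the applicant's implementation: the OCR, classification, extraction, and directory-building steps are passed in as hypothetical callables, and `ArchivedRecord` and `file_archive` are names introduced here purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# (archive file title, first page, last page); hypothetical directory entry shape
DirEntry = Tuple[str, int, int]


@dataclass
class ArchivedRecord:
    """Hypothetical result of step S150: everything stored for one archive."""
    archive_type: str
    bibliographic: Dict[str, str]
    directory: List[DirEntry]
    images: List[bytes] = field(default_factory=list)


def file_archive(
    images: List[bytes],                                      # S110: scanned page images
    ocr: Callable[[bytes], str],                              # page image -> recognized text
    classify: Callable[[List[str]], Tuple[str, List[str]]],   # S120: texts -> (type, per-page file labels)
    extract_bib: Callable[[str, str], Dict[str, str]],        # S130: (type, full text) -> fields
    build_directory: Callable[[List[str]], List[DirEntry]],   # S140: per-page labels -> directory
) -> ArchivedRecord:
    texts = [ocr(img) for img in images]
    archive_type, page_labels = classify(texts)
    bibliographic = extract_bib(archive_type, "\n".join(texts))
    directory = build_directory(page_labels)
    # S150: bundle the results so they can be stored under archive_type
    return ArchivedRecord(archive_type, bibliographic, directory, images)
```

Passing the steps in as callables keeps the control flow of FIG. 1 separate from any particular OCR engine or classifier.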
In this embodiment, the archive to be archived can be scanned with a scanner to obtain its archive image.
In the embodiment of the application, after the archive image of the archive to be archived is acquired, the archive type of the archive to be archived and the archive file in the archive to be archived can be determined based on the archive image.
An archive file is an individual document within the archive, such as a certification material or a contract. After the archive files in the archive to be archived are determined, directory information can be determined for each archive file.
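Directory information for each archive file could, for example, be derived by grouping consecutive pages that carry the same archive-file label. This sketch assumes a hypothetical per-page label list (such as the output of the classification step) and 1-based page numbers; `build_directory` is an illustrative name, and the patent does not specify how the directory is constructed.

```python
from typing import List, Tuple


def build_directory(page_labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive pages with the same archive-file label into directory entries.

    Returns (file label, first page, last page) tuples with 1-based page numbers.
    """
    directory: List[Tuple[str, int, int]] = []
    for page_no, label in enumerate(page_labels, start=1):
        if directory and directory[-1][0] == label:
            # Same file continues on this page: extend the last entry's page range.
            label_, first, _ = directory[-1]
            directory[-1] = (label_, first, page_no)
        else:
            # A new archive file starts on this page.
            directory.append((label, page_no, page_no))
    return directory
```

Under this assumption, two different files of the same kind placed back to back would merge into one entry, so a real system would likely label pages with per-file identifiers rather than file kinds.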
In this embodiment, the bibliographic information may include bibliographic fields such as the document title, issue date, issuing department, drafter, and reviewer.
In this embodiment, the character recognition result can be obtained by performing character recognition on the archive image. Because archives of different archive types may carry different bibliographic information, the bibliographic information can be extracted separately for archives of each archive type.
After the archive type, bibliographic information, and directory information of the archive to be archived are determined, the archive can be filed on that basis. Specifically, archives may be stored by category according to their archive type, with the bibliographic information and directory information stored in association with each archive, which provides a basis for later querying archives by bibliographic and directory information.
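Storing archives by category under their archive type, with bibliographic and directory information kept alongside, is what makes the later query by bibliographic and directory information possible. A minimal in-memory sketch follows; `ArchiveStore`, its method names, and the record layout are hypothetical, and a production system would presumably use a database indexed on these fields instead.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ArchiveStore:
    """Hypothetical in-memory archive repository, keyed by archive type."""

    def __init__(self) -> None:
        # archive type -> list of archived entries
        self._by_type: Dict[str, List[dict]] = defaultdict(list)

    def file(self, archive_id: str, archive_type: str, bibliographic: Dict[str, str],
             directory: List[str], image_paths: List[str]) -> None:
        """Store an archive under its type, together with its bibliographic and directory info."""
        self._by_type[archive_type].append({
            "id": archive_id,
            "bibliographic": bibliographic,
            "directory": directory,
            "images": image_paths,
        })

    def query(self, archive_type: Optional[str] = None,
              directory_entry: Optional[str] = None, **bib_filters: str) -> List[str]:
        """Return ids of archives whose bibliographic fields and directory match the filters."""
        types = [archive_type] if archive_type else list(self._by_type)
        hits: List[str] = []
        for t in types:
            for entry in self._by_type.get(t, []):  # .get avoids creating empty categories
                if any(entry["bibliographic"].get(k) != v for k, v in bib_filters.items()):
                    continue
                if directory_entry is not None and directory_entry not in entry["directory"]:
                    continue
                hits.append(entry["id"])
        return hits
```

For example, `store.query(title="...")` filters on a bibliographic field, while `store.query(directory_entry="...")` finds archives that contain a given archive file.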
Before an archive is filed, its archive type, bibliographic information, and directory information may be confirmed manually to ensure archiving accuracy.
The method provided by this embodiment determines the archive type of the archive to be archived and the archive files in it from the archive image, extracts bibliographic information from the character recognition result of the archive image based on the archive type, and determines directory information based on the archive files, and then archives the archive image based on the archive type, the bibliographic information, and the directory information. This scheme classifies and catalogues archives automatically in place of manual work, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for knowledge extraction from and higher-level use of electronic archives.
In an optional manner of the embodiment of the application, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive image includes:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
In this embodiment, an identification character is a character contained in the archive image that indicates the type of archive it belongs to, or the kind of archive file it belongs to. Taking archive files as an example, if the title of an archive file contains the word "contract", the file can be regarded as a contract.
In this embodiment, the archive format is the layout format of an archive, or of an archive file within the archive, and it can likewise indicate the archive type or the kind of archive file. Taking archive files as an example, formal documents of the same kind often share the same layout, such as the same lines and shapes.
In the embodiment of the application, the archive type of the archive to be archived and the archive file in the archive to be archived can be determined based on at least one of the identification character and the archive format in the archive image.
In an optional manner of this embodiment, determining the archive type of the archive to be archived and the archive files in it based on the identification characters and the archive format in the archive image comprises:
determining whether the character recognition result of the archival image contains a preset target identification character;
if yes, determining the archive type of the archive to be archived and the archive files in it based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image.
In this embodiment, the archive type of the archive to be archived and the archive files in it are determined jointly from the identification characters and the archive format in the archive image. First, it is determined whether the character recognition result of the archive image contains a preset target identification character; such characters indicate the archive type of the archive and the archive files it contains.
Specifically, a first association relationship between target identification characters and archive types, and a second association relationship between target identification characters and archive files, may be preset, so that when a target identification character is found in the archive image, the archive type of the archive to be archived and the archive files in it can be determined from these relationships.
If the target identification character cannot be found in the character recognition result of the archive image, the archive type of the archive to be archived and the archive file in the archive to be archived can be determined based on the archive format of the archive image.
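The identification-character branch and its layout fallback can be sketched as a lookup against the two preset association relationships. The marker strings, archive types, and file names in the tables are invented examples, and `layout_fallback` is a hypothetical stand-in for whatever layout-based classifier is used.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical first association relationship: target identification character -> archive type.
TYPE_BY_MARKER: Dict[str, str] = {"Loan Contract": "credit", "Accounting Voucher": "accounting"}
# Hypothetical second association relationship: target identification character -> archive file.
FILE_BY_MARKER: Dict[str, str] = {"Loan Contract": "loan contract", "Accounting Voucher": "accounting voucher"}


def classify_page(ocr_text: str, page_image: Any,
                  layout_fallback: Callable[[Any], Tuple[str, str]]) -> Tuple[str, str]:
    """Return (archive type, archive file) for one page."""
    for marker, archive_type in TYPE_BY_MARKER.items():
        if marker in ocr_text:
            # A preset target identification character was found in the OCR text.
            return archive_type, FILE_BY_MARKER[marker]
    # No target identification character: fall back to the archive format (layout).
    return layout_fallback(page_image)
```

Plain substring matching is the simplest possible reading of "contains a preset target identification character"; a real system might normalize whitespace or tolerate OCR errors before matching.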
In an optional mode of the embodiment of the application, determining an archive type of an archive to be archived and an archive file in the archive to be archived based on an archive format of an archive image includes:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the processing result of the clustering process.
In this embodiment, the archive format can be extracted from the archive image and clustering performed on it. Specifically, the archives to be archived may be clustered based on their archive format, and the archive files within an archive may also be clustered based on their archive format. Archives of the same archive type are grouped into the same cluster, and pages belonging to the same archive file are grouped into the same cluster. The archive type of the archive to be archived and the archive files it contains are then determined from the clustering result.
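The patent does not name a clustering algorithm. As one simple possibility, the sketch below uses leader (threshold) clustering on hypothetical layout feature vectors, for example counts of horizontal rules, vertical rules, and table cells plus a text-density ratio; both the features and the distance threshold are assumptions. Mapping the resulting cluster indices to concrete archive types would still require something like a labelled exemplar per cluster or manual confirmation.

```python
import math
from typing import List, Sequence


def leader_cluster(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Assign each layout feature vector to the first cluster whose leader lies within threshold.

    A vector that is farther than threshold from every existing leader starts a new cluster.
    Returns one cluster index per input vector.
    """
    leaders: List[Sequence[float]] = []
    labels: List[int] = []
    for vec in features:
        for idx, leader in enumerate(leaders):
            if math.dist(vec, leader) <= threshold:  # Euclidean distance (Python 3.8+)
                labels.append(idx)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels
```

Leader clustering needs no preset number of clusters and runs in a single pass, which suits an unknown mix of archive types; its results do depend on input order, so k-means or density-based methods may be preferable at scale.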
In an optional implementation of this embodiment, extracting bibliographic information from the character recognition result of the archive image based on the archive type includes:
determining the bibliographic fields corresponding to the archive type;
extracting the contents of the bibliographic fields from the character recognition result of the archive image as the bibliographic information.
In this embodiment, archives of different archive types may carry different bibliographic fields. A correspondence between each archive type and its bibliographic fields can therefore be preset, so that the appropriate fields are extracted for archives of each type.
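A minimal sketch of the type-to-field correspondence and the extraction step follows. The archive types, field names, and printed labels are hypothetical, and the extraction assumes each field appears in the recognized text as a "Label: value" line; real documents would likely need per-field rules or a layout-aware extractor.

```python
import re
from typing import Dict

# Hypothetical preset correspondence: archive type -> {field name: printed label}.
BIBLIOGRAPHIC_FIELDS: Dict[str, Dict[str, str]] = {
    "credit": {"borrower": "Borrower", "contract_no": "Contract No"},
    "account": {"customer": "Customer Name", "account_no": "Account No"},
}


def extract_bibliographic_info(archive_type: str, ocr_text: str) -> Dict[str, str]:
    """Return the bibliographic fields configured for `archive_type` that can
    be found in the character recognition result."""
    info: Dict[str, str] = {}
    for field_name, label in BIBLIOGRAPHIC_FIELDS.get(archive_type, {}).items():
        # Match "Label: value" (colon optional) and keep the rest of that line.
        match = re.search(rf"{re.escape(label)}[ \t]*:?[ \t]*(.+)", ocr_text)
        if match:
            info[field_name] = match.group(1).strip()
    return info
```

For example, `extract_bibliographic_info("credit", "Contract No: C-2020-001\nBorrower: Zhang San")` returns the borrower and contract number, while an archive type with no configured fields yields an empty result.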
In an optional implementation of this embodiment, before the archive image is filed based on the archive type, the bibliographic information, and the directory information, the method further includes:
identifying blank images corresponding to blank pages of the archive to be archived;
deleting the blank images from the archive images.
In this embodiment, the archive to be archived may contain blank pages, for example the blank back of a page, which produce blank images when scanned. If these blank images were filed along with the rest, they would waste storage space. Identifying the blank images and deleting them from the archive images reduces storage usage.
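One common heuristic for the blank-page check is to measure how much of a page is ink. The text does not specify how blank images are identified, so this is an assumed approach: each scanned page is taken to be a grayscale pixel grid (0 = black, 255 = white), and the darkness threshold and maximum ink ratio are hypothetical values that would need tuning for real scanners, for instance to tolerate show-through from the reverse side.

```python
from typing import List

Page = List[List[int]]  # grayscale pixels, 0 (black) to 255 (white)


def is_blank_page(page: Page, dark_level: int = 128, max_ink_ratio: float = 0.005) -> bool:
    """Treat a page as blank when fewer than `max_ink_ratio` of its pixels
    are darker than `dark_level`."""
    total = sum(len(row) for row in page)
    if total == 0:
        return True
    dark = sum(1 for row in page for px in row if px < dark_level)
    return dark / total < max_ink_ratio


def remove_blank_pages(pages: List[Page]) -> List[Page]:
    """Keep only the pages that contain enough ink to be worth archiving."""
    return [page for page in pages if not is_blank_page(page)]
```

A ratio threshold rather than a zero-ink test lets isolated specks and scanner noise still count as blank.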
In an optional implementation of this embodiment, the method further includes:
recognizing the archive image based on OCR (Optical Character Recognition) to obtain the character recognition result.
In this embodiment, a GPU (Graphics Processing Unit) server may be deployed to recognize the archive images with OCR and obtain the character recognition results.
In an optional implementation of this embodiment, recognizing the archive image based on OCR to obtain the character recognition result includes:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
In this embodiment, the archive image can first be recognized with OCR to obtain the initial character recognition result, which is then analyzed by a semantic analysis model to obtain a semantic analysis result. The initial character recognition result is adjusted according to the semantic analysis result to obtain the final character recognition result.
By carrying out semantic analysis on the character recognition result, the accuracy of the character recognition result can be improved, and a foundation is provided for improving the accuracy of the extraction of the bibliographic information.
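The semantic analysis model is not described further. As a much simpler stand-in that only shows where such a correction step sits, the sketch below snaps each recognized token to the closest entry in a domain vocabulary using Python's standard `difflib`. This is lexical correction, not semantic analysis; the vocabulary and the similarity cutoff are assumptions.

```python
import difflib
from typing import List, Sequence


def correct_tokens(tokens: Sequence[str], vocabulary: Sequence[str], cutoff: float = 0.8) -> List[str]:
    """Replace each token not in `vocabulary` with its closest vocabulary
    entry (similarity >= cutoff); leave it unchanged if nothing is close."""
    known = set(vocabulary)
    corrected: List[str] = []
    for token in tokens:
        if token in known:
            corrected.append(token)
            continue
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected
```

With the vocabulary `["Borrower", "Contract", "Guarantor"]`, the misread token `"Borr0wer"` is corrected to `"Borrower"`, while an unrelated token is left as recognized.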
In an optional implementation of this embodiment, the method further includes:
when an archive query request for a target archive is received, querying the archived archive images for the archive images of the target archive based on the bibliographic information and the directory information.
In this embodiment, a query interface can be provided once archives have been filed. A user can initiate an archive query request from a terminal device; the request can carry query information such as an archive name or a file name. Because the archives are indexed by their bibliographic information and directory information, the target archive can be located quickly.
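A minimal in-memory sketch of the query step follows, assuming each filed archive is stored together with its bibliographic information (a field-to-value mapping) and its directory information (a list of archive file names). The record layout and the substring-matching rule are assumptions for illustration; a real archive management system would typically use a database index.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FiledArchive:
    image_paths: List[str]         # archived archive images
    bibliographic: Dict[str, str]  # e.g. {"borrower": "Zhang San"}
    directory: List[str]           # names of the archive files in this archive


@dataclass
class ArchiveIndex:
    archives: List[FiledArchive] = field(default_factory=list)

    def add(self, archive: FiledArchive) -> None:
        self.archives.append(archive)

    def query(self, file_name: Optional[str] = None, **criteria: str) -> List[FiledArchive]:
        """Return archives whose bibliographic fields contain every given value
        and, if `file_name` is given, whose directory lists a matching file."""
        hits: List[FiledArchive] = []
        for archive in self.archives:
            if any(value not in archive.bibliographic.get(key, "") for key, value in criteria.items()):
                continue
            if file_name is not None and not any(file_name in entry for entry in archive.directory):
                continue
            hits.append(archive)
        return hits
```

A call such as `index.query(borrower="Zhang", file_name="loan contract")` combines a bibliographic criterion with a directory criterion, mirroring the two kinds of information the text uses for retrieval.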
Fig. 2 is a schematic diagram of an archive digitization process according to an embodiment of the present application. As shown in Fig. 2, an administrator configures the archive types, the bibliographic fields for each type, and the archive directory, that is, sets up the recognition rules for the archive type, the bibliographic information, and the directory information.
Scanning staff sort the archives to be archived and scan them into archive images.
The ICR (Intelligent Character Recognition) module processes the scanned images and performs archive type identification, bibliographic field identification, directory identification, and blank page deletion; that is, it identifies the archive type, bibliographic information, and directory information from the archive images and deletes blank pages from them.
Data-entry staff then manually verify the recognition results. If verification fails, recognition is performed again; if it passes, the archive to be archived is filed.
Archive users can query the archive management system as needed to retrieve the archive images.
Based on the same principle as the method shown in Fig. 1, Fig. 3 shows a schematic structural diagram of an archive filing apparatus provided by an embodiment of the present application. As shown in Fig. 3, the archive filing apparatus 20 may include:
an image obtaining module 210, configured to obtain an archive image of an archive to be archived;
an archive type and archive file identification module 220, configured to determine, based on the archive image, the archive type of the archive to be archived and the archive files in it;
a bibliographic information extraction module 230, configured to extract bibliographic information from the character recognition result of the archive image based on the archive type;
a directory information extraction module 240, configured to determine directory information based on the archive files;
an archive filing module 250, configured to file the archive image based on the archive type, the bibliographic information, and the directory information.
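The text does not define what the directory information produced by module 240 contains. A common form for an archive directory is a sequence-numbered list of archive files with their page ranges; the sketch below assumes that form and builds it from a per-page list of archive file names, such as the output of module 220. The `DirectoryEntry` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DirectoryEntry:
    seq: int         # sequence number of the archive file within the archive
    file_name: str   # archive file identified for these pages
    first_page: int  # 1-based page numbers in the archive image sequence
    last_page: int


def build_directory(page_files: Sequence[str]) -> List[DirectoryEntry]:
    """Merge consecutive pages that belong to the same archive file into one
    directory entry covering their page range."""
    entries: List[DirectoryEntry] = []
    for page_no, name in enumerate(page_files, start=1):
        if entries and entries[-1].file_name == name:
            entries[-1].last_page = page_no
        else:
            entries.append(DirectoryEntry(len(entries) + 1, name, page_no, page_no))
    return entries
```

Pages of the same archive file that are not contiguous produce separate entries here; whether to merge them instead is a design choice the text leaves open.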
The apparatus provided by this embodiment determines, based on the archive images of an archive to be archived, the archive type of the archive and the archive files it contains; extracts bibliographic information from the character recognition result of the archive images based on the archive type; determines directory information based on the archive files; and files the archive images based on the archive type, the bibliographic information, and the directory information. This scheme enables automatic classification and automatic cataloging of archives in place of manual work, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for knowledge-oriented, higher-level use of electronic archives.
Optionally, the archive type and archive file identification module is specifically configured to:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
Optionally, when determining the archive type of the archive to be archived and the archive files in it based on the identification characters and the archive format in the archive image, the archive type and archive file identification module is specifically configured to:
determining whether the character recognition result of the archive image contains a preset target identification character;
if so, determining the archive type of the archive to be archived and the archive files in it based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive file;
if not, determining the archive type of the archive to be archived and the archive files in it based on the archive format of the archive image.
Optionally, the archive type and archive file identification module is specifically configured to, when determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the processing result of the clustering process.
Optionally, the bibliographic information extraction module is specifically configured to:
determining the bibliographic fields corresponding to the archive type;
extracting the contents of the bibliographic fields from the character recognition result of the archive image as the bibliographic information.
Optionally, the apparatus further comprises:
and a blank image deletion module, configured to identify blank images corresponding to blank pages of the archive to be archived and to delete them from the archive images before the archive images are filed based on the archive type, the bibliographic information, and the directory information.
Optionally, the apparatus further comprises:
and the character recognition module is used for recognizing the archive image based on the OCR to obtain a character recognition result.
Optionally, the character recognition module is specifically configured to:
recognizing the archive image based on an OCR (optical character recognition) to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain a character recognition result.
Optionally, the apparatus further comprises:
and an archive query module, configured to query the archived archive images for the archive images of a target archive based on the bibliographic information and the directory information when an archive query request for the target archive is received.
It is understood that the above modules of the archive filing apparatus in this embodiment implement the corresponding steps of the archive filing method in the embodiment shown in Fig. 1. These functions may be implemented in hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above. The modules may be software and/or hardware, and each module may be implemented independently or several modules may be integrated together. For the functional description of each module of the archive filing apparatus, refer to the corresponding description of the archive filing method in the embodiment shown in Fig. 1; details are not repeated here.
The embodiment of the application provides an electronic device, which comprises a processor and a memory;
a memory for storing operating instructions;
and the processor, configured to execute the archive filing method provided by any embodiment of the present application by invoking the operating instructions.
As an example, Fig. 4 shows a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable. As shown in Fig. 4, the electronic device 2000 includes a processor 2001 and a memory 2003, where the processor 2001 is coupled to the memory 2003, for example via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. It should be noted that, in practice, the number of transceivers 2004 is not limited to one, and this structure of the electronic device 2000 does not limit the embodiments of the present application.
In the embodiments of the present application, the processor 2001 implements the method shown in the above method embodiment. The transceiver 2004 may include a receiver and a transmitter and enables the electronic device to communicate with other devices.
The processor 2001 may be a CPU (Central Processing Unit), a general-purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 2001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 2002 may include a path that conveys information between the aforementioned components. The bus 2002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 2002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 4, but this does not indicate only one bus or one type of bus.
The memory 2003 may be a ROM (Read Only Memory) or another type of static storage device capable of storing static information and instructions, a RAM (Random Access Memory) or another type of dynamic storage device capable of storing information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer, but is not limited to these.
Optionally, the memory 2003 stores application program code for executing the solutions of the present application, and its execution is controlled by the processor 2001. The processor 2001 executes the application program code stored in the memory 2003 to implement the archive filing method provided in any embodiment of the present application.
The electronic device provided by the embodiment of the application is applicable to any embodiment of the method, and is not described herein again.
Compared with the prior art, the electronic device determines, based on the archive images of an archive to be archived, the archive type of the archive and the archive files it contains; extracts bibliographic information from the character recognition result of the archive images based on the archive type; determines directory information based on the archive files; and files the archive images based on the archive type, the bibliographic information, and the directory information. This scheme enables automatic classification and automatic cataloging of archives in place of manual work, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for knowledge-oriented, higher-level use of electronic archives.
The embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the computer program implements the archive filing method shown in the above method embodiment.
The computer-readable storage medium provided in the embodiments of the present application is applicable to any of the embodiments of the foregoing method, and is not described herein again.
Compared with the prior art, the computer-readable storage medium provided by this embodiment determines, based on the archive images of an archive to be archived, the archive type of the archive and the archive files it contains; extracts bibliographic information from the character recognition result of the archive images based on the archive type; determines directory information based on the archive files; and files the archive images based on the archive type, the bibliographic information, and the directory information. This scheme enables automatic classification and automatic cataloging of archives in place of manual work, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for knowledge-oriented, higher-level use of electronic archives.
It should be understood that although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (12)

1. An archive filing method, comprising:
acquiring an archive image of an archive to be archived;
determining an archive type of the archive to be archived and an archive file in the archive to be archived based on the archive image;
extracting bibliographic information from a character recognition result of the archive image based on the archive type;
determining directory information based on the archive file;
archiving the archive image based on the archive type, the bibliographic information, and the directory information.
2. The method of claim 1, wherein determining an archive type of the archive to be archived and archive files in the archive to be archived based on the archive image comprises:
and determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
3. The method of claim 2, wherein determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and the archive format in the archive image comprises:
determining whether the character recognition result of the archival image contains a preset target identification character;
if yes, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive file;
if not, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image.
4. The method of claim 2 or 3, wherein determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the archive format of the archive image comprises:
clustering the archives to be archived based on the archive formats in the archive images;
determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the processing result of the clustering process.
5. The method of claim 1, wherein extracting bibliographic information from the character recognition result of the archive image based on the archive type comprises:
determining a bibliographic field corresponding to the archive type;
and extracting the content of the bibliographic field from the character recognition result of the archive image as bibliographic information.
6. The method of any one of claims 1, 2, 3, and 5, wherein prior to archiving the archive image based on the archive type, the bibliographic information, and the directory information, the method further comprises:
identifying a blank image corresponding to a blank page of the archive to be archived;
and deleting the blank image in the archival image.
7. The method of any one of claims 1, 2, 3, and 5, further comprising:
and recognizing the archive image based on Optical Character Recognition (OCR) to obtain the character recognition result.
8. The method of claim 7, wherein recognizing the archival image based on OCR to obtain the character recognition result comprises:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
9. The method of any one of claims 1, 2, 3, and 5, further comprising:
and when an archive inquiry request for a target archive is received, inquiring an archive image of the target archive in the archived archive images based on the bibliographic information and the directory information.
10. An archive filing apparatus, comprising:
an image acquisition module, configured to acquire an archive image of an archive to be archived;
an archive type and archive file identification module, configured to determine the archive type of the archive to be archived and the archive files in the archive to be archived based on the archive image;
a bibliographic information extraction module, configured to extract bibliographic information from the character recognition result of the archive image based on the archive type;
a directory information extraction module, configured to determine directory information based on the archive file;
and an archive filing module, configured to file the archive image based on the archive type, the bibliographic information, and the directory information.
11. An electronic device comprising a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the method of any one of claims 1-9 by calling the operation instruction.
12. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method of any one of claims 1-9.
CN202010840577.9A 2020-08-20 2020-08-20 Archive filing method and device, electronic equipment and computer readable storage medium Pending CN112052749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010840577.9A CN112052749A (en) 2020-08-20 2020-08-20 Archive filing method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010840577.9A CN112052749A (en) 2020-08-20 2020-08-20 Archive filing method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN112052749A true CN112052749A (en) 2020-12-08

Family

ID=73599633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010840577.9A Pending CN112052749A (en) 2020-08-20 2020-08-20 Archive filing method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN112052749A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217290A (en) * 2014-09-01 2014-12-17 南通北城科技创业管理有限公司 An archive management system
WO2018126742A1 (en) * 2017-01-05 2018-07-12 福建亿榕信息技术有限公司 Electronic batch processing method and system for stored archives, and storage medium
CN109598228A (en) * 2018-11-30 2019-04-09 泰华智慧产业集团股份有限公司 Method and system for electronically recording and filing paper documents
CN109783452A (en) * 2018-12-29 2019-05-21 福建华闽通达信息技术有限公司 Rule-based method and system for collecting and archiving construction project files
CN109918471A (en) * 2019-01-28 2019-06-21 平安科技(深圳)有限公司 Archive method, apparatus, computer equipment and storage medium
CN110245112A (en) * 2019-06-21 2019-09-17 同略科技有限公司 Intelligent archive management method, system, terminal and storage medium based on AI
CN110705515A (en) * 2019-10-18 2020-01-17 山东健康医疗大数据有限公司 Hospital paper archive filing method and system based on OCR character recognition
CN110852699A (en) * 2019-10-10 2020-02-28 暨南大学 Electronic intelligent management system and method for files

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
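Jiang Hui (蒋卉): "Quality analysis and business operations in cataloging archived-document directories" (归档文件目录著录质量分析和业务操作), Archives & Construction (档案与建设), no. 11, 15 November 2013 (2013-11-15) *
Yan Lixia (闫丽侠): "Practice of cataloging archived documents with the 'Nanda Zhixing' (南大之星) software" (使用"南大之星"软件著录归档文件的实践), Journal of Changchun Normal College (长春师范学院学报), no. 09, 20 September 2012 (2012-09-20) *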

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112632009A (en) * 2020-12-29 2021-04-09 航天信息股份有限公司 Electronic file processing method and device, storage medium and electronic equipment
CN112733658A (en) * 2020-12-31 2021-04-30 北京华宇信息技术有限公司 Electronic document filing method and device
CN112800949A (en) * 2021-01-27 2021-05-14 刘培育 Artificial intelligence-based paper archive digital processing method, system and equipment
CN113256241A (en) * 2021-04-24 2021-08-13 南京樯图数据研究院有限公司 Artificial intelligence platform for industrial data archive management
CN113344510A (en) * 2021-05-28 2021-09-03 方欣科技有限公司 Intelligent tax material online auditing method, device, terminal and storage medium
CN113377902A (en) * 2021-05-28 2021-09-10 南方电网数字电网研究院有限公司 Digital archive recording configuration method, system, device and storage medium
CN113742287A (en) * 2021-08-31 2021-12-03 远光软件股份有限公司 Archive data archiving method based on data middlebox, computer device and computer readable storage medium
CN113742287B (en) * 2021-08-31 2024-09-03 远光软件股份有限公司 Archive data archiving method based on data center, computer device and computer readable storage medium
CN114067961A (en) * 2021-10-27 2022-02-18 武汉联影医疗科技有限公司 Medical image filing system, method and storage medium
CN114299527A (en) * 2021-11-04 2022-04-08 烟台大学 Data processing method and device for paper document
CN114298238A (en) * 2021-12-31 2022-04-08 瀚云科技有限公司 File creation method and device, electronic equipment and storage medium
CN114117095A (en) * 2022-01-25 2022-03-01 广东图友软件科技有限公司 Audio-video archive recording method and device based on image recognition
CN115185888A (en) * 2022-07-27 2022-10-14 海南绿境高科环保有限公司 Enterprise environment-friendly archive management method, device, equipment and storage medium
CN116126790A (en) * 2023-04-17 2023-05-16 百盛联合杭温铁路有限公司 Railway engineering archive archiving method and device, electronic equipment and storage medium
CN116126790B (en) * 2023-04-17 2023-07-11 百盛联合杭温铁路有限公司 Railway engineering archive archiving method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220915

Address after: 25 Financial Street, Xicheng District, Beijing 100033

Applicant after: CHINA CONSTRUCTION BANK Corp.

Address before: 25 Financial Street, Xicheng District, Beijing 100033

Applicant before: CHINA CONSTRUCTION BANK Corp.

Applicant before: Jianxin Financial Science and Technology Co.,Ltd.