CN112052749A - Archive filing method and device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN112052749A (application number CN202010840577.9A)
- Authority
- CN
- China
- Prior art keywords
- archive
- archived
- file
- image
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/51—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5846—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- Library & Information Science (AREA)
- Software Systems (AREA)
- Business, Economics & Management (AREA)
- General Business, Economics & Management (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the present application provide an archive filing method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises: acquiring an archive image of an archive to be archived; determining, based on the archive image, the archive type of the archive to be archived and the archive files it contains; extracting bibliographic information from a character recognition result of the archive image based on the archive type; determining directory information based on the archive files; and archiving the archive image based on the archive type, the bibliographic information, and the directory information. The scheme enables automatic classification and automatic cataloguing of archives in place of manual operation, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for the knowledge-based, advanced use of electronic archives.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to an archive filing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Financial archives include many kinds: document archives, accounting archives, audio-visual archives, physical-object archives, credit card archives, credit archives, capital construction archives, and so on. To manage archives digitally, archivists must perform a large amount of cataloguing and directory classification work while arranging the archives. A typical archive digitization workflow includes: sorting, unbinding, scanning, image processing, supplementary data entry, data checking, rebinding, quality inspection of the digitized results, file numbering and cataloguing, and finally data archiving. Although paper archives can now be digitized with an image scanning system, archives have complex and varied attributes and large data volumes, so cataloguing and classification still require a great deal of manual work and are error-prone.
With the rapid development of financial services, archive data volumes keep growing. As carriers of valuable data, archives play an increasingly visible role in every line of business, and major financial institutions have an ever stronger need to use them. Archive digitization has therefore become an urgent problem for major financial institutions: archivists need to raise the level of automation in archive cataloguing, reduce the workload of manual arrangement and entry, digitize paper archives, and lay a foundation for the knowledge-based and advanced use of electronic archives.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks. The technical scheme adopted by the application is as follows:
In a first aspect, an embodiment of the present application provides an archive filing method, the method including:
acquiring an archive image of an archive to be archived;
determining, based on the archive image, the archive type of the archive to be archived and the archive files in the archive to be archived;
extracting bibliographic information from a character recognition result of the archive image based on the archive type;
determining directory information based on the archive files;
and archiving the archive image based on the archive type, the bibliographic information, and the directory information.
Optionally, determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the archive image comprises:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
Optionally, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification character and the archive format in the archive image comprises:
determining whether the character recognition result of the archive image contains a preset target identification character;
if so, determining the archive type of the archive to be archived and the archive files in it based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive files in it based on the archive format of the archive image.
Optionally, determining an archive type of the archive to be archived and an archive file in the archive to be archived based on the archive format of the archive image comprises:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in it based on the clustering result.
Optionally, extracting bibliographic information from the character recognition result of the archive image based on the archive type includes:
determining the bibliographic fields corresponding to the archive type;
and extracting the contents of the bibliographic fields from the character recognition result of the archive image as the bibliographic information.
Optionally, before archiving the archive image based on the archive type, the bibliographic information, and the directory information, the method further includes:
identifying a blank image corresponding to a blank page of the archive to be archived;
and deleting the blank image from the archive images.
Optionally, the method further includes:
recognizing the archive image based on optical character recognition (OCR) to obtain the character recognition result.
Optionally, recognizing the archive image based on OCR to obtain the character recognition result includes:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
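The application does not say how the semantic analysis is carried out. As a deliberately simple stand-in, the sketch below corrects OCR tokens against a vocabulary of expected archive terms with Python's `difflib.get_close_matches`. This is lexical fuzzy matching, not true semantic analysis, and the vocabulary, the `cutoff` value, and the name `correct_tokens` are assumptions for illustration only.

```python
import difflib
from typing import Iterable, List


def correct_tokens(tokens: List[str], vocabulary: Iterable[str], cutoff: float = 0.8) -> List[str]:
    """Replace OCR tokens that are close misspellings of known archive terms.

    Lexical stand-in for the semantic adjustment step (assumption): tokens that
    are already in the vocabulary, or have no sufficiently similar entry, are kept.
    """
    vocab = list(vocabulary)
    known = set(vocab)
    corrected = []
    for tok in tokens:
        if tok in known:
            corrected.append(tok)
            continue
        # Best vocabulary entry whose similarity ratio is >= cutoff, if any.
        match = difflib.get_close_matches(tok, vocab, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else tok)
    return corrected


print(correct_tokens(["Contrct", "No.", "2019"], ["Contract", "Certificate"]))
# ['Contract', 'No.', '2019']
```

A context-aware language model could replace the vocabulary lookup; the surrounding flow (initial result in, adjusted result out) would stay the same.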
Optionally, the method further includes:
when an archive query request for a target archive is received, querying the archived archive images for the archive image of the target archive based on the bibliographic information and the directory information.
In a second aspect, an embodiment of the present application provides an archive filing apparatus, including:
an image acquisition module, configured to acquire an archive image of an archive to be archived;
an archive type and archive file identification module, configured to determine, based on the archive image, the archive type of the archive to be archived and the archive files in the archive to be archived;
a bibliographic information extraction module, configured to extract bibliographic information from a character recognition result of the archive image based on the archive type;
a directory information extraction module, configured to determine directory information based on the archive files;
and an archive filing module, configured to archive the archive image based on the archive type, the bibliographic information, and the directory information.
Optionally, the archive type and archive file identification module is specifically configured to:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
Optionally, the archive type and archive file identification module is specifically configured to, when determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identifier character and the archive format in the archive image:
determining whether the character recognition result of the archive image contains a preset target identification character;
if so, determining the archive type of the archive to be archived and the archive files in it based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive files in it based on the archive format of the archive image.
Optionally, the archive type and archive file identification module is specifically configured to, when determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in it based on the clustering result.
Optionally, the bibliographic information extraction module is specifically configured to:
determining the bibliographic fields corresponding to the archive type;
and extracting the contents of the bibliographic fields from the character recognition result of the archive image as the bibliographic information.
Optionally, the apparatus further comprises:
a blank image deletion module, configured to identify a blank image corresponding to a blank page of the archive to be archived and to delete the blank image from the archive images before the archive image is archived based on the archive type, the bibliographic information, and the directory information.
Optionally, the apparatus further comprises:
a character recognition module, configured to recognize the archive image based on OCR to obtain the character recognition result.
Optionally, the character recognition module is specifically configured to:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
Optionally, the apparatus further comprises:
an archive query module, configured to, when an archive query request for a target archive is received, query the archived archive images for the archive image of the target archive based on the bibliographic information and the directory information.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory;
a memory, configured to store operation instructions;
and a processor, configured to execute, by calling the operation instructions, the archive filing method shown in any implementation of the first aspect of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the archive filing method shown in any implementation of the first aspect of the present application.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
According to the scheme provided by the embodiments of the present application, the archive type of the archive to be archived and the archive files in it are determined based on the archive image; bibliographic information is extracted from the character recognition result of the archive image based on the archive type; directory information is determined based on the archive files; and the archive image is then archived based on the archive type, the bibliographic information, and the directory information. The scheme enables automatic classification and automatic cataloguing of archives in place of manual operation, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for the knowledge-based, advanced use of electronic archives.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic flowchart of an archive filing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an archive digitization process according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an archive filing apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only, serve to explain the present application, and are not to be construed as limiting it.
As used herein, the singular forms "a", "an", "said", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The general process of archive cataloguing and directory classification is as follows:
(1) Archive arrangement: the archivist sorts and unbinds the archive materials, puts them in order, and loads them into the scanner in preparation for scanning.
(2) Scanning: the archives are scanned with the scanner, and the quality of the scanned images is ensured (no folded corners, no skew, and so on).
(3) Scanned image processing: the system performs preliminary processing on the scanned images, such as removing black borders.
(4) Archive information supplementation and entry: the archivist enters the archive metadata according to the actual situation.
(5) Directory classification: the archivist additionally enters the archive directory information according to the actual situation.
(6) Image quality inspection: the archivist rechecks whether the archive content is complete and the images are clear, and deletes blank pages.
(7) Archiving: archives that pass the quality inspection are filed.
(8) Archive reading: after filing, archivists can read and search the archives online through the online system.
In the existing process, archivists must invest substantial manpower and material resources in arranging and cataloguing archives, and identifying directories by eye has a high error rate.
In addition, scanning produces many blank images, which waste storage space.
The archive filing method, apparatus, electronic device and computer-readable storage medium provided in the embodiments of the present application aim to solve at least one of the above technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic flowchart of an archive filing method provided in an embodiment of the present application, and as shown in fig. 1, the method mainly includes:
step S110: acquiring an archive image of an archive to be archived;
step S120: determining, based on the archive image, the archive type of the archive to be archived and the archive files in the archive to be archived;
step S130: extracting bibliographic information from a character recognition result of the archive image based on the archive type;
step S140: determining directory information based on the archive files;
step S150: archiving the archive image based on the archive type, the bibliographic information, and the directory information.
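A minimal sketch of how steps S110 to S150 might be chained, assuming each step is supplied as a callable and that a plain dictionary keyed by archive type stands in for the archive store. The names `file_archive` and `ArchiveRecord` and the callable signatures are hypothetical, not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ArchiveRecord:
    archive_type: str
    bibliographic: Dict[str, str]
    directory: List[str]
    images: List[str]


def file_archive(
    images: List[str],                                        # S110: scanned page images (paths)
    ocr: Callable[[str], str],                                # character recognition per page
    identify: Callable[[List[str]], Tuple[str, List[str]]],   # S120: -> (archive type, archive files)
    extract_biblio: Callable[[str, str], Dict[str, str]],     # S130: (type, text) -> fields
    build_directory: Callable[[List[str]], List[str]],        # S140: archive files -> directory
    store: Dict[str, List[ArchiveRecord]],
) -> ArchiveRecord:
    texts = [ocr(img) for img in images]
    archive_type, archive_files = identify(texts)
    bibliographic = extract_biblio(archive_type, "\n".join(texts))
    directory = build_directory(archive_files)
    record = ArchiveRecord(archive_type, bibliographic, directory, list(images))
    # S150: store the record under its archive type, with its index data attached.
    store.setdefault(archive_type, []).append(record)
    return record
```

Passing the steps in as callables keeps the flow of FIG. 1 separate from any particular OCR engine or classifier, so each step can be swapped or tested on its own.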
In the embodiment of the present application, the archive to be archived can be scanned with a scanner to obtain the archive images.
In the embodiment of the present application, after the archive image of the archive to be archived is acquired, the archive type of the archive to be archived and the archive files in it can be determined based on the archive image.
An archive file is a document within the archive, such as a certification material or a contract. After the archive files in the archive to be archived are determined, directory information can be determined for each archive file.
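One way the directory information could be derived, assuming each scanned page has already been labelled with the archive file it belongs to, is to group consecutive pages carrying the same label into directory entries with page ranges. The function name and the output tuple layout are illustrative assumptions.

```python
from typing import List, Tuple


def build_directory(page_labels: List[str]) -> List[Tuple[int, str, int, int]]:
    """Group consecutive pages carrying the same archive-file label.

    Returns (sequence number, file title, first page, last page) with 1-based pages.
    """
    directory: List[Tuple[int, str, int, int]] = []
    start = 0
    for i in range(1, len(page_labels) + 1):
        # Close the current run at the end of the list or when the label changes.
        if i == len(page_labels) or page_labels[i] != page_labels[start]:
            directory.append((len(directory) + 1, page_labels[start], start + 1, i))
            start = i
    return directory


print(build_directory(["Loan contract", "Loan contract", "ID copy"]))
# [(1, 'Loan contract', 1, 2), (2, 'ID copy', 3, 3)]
```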
In the embodiment of the present application, the bibliographic information may include bibliographic fields such as the document title, issue date, issuing department, drafter, and reviewer.
In the embodiment of the present application, the character recognition result can be obtained by performing character recognition on the archive image. Because archives of different archive types may carry different bibliographic information, the bibliographic information can be extracted separately for archives of each archive type.
After the archive type, bibliographic information, and directory information of the archive to be archived are determined, the archive can be filed on that basis. Specifically, archives can be stored by category according to their archive type, and the bibliographic information and directory information are stored in association with each archive, so that the archive can later be queried by its bibliographic and directory information.
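A minimal in-memory sketch of classified storage with associated index data, and of a query over it. The `ArchiveStore` class, its method names, and exact-match query semantics are assumptions for illustration; a production system would more likely use a database with indexes on these fields.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ArchiveStore:
    """Hypothetical store: images grouped by archive type, with bibliographic
    and directory information kept alongside each record as an index."""

    def __init__(self) -> None:
        self._by_type: Dict[str, List[dict]] = defaultdict(list)

    def file(self, archive_type: str, images: List[str],
             bibliographic: Dict[str, str], directory: List[str]) -> None:
        self._by_type[archive_type].append(
            {"images": list(images), "bibliographic": dict(bibliographic), "directory": list(directory)}
        )

    def query(self, bibliographic: Optional[Dict[str, str]] = None,
              file_title: Optional[str] = None) -> List[List[str]]:
        """Return image lists of records matching every given field (exact match)."""
        criteria = bibliographic or {}
        hits = []
        for records in self._by_type.values():
            for rec in records:
                if not all(rec["bibliographic"].get(k) == v for k, v in criteria.items()):
                    continue
                if file_title is not None and file_title not in rec["directory"]:
                    continue
                hits.append(rec["images"])
        return hits
```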
Before the archive is filed, the archive type, bibliographic information, and directory information can be confirmed manually to ensure filing accuracy.
The method provided by the embodiment of the present application determines, based on the archive image of the archive to be archived, its archive type and the archive files it contains; extracts bibliographic information from the character recognition result of the archive image based on the archive type; determines directory information based on the archive files; and then archives the archive image based on the archive type, the bibliographic information, and the directory information. The scheme enables automatic classification and automatic cataloguing of archives in place of manual operation, with higher accuracy and processing efficiency, and lays a foundation for digitizing paper archives and for the knowledge-based, advanced use of electronic archives.
In an optional manner of the embodiment of the application, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive image includes:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
In the embodiment of the present application, an identification character is a character in the archive image that indicates the type of the archive it belongs to, or the archive file it belongs to. Taking the identification characters of an archive file as an example, if the title of an archive file contains the word "contract", the file can be regarded as a contract.
In the embodiment of the present application, the archive format is the layout format of the archive, or of an archive file within the archive, which can likewise indicate the archive type or the archive file. Taking archive files as an example, some formal documents share the same layout format, such as the same ruled lines and shapes.
In the embodiment of the application, the archive type of the archive to be archived and the archive file in the archive to be archived can be determined based on at least one of the identification character and the archive format in the archive image.
In an optional manner of the embodiment of the present application, determining the archive type of the archive to be archived and the archive files in it based on the identification characters and the archive format in the archive image includes:
determining whether the character recognition result of the archive image contains a preset target identification character;
if so, determining the archive type of the archive to be archived and the archive files in it based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive files in it based on the archive format of the archive image.
In the embodiment of the present application, the archive type of the archive to be archived and the archive files in it are determined jointly from the identification characters and the archive format in the archive image. It is first determined whether the character recognition result of the archive image contains a preset target identification character; such characters indicate the archive type and the archive files of the archive to be archived.
Specifically, a first association relationship between target identification characters and archive types, and a second association relationship between target identification characters and archive files, can be preset, so that when a target identification character is found in the archive image, the archive type and the archive files of the archive to be archived can be determined from these relationships.
If the target identification character cannot be found in the character recognition result of the archive image, the archive type of the archive to be archived and the archive file in the archive to be archived can be determined based on the archive format of the archive image.
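A minimal sketch of the identification-character branch with its layout-based fallback. The marker strings and the two lookup tables (standing in for the first and second association relationships) are hypothetical examples, and `layout_fallback` is an assumed hook for the clustering branch described next.

```python
from typing import Callable, Dict, Tuple

# Hypothetical first association relationship: target identification character -> archive type.
TYPE_BY_MARKER: Dict[str, str] = {
    "loan contract": "credit archive",
    "accounting voucher": "accounting archive",
}
# Hypothetical second association relationship: target identification character -> archive file.
FILE_BY_MARKER: Dict[str, str] = {
    "loan contract": "contract",
    "accounting voucher": "voucher",
}


def identify(ocr_text: str, layout_fallback: Callable[[], Tuple[str, str]]) -> Tuple[str, str]:
    """Return (archive type, archive file) from identification characters, else from layout."""
    lowered = ocr_text.lower()
    for marker, archive_type in TYPE_BY_MARKER.items():
        if marker in lowered:
            return archive_type, FILE_BY_MARKER[marker]
    # No preset target identification character found: defer to the layout-based branch.
    return layout_fallback()
```

Checking cheap text markers first and only falling back to layout analysis when none match mirrors the "if yes / if not" order of the steps above.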
In an optional mode of the embodiment of the application, determining an archive type of an archive to be archived and an archive file in the archive to be archived based on an archive format of an archive image includes:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in it based on the clustering result.
In the embodiment of the present application, the archive format can be extracted from the archive images and used for clustering. Specifically, the archives to be archived can be clustered by archive format, and the archive files within an archive can also be clustered by archive format. Archives of the same archive type are grouped into the same cluster, and pages belonging to the same archive file are grouped into the same cluster. The archive type of the archive to be archived and the archive files it contains are then determined from the clustering result.
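The application does not name a clustering algorithm. The sketch below uses simple leader (threshold) clustering over per-page layout feature vectors as one possible choice; the features (for instance counts of ruled lines, table cells, or seal regions) and the distance `threshold` are assumptions, and k-means or DBSCAN would be equally plausible.

```python
import math
from typing import List, Sequence


def cluster_layouts(features: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Leader clustering: join the nearest cluster if its centroid lies within threshold."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for vec in features:
        best, best_dist = -1, math.inf
        for idx, c in enumerate(centroids):
            d = math.dist(vec, c)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best, best_dist = idx, d
        if best >= 0 and best_dist <= threshold:
            # Update the running mean of the chosen cluster's centroid.
            n = counts[best] + 1
            centroids[best] = [c + (v - c) / n for c, v in zip(centroids[best], vec)]
            counts[best] = n
            labels.append(best)
        else:
            # No cluster close enough: this page starts a new layout cluster.
            centroids.append(list(vec))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


print(cluster_layouts([[10, 2, 0], [11, 2, 0], [0, 0, 5], [10, 3, 0]], threshold=2.0))
# [0, 0, 1, 0]
```

Leader clustering is order-dependent but needs no preset number of clusters, which suits batches where the number of archive types is not known in advance.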
In an optional mode of the embodiment of the present application, extracting bibliographic information from a character recognition result of an archive image based on an archive type includes:
determining a bibliographic field corresponding to the archive type;
extracting the content of the bibliographic field from the character recognition result of the archive image as the bibliographic information.
In this embodiment of the application, archives of different archive types may carry different bibliographic fields. A correspondence between each archive type and its bibliographic fields can therefore be preset, so that the appropriate bibliographic fields are extracted for archives of each archive type.
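A minimal sketch of the preset type-to-field correspondence is shown below. It assumes each bibliographic field can be located in the OCR text with a regular expression. The archive types, field names, and patterns in `BIBLIO_FIELDS` are hypothetical examples, not fields defined in the source.

```python
import re
from typing import Optional

# Hypothetical archive type -> {bibliographic field: regex with one capture group}.
BIBLIO_FIELDS = {
    "credit": {
        "contract_no": r"Contract No\.\s*:?\s*(\S+)",
        "borrower": r"Borrower\s*:\s*(.+)",
    },
    "account": {
        "account_no": r"Account No\.\s*:?\s*(\d+)",
    },
}


def extract_bibliographic(archive_type: str, ocr_text: str) -> dict[str, Optional[str]]:
    """Extract the bibliographic fields configured for archive_type from OCR text.

    A configured field that cannot be found maps to None; an unknown archive
    type yields an empty dict.
    """
    result: dict[str, Optional[str]] = {}
    for name, pattern in BIBLIO_FIELDS.get(archive_type, {}).items():
        match = re.search(pattern, ocr_text)
        result[name] = match.group(1).strip() if match else None
    return result
```

Returning `None` for missing fields, instead of omitting them, makes gaps visible to the staff who manually check recognition results.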
In an optional manner of this embodiment of the application, before filing the archive image based on the archive type, the bibliographic information, and the directory information, the method further includes:
identifying a blank image corresponding to a blank page of an archive to be archived;
and deleting the blank image from the archive images.
In this embodiment of the application, an archive to be archived may contain blank pages, for example the blank reverse side of a scanned page. Archiving the resulting blank images together with the other images wastes storage space. Blank images are therefore identified and deleted from the archive images to reduce storage use.
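The source does not say how blank images are recognized. One common heuristic is an "ink ratio" test: count the pixels darker than a threshold and treat the page as blank if they form only a tiny fraction of the page. The sketch below assumes grayscale images given as lists of rows of 0 to 255 values. The `dark_level` and `max_ink_ratio` defaults are hypothetical and would need tuning against real scans, for example to tolerate dust specks or show-through.

```python
def is_blank(gray: list[list[int]], dark_level: int = 128, max_ink_ratio: float = 0.002) -> bool:
    """Heuristic blank-page test on a grayscale image (0 = black, 255 = white).

    The page counts as blank if the fraction of pixels darker than dark_level
    does not exceed max_ink_ratio.
    """
    total = sum(len(row) for row in gray)
    if total == 0:
        return True
    dark = sum(1 for row in gray for px in row if px < dark_level)
    return dark / total <= max_ink_ratio


def drop_blank_pages(pages: list[list[list[int]]]) -> list[list[list[int]]]:
    """Keep only pages that are not detected as blank, preserving order."""
    return [page for page in pages if not is_blank(page)]
```

A production version would typically crop scanner margins before counting, because dark borders can make an otherwise blank page fail the test.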
In an optional manner of the embodiment of the present application, the method further includes:
recognizing the archive image based on OCR (Optical Character Recognition) to obtain a character recognition result.
In this embodiment of the application, a GPU (Graphics Processing Unit) server may be deployed to recognize the archive image using OCR technology and obtain the character recognition result.
In an optional mode of the embodiment of the application, recognizing the archive image based on OCR to obtain the character recognition result includes:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
In this embodiment of the application, the archive image can be recognized using OCR technology to obtain an initial character recognition result. A semantic analysis model then analyzes this initial result to produce a semantic analysis result. Finally, the initial character recognition result is adjusted according to the semantic analysis result to obtain the final character recognition result.
Semantic analysis of the character recognition result improves its accuracy, which in turn helps improve the accuracy of bibliographic information extraction.
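The patent does not describe its semantic analysis model. To show only the "adjust the initial OCR result" step, the sketch below swaps in a much simpler technique: lexicon-based correction using the standard library's `difflib.get_close_matches`. Each OCR token that is not in a domain vocabulary is replaced by its closest vocabulary entry if the similarity clears a cutoff. The lexicon and the `cutoff` value are hypothetical, and this is not a semantic model.

```python
import difflib


def correct_tokens(tokens: list[str], lexicon: list[str], cutoff: float = 0.8) -> list[str]:
    """Replace likely OCR misreads with the closest lexicon entry.

    Tokens already in the lexicon are kept. Other tokens are replaced only if
    difflib finds a lexicon word with similarity ratio >= cutoff; otherwise
    the original token is kept unchanged.
    """
    vocabulary = set(lexicon)
    corrected: list[str] = []
    for token in tokens:
        if token in vocabulary:
            corrected.append(token)
            continue
        match = difflib.get_close_matches(token, lexicon, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else token)
    return corrected
```

For instance, with the lexicon `["Borrower", "Contract", "Guarantor"]`, the misread `"Borr0wer"` (similarity 0.875) becomes `"Borrower"`, while an unrelated token is left alone. A context-aware language model would be needed to resolve errors that produce valid but wrong words.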
In an optional manner of the embodiment of the present application, the method further includes:
when an archive query request for a target archive is received, querying the archive image of the target archive among the archived archive images based on the bibliographic information and the directory information.
In this embodiment of the application, a query interface can be provided after the archive is filed. A user can initiate an archive query request from a terminal device, and the request can carry query information such as an archive name or an archive file name. Because the archived images are organized by bibliographic information and directory information, the matching archive can be located quickly.
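One way to make such queries fast is an inverted index keyed on (field, value) pairs drawn from the bibliographic and directory information. The sketch below is a hypothetical in-memory illustration: `ArchiveRecord`, its fields, and the pseudo-field name `"directory"` are assumptions. A real archive management system would more likely use a database or search engine.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ArchiveRecord:
    archive_id: str
    archive_type: str
    bibliographic: dict[str, str]
    directory: list[str]   # directory entries, e.g. names of the archive files
    image_paths: list[str]


class ArchiveIndex:
    """Inverted index from (field, value) pairs to archive ids."""

    def __init__(self) -> None:
        self._records: dict[str, ArchiveRecord] = {}
        self._postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, rec: ArchiveRecord) -> None:
        self._records[rec.archive_id] = rec
        for name, value in rec.bibliographic.items():
            self._postings[(name, value)].add(rec.archive_id)
        for entry in rec.directory:
            # Directory entries are indexed under a hypothetical "directory" pseudo-field.
            self._postings[("directory", entry)].add(rec.archive_id)

    def query(self, criteria: dict[str, str]) -> list[ArchiveRecord]:
        """Return records matching ALL criteria (exact match), sorted by archive id."""
        ids: Optional[set[str]] = None
        for key in criteria.items():
            hits = self._postings.get(key, set())
            ids = set(hits) if ids is None else ids & hits
        return [self._records[i] for i in sorted(ids or ())]
```

Intersecting posting sets means each added criterion can only narrow the result. Fuzzy or partial-name matching would need a different index, such as n-grams or a full-text engine.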
Fig. 2 is a schematic diagram of an archive digitization process according to an embodiment of the present application. As shown in Fig. 2, an administrator configures the archive types, the bibliographic fields of each category, and the archive directory. In other words, the administrator configures the recognition rules for archive type, bibliographic information, and directory information, so that these can be recognized automatically.
Scanning staff sort the archives to be archived and scan them into archive images.
The ICR (Intelligent Character Recognition) module processes the scanned images and performs archive type recognition, bibliographic field recognition, directory recognition, and blank page deletion. That is, it recognizes the archive type, bibliographic information, and directory information from the archive images and deletes blank pages from them.
Data-entry staff then manually check the recognition results. If the check fails, recognition must be performed again; if it passes, the archive to be archived is filed.
Archive users can query the archive management system as needed to retrieve archive images.
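The recognize, review, and file loop of Fig. 2 can be outlined as below. This is a hypothetical sketch. The three callables stand in for the ICR module, the manual check, and the filing step. The `max_rounds` bound is an assumption (the source only says recognition is repeated when the check fails), added so the loop always terminates.

```python
from typing import Callable, TypeVar

R = TypeVar("R")


def digitize_with_review(
    recognize: Callable[[int], R],
    review: Callable[[R], bool],
    file_archive: Callable[[R], None],
    max_rounds: int = 3,
) -> bool:
    """Run recognition, have a person review it, and file only approved results.

    recognize receives the round number, so a later round can reflect corrected
    rules or a rescan. Returns True once a result is filed, or False if every
    round failed review.
    """
    for round_no in range(1, max_rounds + 1):
        result = recognize(round_no)
        if review(result):
            file_archive(result)
            return True
    return False
```

Returning `False` instead of raising lets the caller route the archive back to the scanning or configuration step shown at the start of Fig. 2.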
Based on the same principle as the method shown in Fig. 1, Fig. 3 is a schematic structural diagram of an archive filing apparatus provided by an embodiment of the present application. As shown in Fig. 3, the archive filing apparatus 20 may include:
an image obtaining module 210, configured to obtain an archive image of an archive to be archived;
an archive type and archive file identification module 220 for determining an archive type of an archive to be archived and an archive file in the archive to be archived based on the archive image;
a bibliographic information extraction module 230, configured to extract bibliographic information from a character recognition result of the file image based on the file type;
a directory information extraction module 240 for determining directory information based on the archive file;
an archive filing module 250 for filing the archive image based on the archive type, bibliographic information, and directory information.
The apparatus provided by this embodiment of the application works as follows:
- it determines the archive type of the archive to be archived and the archive files it contains from the archive image;
- it extracts bibliographic information from the character recognition result of the archive image according to the archive type;
- it determines directory information from the archive files;
- it files the archive image based on the archive type, bibliographic information, and directory information.

This scheme enables automatic classification and automatic cataloguing of archives in place of manual operation, with higher accuracy and processing efficiency. It also lays a foundation for digitizing paper archives and for knowledge-oriented processing and higher-level use of electronic archives.
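To show how the five modules of Fig. 3 hand data to one another, the sketch below wires them together as injected callables. The class name, field names, and callable signatures are hypothetical. Each field merely labels which numbered module it stands in for, and the implementations behind them are left to the caller.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ArchiveFilingDevice:
    # Hypothetical stand-ins for the modules of Fig. 3; the signatures are assumptions.
    acquire_images: Callable[[str], list[Any]]                                # module 210
    identify_type_and_files: Callable[[list[Any]], tuple[str, list[str]]]     # module 220
    extract_bibliographic: Callable[[str, list[Any]], dict[str, str]]         # module 230
    extract_directory: Callable[[list[str]], list[str]]                       # module 240
    file_images: Callable[[str, dict[str, str], list[str], list[Any]], str]   # module 250

    def process(self, archive_ref: str) -> str:
        """Run the modules in order and return whatever the filing module returns."""
        images = self.acquire_images(archive_ref)
        archive_type, archive_files = self.identify_type_and_files(images)
        bibliographic = self.extract_bibliographic(archive_type, images)
        directory = self.extract_directory(archive_files)
        return self.file_images(archive_type, bibliographic, directory, images)
```

Injecting the modules rather than hard-coding them reflects the patent's note that the modules may be implemented independently or integrated, in software and/or hardware.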
Optionally, the archive type and archive file identification module is specifically configured to:
determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
Optionally, the archive type and archive file identification module is specifically configured to, when determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identifier character and the archive format in the archive image:
determining whether the character recognition result of the archival image contains a preset target identification character;
if yes, determining the archive type of the archive to be archived and the archive files in the archive to be archived based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image.
Optionally, the archive type and archive file identification module is specifically configured to, when determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image:
clustering the archive to be archived based on the archive format in the archive image;
and determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the processing result of the clustering process.
Optionally, the bibliographic information extraction module is specifically configured to:
determining a bibliographic field corresponding to the archive type;
extracting the content of the bibliographic field from the character recognition result of the archive image as the bibliographic information.
Optionally, the apparatus further comprises:
and the blank image deleting module is used for identifying a blank image corresponding to a blank page of the archive to be archived, and for deleting the blank image from the archive images before the archive image is filed based on the archive type, the bibliographic information, and the directory information.
Optionally, the apparatus further comprises:
and the character recognition module is used for recognizing the archive image based on the OCR to obtain a character recognition result.
Optionally, the character recognition module is specifically configured to:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
Optionally, the apparatus further comprises:
and the archive query module is used for querying the archive image of the target archive among the archived archive images based on the bibliographic information and the directory information when an archive query request for the target archive is received.
It is understood that the modules of the archive filing apparatus in this embodiment implement the corresponding steps of the archive filing method in the embodiment shown in Fig. 1. These functions may be implemented by hardware, or by hardware executing corresponding software; the hardware or software includes one or more modules corresponding to the functions above. The modules may be software and/or hardware, and each module may be implemented independently, or several modules may be integrated. For the functional description of each module of the archive filing apparatus, refer to the corresponding description of the archive filing method in the embodiment shown in Fig. 1; details are not repeated here.
The embodiment of the application provides an electronic device, which comprises a processor and a memory;
a memory for storing operating instructions;
and the processor is used for executing the archive filing method provided by any embodiment of the application by calling the operating instructions.
As an example, Fig. 4 is a schematic structural diagram of an electronic device to which an embodiment of the present application is applicable. As shown in Fig. 4, the electronic device 2000 includes a processor 2001 and a memory 2003. The processor 2001 is coupled to the memory 2003, for example via a bus 2002. Optionally, the electronic device 2000 may also include a transceiver 2004. In practical applications the number of transceivers 2004 is not limited to one, and the structure of the electronic device 2000 does not limit the embodiments of the present application.
In this embodiment, the processor 2001 implements the method shown in the method embodiment above. The transceiver 2004 may include a receiver and a transmitter and enables the electronic device of this embodiment to communicate with other devices.
The processor 2001 may be any of the following, or any combination thereof:
- a CPU (Central Processing Unit) or other general-purpose processor;
- a DSP (Digital Signal Processor);
- an ASIC (Application Specific Integrated Circuit);
- an FPGA (Field Programmable Gate Array) or other programmable logic device;
- a transistor logic device or other hardware component.

It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 2001 may also be a combination that implements computing functions, for example one or more microprocessors, or a DSP combined with a microprocessor.
The memory 2003 may be, without limitation, any of the following:
- a ROM (Read Only Memory) or other static storage device that can store static information and instructions;
- a RAM (Random Access Memory) or other dynamic storage device that can store information and instructions;
- an EEPROM (Electrically Erasable Programmable Read Only Memory);
- a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.);
- a magnetic disk storage medium or other magnetic storage device;
- any other medium that can carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
Optionally, the memory 2003 stores application program code for executing the solution of the present application, and the processor 2001 controls its execution. The processor 2001 executes the application program code stored in the memory 2003 to implement the archive filing method provided in any embodiment of the present application.
The electronic device provided by the embodiment of the application is applicable to any embodiment of the method, and is not described herein again.
Compared with the prior art, the electronic device works as follows:
- it determines the archive type of an archive to be archived and the archive files it contains from the archive image of that archive;
- it extracts bibliographic information from the character recognition result of the archive image according to the archive type;
- it determines directory information from the archive files;
- it files the archive image based on the archive type, the bibliographic information, and the directory information.

This scheme enables automatic classification and automatic cataloguing of archives in place of manual operation, with higher accuracy and processing efficiency. It also lays a foundation for digitizing paper archives and for knowledge-oriented processing and higher-level use of electronic archives.
The embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the computer program implements the archive filing method shown in the above method embodiment.
The computer-readable storage medium provided in the embodiments of the present application is applicable to any of the embodiments of the foregoing method, and is not described herein again.
This embodiment of the present application provides a computer-readable storage medium. Compared with the prior art, the program stored on it works as follows:
- it determines the archive type of an archive to be archived and the archive files it contains from the archive image of that archive;
- it extracts bibliographic information from the character recognition result of the archive image according to the archive type;
- it determines directory information from the archive files;
- it files the archive image based on the archive type, the bibliographic information, and the directory information.

This scheme enables automatic classification and automatic cataloguing of archives in place of manual operation, with higher accuracy and processing efficiency. It also lays a foundation for digitizing paper archives and for knowledge-oriented processing and higher-level use of electronic archives.
It should be understood that although the steps in the flowcharts of the figures are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, the steps are not restricted to the exact order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment and may be performed at different times. They are also not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. A person skilled in the art may make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.
Claims (12)
1. An archive filing method, comprising:
acquiring an archive image of an archive to be archived;
determining an archive type of the archive to be archived and an archive file in the archive to be archived based on the archive image;
extracting bibliographic information from a character recognition result of the archive image based on the archive type;
determining directory information based on the archive file;
filing the archive image based on the archive type, the bibliographic information, and the directory information.
2. The method of claim 1, wherein determining an archive type of the archive to be archived and archive files in the archive to be archived based on the archive image comprises:
and determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and/or the archive format in the archive image.
3. The method of claim 2, wherein determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the identification characters and the archive format in the archive image comprises:
determining whether the character recognition result of the archival image contains a preset target identification character;
if yes, determining the archive type of the archive to be archived and the archive files in the archive to be archived based on a first association relationship between the target identification character and the archive type and a second association relationship between the target identification character and the archive files;
if not, determining the archive type of the archive to be archived and the archive file in the archive to be archived based on the archive format of the archive image.
4. The method of claim 2 or 3, wherein determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the archive format of the archive image comprises:
clustering the archives to be archived based on the archive formats in the archive images;
determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the processing result of the clustering process.
5. The method of claim 1, wherein extracting bibliographic information from the character recognition result of the archival image based on the archival type comprises:
determining a bibliographic field corresponding to the archive type;
and extracting the content of the bibliographic field from the character recognition result of the archive image as bibliographic information.
6. The method of any one of claims 1, 2, 3, and 5, wherein before filing the archive image based on the archive type, the bibliographic information, and the directory information, the method further comprises:
identifying a blank image corresponding to a blank page of the archive to be archived;
and deleting the blank image from the archive images.
7. The method of any one of claims 1, 2, 3, and 5, further comprising:
and identifying the archive image based on Optical Character Recognition (OCR) to obtain the character recognition result.
8. The method of claim 7, wherein recognizing the archive image based on OCR to obtain the character recognition result comprises:
recognizing the archive image based on OCR to obtain an initial character recognition result;
performing semantic analysis on the initial character recognition result to obtain a semantic analysis result;
and adjusting the initial character recognition result according to the semantic analysis result to obtain the character recognition result.
9. The method of any one of claims 1, 2, 3, and 5, further comprising:
and when an archive query request for a target archive is received, querying the archive image of the target archive among the archived archive images based on the bibliographic information and the directory information.
10. An archive filing apparatus, comprising:
the image acquisition module is used for acquiring an archive image of an archive to be archived;
the archive type and archive file identification module is used for determining the archive type of the archive to be archived and the archive files in the archive to be archived based on the archive image;
the bibliographic information extraction module is used for extracting bibliographic information from the character recognition result of the archive image based on the archive type;
the directory information extraction module is used for determining directory information based on the archive file;
and the archive filing module is used for filing the archive image based on the archive type, the bibliographic information and the directory information.
11. An electronic device comprising a processor and a memory;
the memory is used for storing operation instructions;
the processor is used for executing the method of any one of claims 1-9 by calling the operation instruction.
12. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010840577.9A CN112052749A (en) | 2020-08-20 | 2020-08-20 | Archive filing method and device, electronic equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010840577.9A CN112052749A (en) | 2020-08-20 | 2020-08-20 | Archive filing method and device, electronic equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112052749A (en) | 2020-12-08 |
Family
ID=73599633
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010840577.9A Pending CN112052749A (en) | 2020-08-20 | 2020-08-20 | Archive filing method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112052749A (en) |
- 2020-08-20: application CN202010840577.9A filed in CN, published as CN112052749A (status: active, pending)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104217290A (en) * | 2014-09-01 | 2014-12-17 | 南通北城科技创业管理有限公司 | An archive management system |
WO2018126742A1 (en) * | 2017-01-05 | 2018-07-12 | 福建亿榕信息技术有限公司 | Electronic batch processing method and system for stored archives, and storage medium |
CN109598228A (en) * | 2018-11-30 | 2019-04-09 | 泰华智慧产业集团股份有限公司 | Method and system for electronically recording and filing paper documents |
CN109783452A (en) * | 2018-12-29 | 2019-05-21 | 福建华闽通达信息技术有限公司 | Rule-based method and system for collecting and filing construction project archives |
CN109918471A (en) * | 2019-01-28 | 2019-06-21 | 平安科技(深圳)有限公司 | Archive method, apparatus, computer equipment and storage medium |
CN110245112A (en) * | 2019-06-21 | 2019-09-17 | 同略科技有限公司 | Intelligent archive management method, system, terminal and storage medium based on AI |
CN110852699A (en) * | 2019-10-10 | 2020-02-28 | 暨南大学 | Electronic intelligent management system and method for files |
CN110705515A (en) * | 2019-10-18 | 2020-01-17 | 山东健康医疗大数据有限公司 | Hospital paper archive filing method and system based on OCR character recognition |
Non-Patent Citations (2)
Title |
---|
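蒋卉 (Jiang Hui): "归档文件目录著录质量分析和业务操作" (Analysis of cataloguing quality of archived-document directories and related business operations), 档案与建设, no. 11, 15 November 2013 (2013-11-15) * |
闫丽侠 (Yan Lixia): "使用'南大之星'软件著录归档文件的实践" (Practice of cataloguing archived documents with the '南大之星' software), 长春师范学院学报 (Journal of Changchun Normal University), no. 09, 20 September 2012 (2012-09-20) * |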
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112632009A (en) * | 2020-12-29 | 2021-04-09 | 航天信息股份有限公司 | Electronic file processing method and device, storage medium and electronic equipment |
CN112733658A (en) * | 2020-12-31 | 2021-04-30 | 北京华宇信息技术有限公司 | Electronic document filing method and device |
CN112800949A (en) * | 2021-01-27 | 2021-05-14 | 刘培育 | Artificial intelligence-based paper archive digital processing method, system and equipment |
CN113256241A (en) * | 2021-04-24 | 2021-08-13 | 南京樯图数据研究院有限公司 | Artificial intelligence platform for industrial data archive management |
CN113344510A (en) * | 2021-05-28 | 2021-09-03 | 方欣科技有限公司 | Intelligent tax material online auditing method, device, terminal and storage medium |
CN113377902A (en) * | 2021-05-28 | 2021-09-10 | 南方电网数字电网研究院有限公司 | Digital archive recording configuration method, system, device and storage medium |
CN113742287A (en) * | 2021-08-31 | 2021-12-03 | 远光软件股份有限公司 | Archive data archiving method based on data middlebox, computer device and computer readable storage medium |
CN113742287B (en) * | 2021-08-31 | 2024-09-03 | 远光软件股份有限公司 | Archive data archiving method based on data center, computer device and computer readable storage medium |
CN114067961A (en) * | 2021-10-27 | 2022-02-18 | 武汉联影医疗科技有限公司 | Medical image filing system, method and storage medium |
CN114299527A (en) * | 2021-11-04 | 2022-04-08 | 烟台大学 | Data processing method and device for paper document |
CN114298238A (en) * | 2021-12-31 | 2022-04-08 | 瀚云科技有限公司 | File creation method and device, electronic equipment and storage medium |
CN114117095A (en) * | 2022-01-25 | 2022-03-01 | 广东图友软件科技有限公司 | Audio-video archive recording method and device based on image recognition |
CN115185888A (en) * | 2022-07-27 | 2022-10-14 | 海南绿境高科环保有限公司 | Enterprise environment-friendly archive management method, device, equipment and storage medium |
CN116126790A (en) * | 2023-04-17 | 2023-05-16 | 百盛联合杭温铁路有限公司 | Railway engineering archive archiving method and device, electronic equipment and storage medium |
CN116126790B (en) * | 2023-04-17 | 2023-07-11 | 百盛联合杭温铁路有限公司 | Railway engineering archive archiving method and device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112052749A (en) | Archive filing method and device, electronic equipment and computer readable storage medium | |
US9002838B2 (en) | Distributed capture system for use with a legacy enterprise content management system | |
US20160055376A1 (en) | Method and system for identification and extraction of data from structured documents | |
US9390089B2 (en) | Distributed capture system for use with a legacy enterprise content management system | |
US20050289182A1 (en) | Document management system with enhanced intelligent document recognition capabilities | |
CN107291949B (en) | Information searching method and device | |
CN108509542A (en) | Quick archive filing system and archiving method thereof |
CN110688349A (en) | Document sorting method, device, terminal and computer readable storage medium | |
CN112560411A (en) | Intelligent personnel information input method and system | |
CN108304815A (en) | A kind of data capture method, device, server and storage medium | |
CN111898433A (en) | Paper bill digitization method and device | |
CN115116068A (en) | Archive intelligent filing system based on OCR | |
JP6786658B2 (en) | Document reading system | |
CN111859885A (en) | Automatic generation method and system for legal decision book | |
JP6127597B2 (en) | Information processing apparatus, control method thereof, and program | |
WO2024012209A1 (en) | Image recognition-based service processing method and apparatus, and storage medium | |
US7532368B2 (en) | Automated processing of paper forms using remotely-stored form content | |
CN112364790B (en) | Airport work order information identification method and system based on convolutional neural network | |
CN114495138A (en) | Intelligent document identification and feature extraction method, device platform and storage medium | |
CN113742286A (en) | Archive data filing layout file generation method, computer device and computer readable storage medium | |
JP4480109B2 (en) | Image management apparatus and image management method | |
CN111046864A (en) | Method and system for automatically extracting five elements of contract scanning piece | |
JP2003316802A (en) | Image management system, image management method and image management program | |
CN112506873B (en) | Automatic data entry management system for physical archives | |
CN116719839B (en) | Data query method and device of accounting file and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
TA01 | Transfer of patent application right | Effective date of registration: 20220915. Address after: 25 Financial Street, Xicheng District, Beijing 100033. Applicant after: CHINA CONSTRUCTION BANK Corp. Address before: 25 Financial Street, Xicheng District, Beijing 100033. Applicants before: CHINA CONSTRUCTION BANK Corp.; Jianxin Financial Science and Technology Co.,Ltd. |