CN116150105A - Reading and analyzing method and system for electronic file long-term storage package - Google Patents

Info

Publication number
CN116150105A
Authority
CN
China
Prior art keywords
package
file
target
metadata
path information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310424805.8A
Other languages
Chinese (zh)
Other versions
CN116150105B (en)
Inventor
由伟希
张海青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunxuewei Technology Co ltd
Original Assignee
Beijing Yunxuewei Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yunxuewei Technology Co ltd filed Critical Beijing Yunxuewei Technology Co ltd
Priority to CN202310424805.8A priority Critical patent/CN116150105B/en
Publication of CN116150105A publication Critical patent/CN116150105A/en
Application granted granted Critical
Publication of CN116150105B publication Critical patent/CN116150105B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/113 Details of archiving
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and system for reading and parsing a long-term storage package of electronic files, and relates to the field of archival information management. The method comprises the following steps: based on the handover manifest information in a target package, constructing the in-package path information of the archive metadata files by a placeholder processing method; obtaining the archive metadata files from the target package based on their in-package path information, parsing the electronic file names contained in each archive metadata file, and constructing the in-package path information of the electronic files by the placeholder processing method based on those file names and the in-package path information; and reading and parsing the target electronic files in the target package based on the in-package path information of the electronic files. The method does not require a programmer to hard-code the fixed structure of the package in advance, which improves the efficiency of reading and parsing long-term storage packages of electronic files, shortens project construction periods, and benefits users.

Description

Reading and analyzing method and system for electronic file long-term storage package
Technical Field
The invention relates to the field of archival information management, and in particular to a method and system for reading and parsing a long-term storage package of electronic files.
Background
An electronic archive is an electronic file that has been archived because it has evidentiary, reference, and preservation value; the term generally refers to a collection of interrelated electronic files stored on electronic media together with the metadata that describes them. Like paper archives, electronic archives must be preserved for the long term. To ensure that they remain usable in different scenarios later, they are usually packaged archive by archive in a fixed, open format, which is called a long-term storage package of electronic files. Because the archive data later undergoes periodic inspection, quality inspection, management, and use, and several systems are involved, reading and parsing these packages efficiently is particularly important. The National Archives Administration of China has issued standards for the architecture and directory hierarchy of long-term storage packages. However, archives from different business units differ in type, metadata list, and organization, and as the scope of electronic archiving expands, more and more kinds of electronic files are being archived. This places new demands on reading and parsing long-term storage packages and makes the ability to parse flexible, changeable package structures important.
In the traditional approach, packages are read and parsed by hard coding: the programmer and the user agree on the package structure and field types in advance, and the programmer writes that structure or those fields directly into the code. The program can therefore recognize only packages with that fixed structure. When the metadata fields of the electronic files in a package change, or a package for a new type of electronic file appears, the existing system cannot read or parse it, and support for the new package type can only be added by reprogramming. This greatly reduces the efficiency of long-term storage work for electronic files and hinders the informatization of electronic archives.
Disclosure of Invention
The invention aims to provide a method and system for reading and parsing a long-term storage package of electronic files that does not require a programmer to write a fixed package structure into the code in advance. Instead, the package structure is configured online, and this configuration guides the parsing of the package; the configuration can be adapted flexibly to a given package structure or modified later. This improves the efficiency of reading and parsing long-term storage packages of electronic files, shortens project construction periods, and benefits users.
The invention is realized in the following way:
in a first aspect, the present application provides a method for reading and analyzing a package for long-term storage of an electronic file, including the steps of:
constructing in-package path information of archive metadata files by a placeholder processing method based on handover manifest information in a target package, wherein the handover manifest information describes the archive entries contained in the target package; obtaining the archive metadata files from the target package based on their in-package path information, parsing the electronic file names contained in each archive metadata file, and constructing in-package path information of the electronic files by the placeholder processing method based on those file names and the in-package path information; and reading and parsing target electronic files in the target package based on the in-package path information of the electronic files.
Further, constructing the in-package path information of the archive metadata files based on the handover manifest information in the target package by the placeholder processing method includes: obtaining, according to the handover manifest information, the metadata name information of every electronic file in the target package, and filling the metadata names of the same electronic file, in the order obtained, into the same group of preset placeholders, thereby obtaining the in-package path information of all archive metadata files in the target package.
Further, the method includes performing data cleansing on the obtained in-package path information of the archive metadata files, the data cleansing comprising: checking each obtained in-package path of an archive metadata file for illegal characters, and deleting any path found to contain illegal characters.
Further, constructing the in-package path information of the archive metadata files based on the handover manifest information in the target package by the placeholder processing method includes: obtaining, according to the handover manifest information, the metadata information of every electronic file in the target package, and isolating different groups of metadata using a predetermined tree-shaped metadata structure, wherein the isolation processing comprises: maintaining a multi-way tree during parsing; when a node type whose metadata must be isolated is encountered, forking the multi-way tree so that the metadata in subsequent child nodes is placed in separate child nodes; and when a placeholder is encountered, searching upward (tracing back) until the base node is found.
Further, the method uses a plurality of different parsing models, each for parsing a different type of parse file. Reading and parsing the target electronic file in the target package based on the in-package path information of the electronic file then includes: reading the target electronic file from the target package based on its in-package path information, and parsing it with the parsing model that matches its type.
In a second aspect, the present application provides a reading and parsing system for long-term storage package of electronic files, comprising:
a path construction module configured to construct in-package path information of archive metadata files by a placeholder processing method based on handover manifest information in a target package, wherein the handover manifest information describes the archive entries contained in the target package; a path perfecting module configured to obtain the archive metadata files from the target package based on their in-package path information, parse the electronic file names contained in each archive metadata file, and construct in-package path information of the electronic files by the placeholder processing method based on those file names and the in-package path information; and a reading and parsing module configured to read and parse target electronic files in the target package based on the in-package path information of the electronic files.
In a third aspect, the present application provides an electronic device comprising at least one processor, at least one memory, and a data bus, wherein the processor and the memory communicate with each other through the data bus, and the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method according to any implementation of the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any implementation of the first aspect.
Compared with the prior art, the invention has at least the following advantages or beneficial effects:
(1) By optimizing how packages are read and parsed, packages can be read and parsed effectively without a programmer writing a fixed package structure into the code in advance. This improves the efficiency of reading and parsing long-term storage packages of electronic files, shortens project construction periods, and benefits users.
(2) Constructing in-package paths with placeholders yields the paths of the corresponding electronic files accurately and reliably, providing the basis for subsequently reading and parsing those files. Each constructed path is associated with one type of parse file, which in turn initiates the next level of parsing, until every file in the package has been accessed and parsed.
(3) Isolating the different groups of metadata with a tree-shaped metadata structure prevents illegal results when in-package paths are generated from placeholders, so that each construction result is unique and well defined.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present invention and should not be regarded as limiting its scope; a person skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for reading and parsing a long-term storage package of electronic files according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a package according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the contents of a handover manifest file according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the parsing structure of a handover manifest according to an embodiment of the present invention;
FIG. 5 is a flow chart of isolating different groups of metadata using a tree-shaped metadata structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the parsing process for various types of electronic files according to an embodiment of the present invention;
FIG. 7 is a block diagram of a system for reading and parsing a long-term storage package of electronic files according to an embodiment of the present invention;
FIG. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
Reference numerals: 1. path construction module; 2. path perfecting module; 3. reading and parsing module; 4. processor; 5. memory; 6. data bus.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. The components of the embodiments, as generally described and illustrated in the drawings, may be arranged and designed in a wide variety of configurations.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments below, and the features within them, may be combined with one another provided they do not conflict.
Example 1
This embodiment provides a method for reading and parsing a long-term storage package of electronic files that does not require a programmer to write a fixed package structure into the code in advance. Instead, the package structure is configured online and guides the parsing of the package; the configuration can be adapted flexibly to a given package structure or modified later. This improves the efficiency of reading and parsing long-term storage packages of electronic files, shortens project construction periods, and benefits users. Technically, the parsing program is guided to parse the package data level by level according to the definitions of the content fields of each type of parse file in the package, and the paths bound in each parse file allow the next target file to be located automatically, so that parsing proceeds layer by layer.
Referring to FIG. 1, the method for reading and parsing a long-term storage package of electronic files includes the following steps:
Step S101: construct the in-package path information of the archive metadata files by the placeholder processing method based on the handover manifest information in the target package, wherein the handover manifest information describes the archive entries contained in the target package.
Parsing of the electronic files in the package is driven by field clues and proceeds one parse file at a time, each pass handling one specific file. The location of each parse file must therefore be known. If a parse file has a fixed location, it can be obtained directly. However, as the range of electronic files keeps widening, the locations of the parse files corresponding to electronic files are no longer fixed and cannot be obtained directly. In this step, constructing the in-package paths of the archive metadata files from the handover manifest information by the placeholder processing method produces one or more concrete paths for the content to be parsed, which facilitates subsequent processing.
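The idea that configuration, rather than code, records where each parse file lives and which files it leads to can be sketched as below. This is a minimal sketch under stated assumptions: the `ParseFileDef` class, the configuration keys, the field names, and the `handover list.xml` location are hypothetical illustrations, not the patent's actual configuration format.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ParseFileDef:
    """Hypothetical configuration entry for one type of parse file."""
    path_template: str                                   # in-package location, may contain placeholders
    fields: list[str] = field(default_factory=list)      # fields to extract from this file
    children: list[str] = field(default_factory=list)    # parse-file types located via these fields


# Hypothetical online configuration: editing this dict, not the parser code,
# is what adapts the program to a new package structure.
CONFIG = {
    "manifest": ParseFileDef("handover list.xml", ["fonds", "case", "item"], ["archive_metadata"]),
    "archive_metadata": ParseFileDef("{fonds}/{case}/XSD{item}.xml", ["file_name"], ["electronic_file"]),
    "electronic_file": ParseFileDef("{fonds}/{case}/{file_name}"),
}


def parse_levels(config: dict, root: str) -> list[str]:
    """Return parse-file types in the order a layer-by-layer parser would visit them."""
    order, queue, seen = [], deque([root]), set()
    while queue:
        name = queue.popleft()
        if name in seen:          # guard against cycles in a misconfigured chain
            continue
        seen.add(name)
        order.append(name)
        queue.extend(config[name].children)
    return order


print(parse_levels(CONFIG, "manifest"))
# ['manifest', 'archive_metadata', 'electronic_file']
```

Supporting a new package layout in this sketch means adding or editing `ParseFileDef` entries; the traversal code stays unchanged.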
For example, as shown in FIG. 2, a fixed "handover list.xml" file sits at the top level of the package. It describes the archive entries in the whole package and records basic information for two archives, such as the fonds number (123), the case numbers (0001 and 0002), and the file numbers (0001 and 0002). From this information, the in-package path of each archive metadata file is constructed with a placeholder template such as "{fonds number}/{case number}/XSD{file number}.xml", which yields two in-package paths: 123/0001/XSD0001.xml and 123/0002/XSD0002.xml. When constructing a path, the program obtains each placeholder value by searching upward through the tree structure, and the case number and file number belonging to the same group are isolated from other groups by tree nodes, so the two illegal results 123/0002/XSD0001.xml and 123/0001/XSD0002.xml cannot occur.
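Why the values must stay in their groups can be shown with a short sketch. The placeholder names `fonds`, `case`, and `item` are hypothetical stand-ins for the three identifiers in the example (fonds number, case number, file number), and Python's `str.format` substitution stands in for the patent's placeholder processing.

```python
import itertools

# Hypothetical template mirroring the fonds/case/file-number example.
TEMPLATE = "{fonds}/{case}/XSD{item}.xml"
FONDS = "123"

# One group of values per archive entry described in the handover manifest.
entries = [
    {"case": "0001", "item": "0001"},
    {"case": "0002", "item": "0002"},
]

# Grouped filling: the values of one entry are always used together.
grouped = [TEMPLATE.format(fonds=FONDS, **entry) for entry in entries]

# Ungrouped filling: pooling all case numbers and file numbers and combining
# them freely also produces mismatched (illegal) paths.
cases = [entry["case"] for entry in entries]
items = [entry["item"] for entry in entries]
ungrouped = [TEMPLATE.format(fonds=FONDS, case=c, item=i)
             for c, i in itertools.product(cases, items)]

print(grouped)                                # ['123/0001/XSD0001.xml', '123/0002/XSD0002.xml']
print(sorted(set(ungrouped) - set(grouped)))  # ['123/0001/XSD0002.xml', '123/0002/XSD0001.xml']
```

The two extra paths produced by the ungrouped variant are exactly the illegal results that group isolation rules out.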
In some embodiments, constructing the in-package path information of the archive metadata files based on the handover manifest information in the target package by the placeholder processing method includes: obtaining, according to the handover manifest information, the metadata name information of every electronic file in the target package, and filling the metadata names of the same electronic file, in the order obtained, into the same group of preset placeholders to obtain the in-package path information of all archive metadata files in the target package.
In this step, the metadata names of the same electronic file are filled directly into the same group of placeholders in the order in which they are obtained, producing the complete path for that electronic file (that is, the in-package path information of its archive metadata file). This path then initiates the parsing of the subsequent files of other types, until the files in the whole package have been parsed.
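Sequential group filling can be sketched with the standard-library XML parser: the values of each manifest entry are read in document order into one group, and each group fills one copy of the template. The XML layout below (a `fonds` attribute and repeated `<entry>` elements) and the field names are hypothetical simplifications; the actual handover manifest schema is not given in the text.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified handover manifest.
MANIFEST = """<manifest fonds="123">
  <entry><case>0001</case><item>0001</item></entry>
  <entry><case>0002</case><item>0002</item></entry>
</manifest>"""


def read_groups(xml_text: str, field_names: list[str]) -> list[dict]:
    """Collect, in document order, one group of placeholder values per entry."""
    root = ET.fromstring(xml_text)
    groups = []
    for entry in root.iter("entry"):
        group = {"fonds": root.get("fonds")}
        for name in field_names:
            # Empty string if the field element is missing from this entry.
            group[name] = entry.findtext(name, default="")
        groups.append(group)
    return groups


TEMPLATE = "{fonds}/{case}/XSD{item}.xml"
paths = [TEMPLATE.format(**group) for group in read_groups(MANIFEST, ["case", "item"])]
print(paths)  # ['123/0001/XSD0001.xml', '123/0002/XSD0002.xml']
```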
In some embodiments, the method further includes performing data cleansing on the obtained in-package path information of the archive metadata files: each obtained path is checked for illegal characters, and any path found to contain illegal characters is deleted.
Filling placeholders in the order in which metadata names are obtained may produce some illegal, unusable in-package paths. Checking the paths for illegal characters and deleting those that do not conform improves the accuracy of in-package path construction.
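A cleansing filter of this kind might look as follows. Which characters count as illegal is not specified in the text; the set used here (characters reserved in Windows file names, ASCII control characters, and leftover `{`/`}` placeholder braces) and the extra empty-segment check are assumptions for illustration.

```python
import re

# Assumed rule set: Windows-reserved characters, ASCII control characters,
# and leftover placeholder braces from an unfilled template.
ILLEGAL_CHARS = re.compile(r'[<>:"\\|?*\x00-\x1f{}]')


def is_legal_path(path: str) -> bool:
    """Return True if an in-package path passes the illegal-character check."""
    if ILLEGAL_CHARS.search(path):
        return False
    # Assumed extra rule: reject empty segments such as "123//XSD0001.xml".
    return all(segment for segment in path.split("/"))


def cleanse(paths: list[str]) -> list[str]:
    """Delete every constructed path that fails the check."""
    return [p for p in paths if is_legal_path(p)]


candidates = [
    "123/0001/XSD0001.xml",
    "123/0002/XSD{item}.xml",   # placeholder was never filled
    "123/00?2/XSD0002.xml",     # reserved character
    "123//XSD0001.xml",         # empty segment
]
print(cleanse(candidates))  # ['123/0001/XSD0001.xml']
```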
Referring to FIG. 5, in some embodiments, constructing the in-package path information of the archive metadata files based on the handover manifest information in the target package by the placeholder processing method includes:
obtaining, according to the handover manifest information, the metadata information of every electronic file in the target package, and isolating different groups of metadata using a preset tree-shaped metadata structure, wherein the isolation processing comprises: maintaining a multi-way tree during parsing; when a node type whose metadata must be isolated is encountered, forking the multi-way tree so that the metadata in subsequent child nodes is placed in separate child nodes; and when a placeholder is encountered, searching upward (tracing back) until the base node is found.
When fields are constructed with placeholders, and particularly when they are constructed for a loop (repeated) structure, metadata isolation must be considered. For example, referring to FIGS. 3 and 4, two file numbers and two metadata names are parsed from the handover manifest XML file, but for forming paths from placeholders only the values belonging to the same group are meaningful together (0001 with XSD0001, and 0002 with XSD0002). To construct placeholder paths accurately, a node must therefore take values from its own lineage (its parent nodes) and never from collateral metadata (its sibling nodes). Sibling nodes hold different instances of the same kind of data, such as the several archives in one package: their fields are identical but their contents differ, so they must be processed separately.
Thus, isolating the data of different groups with a tree-shaped metadata structure prevents illegal results when strings are generated from placeholders (that is, when in-package paths are generated), which would otherwise compromise the accuracy of in-package path construction. With this structure and query method, each construction result is unique and well defined.
For example, in some embodiments, obtaining the metadata information of every electronic file in the target package according to the handover manifest information includes: if the corresponding file is an XML file, the program extracts its data according to a configured field list and creates a new tree node for array-type data. Metadata of the same type belonging to different array elements can then be isolated using this tree structure. The isolation processing includes: maintaining a multi-way tree during parsing; when a node type whose metadata must be isolated is encountered, forking the multi-way tree and placing the metadata of subsequent child nodes in separate child nodes, so that same-type metadata in sibling nodes is kept apart, since subsequent data queries and path construction may only search upward and never consult sibling nodes at the same level; and when a placeholder is encountered, searching upward until the base node is found.
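The multi-way tree with forking and upward lookup can be sketched as follows. `MetaNode`, `fork`, `lookup`, `fill`, and the field names are hypothetical; the tree is built by hand here rather than from XML, and it reproduces the example electronic-file paths used in this description to show that upward lookup never mixes values from sibling branches.

```python
from __future__ import annotations

import string
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class MetaNode:
    """One node of the metadata tree; parent is excluded from repr to avoid cycles."""
    values: dict[str, str] = field(default_factory=dict)
    parent: Optional[MetaNode] = field(default=None, repr=False)
    children: list[MetaNode] = field(default_factory=list)

    def fork(self, **values: str) -> MetaNode:
        """Create an isolated child node, e.g. for one element of an array field."""
        child = MetaNode(values=dict(values), parent=self)
        self.children.append(child)
        return child

    def lookup(self, name: str) -> str:
        """Trace upward toward the root; sibling nodes are never consulted."""
        node: Optional[MetaNode] = self
        while node is not None:
            if name in node.values:
                return node.values[name]
            node = node.parent
        raise KeyError(name)


def leaves(node: MetaNode):
    """Yield leaf nodes depth-first, in insertion order."""
    if not node.children:
        yield node
    for child in node.children:
        yield from leaves(child)


def fill(template: str, node: MetaNode) -> str:
    """Resolve every placeholder in the template by upward lookup from the node."""
    names = [f for _, f, _, _ in string.Formatter().parse(template) if f]
    return template.format(**{n: node.lookup(n) for n in names})


# Root holds package-wide values; each archive forks its own branch.
root = MetaNode({"fonds": "123"})
for case, files in [("0001", ["data_1.pdf", "data_2.pdf"]),
                    ("0002", ["data_3.pdf", "data_4.pdf"])]:
    archive = root.fork(case=case)
    for name in files:
        archive.fork(file_name=name)

print([fill("{fonds}/{case}/{file_name}", leaf) for leaf in leaves(root)])
# ['123/0001/data_1.pdf', '123/0001/data_2.pdf', '123/0002/data_3.pdf', '123/0002/data_4.pdf']
```

Because `lookup` only walks parent links, a file node under case 0001 can never pick up the case number stored in the sibling branch for 0002.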
Continuing with FIG. 1, step S102: obtain the archive metadata files from the target package based on their in-package path information, parse the electronic file names contained in each archive metadata file, and construct the in-package path information of the electronic files by the placeholder processing method based on those file names and the in-package path information.
After the in-package paths of the archive metadata files have been constructed, this step refines them further by constructing the in-package paths of the electronic files, so that specific electronic files can be located. For example, referring to FIG. 2, once the in-package paths of the two archive metadata files are obtained, parsing continues from them to parse the two metadata files. By parsing the file names listed for each archive, the in-package paths of the electronic files (123/0001/data_1.pdf, 123/0001/data_2.pdf, 123/0002/data_3.pdf, and 123/0002/data_4.pdf) can be constructed, so that all files in the package can eventually be read and parsed.
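Step S102 can be sketched in the same style: each archive metadata file is parsed for the electronic file names it lists, and a second template joins them to the metadata file's directory. The XML layout (`<file>` elements containing `<name>`), the `{metadata_dir}/{file_name}` template, and the in-memory dictionary standing in for the package are hypothetical.

```python
import posixpath
import xml.etree.ElementTree as ET

# Hypothetical package contents: in-package path -> archive metadata XML.
PACKAGE = {
    "123/0001/XSD0001.xml": "<record><file><name>data_1.pdf</name></file>"
                            "<file><name>data_2.pdf</name></file></record>",
    "123/0002/XSD0002.xml": "<record><file><name>data_3.pdf</name></file>"
                            "<file><name>data_4.pdf</name></file></record>",
}

# Hypothetical second-level placeholder template.
FILE_TEMPLATE = "{metadata_dir}/{file_name}"


def electronic_file_paths(metadata_path: str, xml_text: str) -> list[str]:
    """Build in-package paths for the electronic files listed in one metadata file."""
    metadata_dir = posixpath.dirname(metadata_path)   # e.g. "123/0001"
    root = ET.fromstring(xml_text)
    return [FILE_TEMPLATE.format(metadata_dir=metadata_dir, file_name=el.text.strip())
            for el in root.iter("name") if el.text]


all_paths = [p for path, xml_text in PACKAGE.items()
             for p in electronic_file_paths(path, xml_text)]
print(all_paths)
# ['123/0001/data_1.pdf', '123/0001/data_2.pdf', '123/0002/data_3.pdf', '123/0002/data_4.pdf']
```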
Step S103: read and parse the target electronic files in the target package based on the in-package path information of the electronic files. Because each electronic file now has a specific, complete in-package path, it can be read and parsed directly. No programmer needs to write the fixed package structure into the code in advance, electronic files of many types and in any number can be read and parsed accurately, the efficiency of reading and parsing long-term storage packages of electronic files improves, the construction period of such projects is shortened, and users benefit.
In some embodiments, the method uses a plurality of different parsing models, each for parsing a different type of parse file.
Reading and parsing the target electronic file in the target package based on its in-package path information then includes: reading the target electronic file from the target package based on its in-package path information, and parsing it with the parsing model that matches its type. For example, matching parsers may be provided for XML structures, non-physical file structures (pure path constructs), ordinary files, and so on.
As shown in FIG. 6, a package may consist of multiple archive items, each of which may contain multiple electronic documents (such as PDF or OFD files). By entering a parsing model into the system, multiple types of parse files are defined, each parsed with its own set of field definitions, and the parsing of all files in the whole package can eventually be completed. Note that the parsing model may be a single overall model covering the parsing processes of several types, or separate models distinguished by type, in which case the model matching the type of the target electronic file is selected for targeted parsing.
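Matching a parsing model to the file type can be sketched as a small registry. Dispatching on the file extension, the `register` decorator, and the two toy models are assumptions for illustration; the text does not say how types are detected or what each parsing model returns.

```python
import posixpath
import xml.etree.ElementTree as ET
from typing import Callable

ParsingModel = Callable[[bytes], dict]

# Registry mapping a file type (here: lower-case extension) to its parsing model.
PARSING_MODELS: dict[str, ParsingModel] = {}


def register(*extensions: str):
    """Decorator that registers a parsing model for one or more file types."""
    def decorator(func: ParsingModel) -> ParsingModel:
        for ext in extensions:
            PARSING_MODELS[ext] = func
        return func
    return decorator


@register(".xml")
def parse_xml(data: bytes) -> dict:
    """Toy XML model: report the root element's tag."""
    return {"kind": "xml", "root": ET.fromstring(data).tag}


@register(".pdf", ".ofd")
def parse_document(data: bytes) -> dict:
    """Toy document model: report only the size in bytes."""
    return {"kind": "document", "size": len(data)}


def parse(path: str, data: bytes) -> dict:
    """Select the parsing model that matches the file's type and apply it."""
    ext = posixpath.splitext(path)[1].lower()
    model = PARSING_MODELS.get(ext)
    if model is None:
        raise ValueError(f"no parsing model configured for {ext!r}")
    return model(data)


print(parse("123/0001/XSD0001.xml", b"<record/>"))  # {'kind': 'xml', 'root': 'record'}
print(parse("123/0001/data_1.pdf", b"%PDF-1.7"))    # {'kind': 'document', 'size': 8}
```

Adding support for a new file type in this sketch means registering one more model, which mirrors the text's point that type-specific models can be matched at run time.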
Example 2
Referring to FIG. 7, an embodiment of the present application provides a system for reading and parsing a long-term storage package of electronic files, which includes:
a path construction module 1 configured to construct in-package path information of archive metadata files by a placeholder processing method based on handover manifest information in a target package, wherein the handover manifest information describes the archive entries contained in the target package; a path perfecting module 2 configured to obtain the archive metadata files from the target package based on their in-package path information, parse the electronic file names contained in each archive metadata file, and construct in-package path information of the electronic files by the placeholder processing method based on those file names and the in-package path information; and a reading and parsing module 3 configured to read and parse target electronic files in the target package based on the in-package path information of the electronic files.
For the specific implementation of the system, refer to the method for reading and parsing a long-term storage package of electronic files provided in Example 1; details are not repeated here.
Example 3
Referring to fig. 8, an embodiment of the present application provides an electronic device comprising at least one processor 4, at least one memory 5, and a data bus 6, wherein the processor 4 and the memory 5 communicate with each other through the data bus 6, and the memory 5 stores program instructions executable by the processor 4, which the processor 4 invokes to perform the method for reading and parsing an electronic file long-term storage package, for example, to implement the following:
constructing the in-package path information of the archive metadata files by a placeholder processing method based on the handover manifest information in the target package, wherein the handover manifest information describes the archive entries contained in the target package; acquiring the archive metadata files in the target package based on their in-package path information, parsing the electronic file names contained in each archive metadata file, and constructing the in-package path information of the electronic files by a placeholder processing method based on the file names and the in-package path information of the archive metadata files; and reading and parsing the target electronic file in the target package based on the in-package path information of the electronic file.
The memory 5 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like.
The processor 4 may be an integrated circuit chip with signal processing capability. It may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in fig. 8 is merely illustrative, and that the electronic device may also include more or fewer components than shown in fig. 8, or have a different configuration than shown in fig. 8. The components shown in fig. 8 may be implemented in hardware, software, or a combination thereof.
Example 4
The present invention provides a computer-readable storage medium storing a computer program which, when executed by the processor 4, implements the method for reading and parsing an electronic file long-term storage package, for example, implementing the following:
constructing the in-package path information of the archive metadata files by a placeholder processing method based on the handover manifest information in the target package, wherein the handover manifest information describes the archive entries contained in the target package; acquiring the archive metadata files in the target package based on their in-package path information, parsing the electronic file names contained in each archive metadata file, and constructing the in-package path information of the electronic files by a placeholder processing method based on the file names and the in-package path information of the archive metadata files; and reading and parsing the target electronic file in the target package based on the in-package path information of the electronic file.
If the above functions are implemented as software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (8)

1. A method for reading and parsing an electronic file long-term storage package, characterized by comprising the following steps:
constructing the in-package path information of the archive metadata files by a placeholder processing method based on the handover manifest information in the target package, wherein the handover manifest information is information describing the archive entries contained in the target package;
acquiring the archive metadata files in the target package based on their in-package path information, parsing the electronic file names contained in each archive metadata file, and constructing the in-package path information of the electronic files by a placeholder processing method based on the file names and the in-package path information of the archive metadata files; and
reading and parsing the target electronic file in the target package based on the in-package path information of the electronic file.
2. The method for reading and parsing an electronic file long-term storage package according to claim 1, wherein constructing the in-package path information of the archive metadata files by a placeholder processing method based on the handover manifest information in the target package comprises:
acquiring, according to the handover manifest information, the metadata name information of each electronic file in the target package, and filling the metadata names acquired in sequence for the same electronic file into the same group of preset placeholders, thereby obtaining the in-package path information of all archive metadata files in the target package.
3. The method for reading and parsing an electronic file long-term storage package according to claim 2, further comprising performing data cleansing on the obtained in-package path information of the archive metadata files, the data cleansing comprising: checking the obtained in-package path information of the archive metadata files for illegal characters, and deleting any in-package path information determined to contain illegal characters.
4. The method for reading and parsing an electronic file long-term storage package according to claim 1, wherein constructing the in-package path information of the archive metadata files by a placeholder processing method based on the handover manifest information in the target package comprises:
acquiring, according to the handover manifest information, the metadata information of each electronic file in the target package, and isolating different groups of metadata based on a preset tree-shaped metadata data structure, the isolation comprising: maintaining a multi-way tree data structure during parsing; when a node type whose metadata needs to be isolated is encountered, forking the multi-way tree so that subsequent metadata is placed in separate child nodes; and when a placeholder is encountered, tracing upward through the tree until the base node is found.
5. The method for reading and parsing an electronic file long-term storage package according to claim 1, comprising a plurality of different parsing models, each parsing model being used to parse a different type of file;
wherein reading and parsing the target electronic file in the target package based on the in-package path information of the electronic file comprises:
reading the target electronic file in the target package based on the in-package path information of the electronic file, and parsing it with the parsing model matched to the type of the target electronic file.
6. A system for reading and parsing an electronic file long-term storage package, comprising:
a path construction module, configured to construct the in-package path information of the archive metadata files by a placeholder processing method based on the handover manifest information in the target package, wherein the handover manifest information is information describing the archive entries contained in the target package;
a path perfecting module, configured to acquire the archive metadata files in the target package based on their in-package path information, parse the electronic file names contained in each archive metadata file, and construct the in-package path information of the electronic files by a placeholder processing method based on the file names and the in-package path information of the archive metadata files; and
a reading and parsing module, configured to read and parse the target electronic file in the target package based on the in-package path information of the electronic file.
7. An electronic device, comprising at least one processor, at least one memory, and a data bus, wherein the processor and the memory communicate with each other through the data bus, and the memory stores program instructions executable by the processor, the processor invoking the program instructions to perform the method according to any one of claims 1 to 5.
8. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
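The metadata isolation of claim 4 can be sketched with a multi-way tree built from a flat stream of parsing events. The event names (`enter`, `leave`, `set`, `placeholder`), the isolating node type `item`, and the reading of the "base node" as the nearest ancestor that defines the placeholder's key are assumptions made for illustration, not details given in the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical node types whose metadata must be isolated from siblings.
ISOLATING_KINDS = {"item"}


@dataclass
class Node:
    kind: str
    # repr/compare disabled so parent <-> child links do not recurse.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    metadata: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def fork(self, kind: str) -> "Node":
        """Bifurcate the tree: later metadata goes into a new child node."""
        child = Node(kind, parent=self)
        self.children.append(child)
        return child

    def trace(self, key: str) -> str:
        """Search upward until the base node that defines `key` is found."""
        node: Optional[Node] = self
        while node is not None:
            if key in node.metadata:
                return node.metadata[key]
            node = node.parent
        raise KeyError(f"no base node defines {key!r}")


def parse_events(events: List[Tuple[str, ...]]) -> Tuple[Node, List[str]]:
    """Build the multi-way tree and resolve placeholders along the way."""
    root = Node("package")
    current = root
    resolved: List[str] = []
    for op, *args in events:
        if op == "enter" and args[0] in ISOLATING_KINDS:
            current = current.fork(args[0])
        elif op == "leave" and args[0] in ISOLATING_KINDS:
            current = current.parent if current.parent is not None else root
        elif op == "set":
            current.metadata[args[0]] = args[1]
        elif op == "placeholder":
            resolved.append(current.trace(args[0]))
    return root, resolved
```

Because each isolating node gets its own child, a placeholder inside the second item cannot pick up a value set in the first item, yet values set higher up (such as a package-level field) remain reachable through the upward trace.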
CN202310424805.8A 2023-04-20 2023-04-20 Reading and analyzing method and system for electronic file long-term storage package Active CN116150105B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310424805.8A CN116150105B (en) 2023-04-20 2023-04-20 Reading and analyzing method and system for electronic file long-term storage package

Publications (2)

Publication Number Publication Date
CN116150105A 2023-05-23
CN116150105B 2023-07-11

Family

ID=86339249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310424805.8A Active CN116150105B (en) 2023-04-20 2023-04-20 Reading and analyzing method and system for electronic file long-term storage package

Country Status (1)

Country Link
CN (1) CN116150105B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106339362A (en) * 2016-08-31 2017-01-18 同方鼎欣科技股份有限公司 Large file encapsulation and analytical check method and system for archival information package
CN106650473A (en) * 2016-12-01 2017-05-10 中国工商银行股份有限公司 Method and system for verifying multiple document encapsulation packages, transmitting device and receiving device
CN112804097A (en) * 2021-01-04 2021-05-14 北京金山云网络技术有限公司 Private cloud deployment method and device and server
RU2759887C1 (en) * 2020-12-29 2021-11-18 федеральное государственное казенное военное образовательное учреждение высшего образования "Краснодарское высшее военное орденов Жукова и Октябрьской Революции Краснознаменное училище имени генерала армии С.М. Штеменко" Министерства обороны Российской Федерации Method for automatic classification of formalized electronic graphic and text documents in the electronic document circulation system with automatic formation of electronic cases
CN113822037A (en) * 2021-11-23 2021-12-21 深圳逻辑汇科技有限公司 Method, device, equipment and medium for inserting placeholder and generating data mapping table
CN115269515A (en) * 2022-09-22 2022-11-01 泰盈科技集团股份有限公司 Processing method for searching specified target document data

Also Published As

Publication number Publication date
CN116150105B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
KR20150042877A (en) Managing record format information
CN112347123A (en) Data blood margin analysis method and device and server
CN111368097B (en) Knowledge graph extraction method and device
CN106960058A (en) A kind of structure of web page alteration detection method and system
CN110569371A (en) Knowledge graph construction method and device and storage equipment
CN108427580B (en) Configuration pair naming repetition detection method, storage medium and intelligent device
CN115237805A (en) Test case data preparation method and device
Levine et al. DEX: Digital evidence provenance supporting reproducibility and comparison
CN112069305B (en) Data screening method and device and electronic equipment
CN112835901A (en) File storage method and device, computer equipment and computer readable storage medium
CN116150105B (en) Reading and analyzing method and system for electronic file long-term storage package
KR100762712B1 (en) Method for transforming of electronic document based on mapping rule and system thereof
CN112328246A (en) Page component generation method and device, computer equipment and storage medium
CN113033177A (en) Method and device for analyzing electronic medical record data
CN108090034B (en) Cluster-based uniform document code coding generation method and system
CN115796146A (en) File comparison method and device
CN109254774A (en) The management method and device of code in software development system
CN107577476A (en) A kind of Android system source code difference analysis method, server and medium based on Module Division
CN113434472A (en) File generation method and device, server and storage medium
CN117573140B (en) Method, system and device for generating document by scanning codes
CN117370620B (en) Data blood margin construction method and device, terminal equipment and storage medium
CN117573060A (en) Audio resource processing method and device, electronic equipment and storage medium
CN110618809B (en) Front-end webpage input constraint extraction method and device
CN115345151A (en) System, method, device, medium and terminal for structured data parsing and management
CN114253548A (en) XML document processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 101, 12th Floor, Building 4, Zone 1, No. 81 Beiqing Road, Haidian District, Beijing, 100000

Applicant after: Beijing Yunxuewei Technology Co.,Ltd.

Address before: 100000 Room 1511, Unit 1, Floor 12, Huizhi Building, No. 9 Xueqing Road, Haidian District, Beijing

Applicant before: Beijing Yunxuewei Technology Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant