CN114372104A - Electronic file metadata acquisition tool and method with good compatibility - Google Patents

Electronic file metadata acquisition tool and method with good compatibility

Info

Publication number
CN114372104A
Authority
CN
China
Prior art keywords
data
metadata
electronic file
files
processing center
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210021367.6A
Other languages
Chinese (zh)
Inventor
朱伏权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Jiuzhilian Information Technology Co ltd
Original Assignee
Suzhou Jiuzhilian Information Technology Co ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an electronic file metadata acquisition tool and method with good compatibility. The tool comprises an original electronic file, a microcontroller (single-chip microcomputer), a data processing center, and a metadata database. An encoding recognizer is provided inside the data processing center, the data processing center is connected to an operator PC terminal, and an automatic data collector inside the data processing center reads the data in the electronic file according to the corresponding file reading rules. The microcontroller's data processing capability is used to classify files of different formats; each class of file is processed in its own way, saved in a common format, and transmitted to the metadata database for storage, which improves the compatibility of the metadata. The data is then sorted inside the metadata database, which improves the accuracy of the metadata.

Description

An electronic file metadata acquisition tool and method with good compatibility

Technical field

The invention relates to the technical field of data acquisition, and in particular to an electronic file metadata acquisition tool and method with good compatibility.

Background

Metadata is data that describes a file's context, content, and structure, together with its entire management process. Unlike archival description in the traditional sense, metadata is richer in meaning, broader in function, and subject to stricter requirements. It cannot be written by archivists after files have been archived, still less entered by hand by the records staff or business staff of the creating organization. Metadata must be planned across the whole process, embedded in the system, and collected automatically in real time, so that it faithfully and dynamically reproduces the context and process information of electronic file management. Automatic metadata acquisition is therefore both a requirement of metadata management itself and a practical business need of the creating organization.

On a data sharing platform, the accuracy of metadata directly affects how quickly users can locate the data they need, and letting users locate data quickly is the platform's most basic service requirement, so the accuracy requirements on metadata are high. Traditional metadata extraction methods are mostly bottom-up: they start from the web pages where the data resides (for example, the large body of scientific and technical literature published online) and produce the final metadata through parsing, extraction, aggregation, statistics, mining, or machine learning. Such methods cannot guarantee the accuracy of the extracted metadata, and the resulting metadata format is not compatible with data readers.

Summary of the invention

In view of the shortcomings of the prior art, the invention provides an electronic file metadata acquisition tool and method that offers good compatibility.

Technical solution

To achieve the above object, the invention provides the following technical solution: an electronic file metadata acquisition tool with good compatibility, comprising an original electronic file, a microcontroller (single-chip microcomputer), a data processing center, and a metadata database, wherein the output of the microcontroller is electrically connected to the input of the data processing center, and the output of the data processing center is electrically connected to the input of the metadata database.

Preferably, an encoding recognizer is provided inside the data processing center, the data processing center is connected to an operator PC terminal, and an automatic data collector is provided inside the data processing center; the automatic data collector reads the data in the electronic file according to the corresponding file reading rules.

Preferably, a data processor is provided inside the metadata database, the metadata database is connected to an administrator's PC terminal, and the output of the metadata database is connected to a new database.

Preferably, the output of the new database is connected to an OA server, an EDS encryption server is connected in parallel with the OA server, and the new database is connected to the Internet over a 4G signal.

Another technical problem to be solved by the invention is to provide an electronic file metadata acquisition method with good compatibility, comprising the following steps:

S1: Determine the format and attributes of the original electronic files. Using the data processing capability of the microcontroller, classify files of different formats into two classes, recognizable source files and unrecognizable source files; the microcontroller then transmits both classes to the data processing center.

S2: For recognizable source files, first obtain the layout structure of the target data, then locate and read the target data according to the established, generally accepted file reading rules.

S3: For unrecognizable source files, divide them by compatibility into source files with higher compatibility and source files with poorer compatibility. For source files with higher compatibility, install the corresponding software or hardware on the operator PC terminal, then establish corresponding data extraction rules to extract the target data. For source files with poorer compatibility, trigger manual intervention: an engineer writes a corresponding conversion tool to read them.

S4: The data processing center saves the data read in S2 and S3 in a common format and transmits it to the metadata database for storage.

S5: The data processor in the metadata database sorts the data, first dividing it into three kinds: accurate data, redundant data, and erroneous data. Accurate data is written directly to the new database. For redundant data, the data processor re-marks the data, filters out the items marked as redundant, and writes the remaining data to the new database. For erroneous data, an administrator intervenes to determine which items are correct and which are wrong, corrects the erroneous data, and then writes it to the new database.

S6: The EDS encryption server encrypts and protects the data in the new database. Departments within the enterprise can download all data in the new database in encrypted form from the OA server, and clients can download all data in the new database in encrypted form over the Internet.
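
The routing in steps S1 to S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes classification by file extension, and the extension tables, the Route names, and the functions classify and route_files are all hypothetical, since the source does not list concrete formats or say how a format is recognized.

```python
from enum import Enum
from pathlib import Path


class Route(Enum):
    RECOGNIZED = "recognized"            # S2: read with an established reading rule
    NEEDS_SUPPORT = "needs_support"      # S3: install software/hardware, then define a rule
    NEEDS_CONVERTER = "needs_converter"  # S3: an engineer writes a conversion tool


# Hypothetical format tables; the source does not enumerate formats.
RECOGNIZED_EXTS = {".txt", ".csv", ".json", ".xml"}
SUPPORTABLE_EXTS = {".doc", ".xls", ".dwg"}


def classify(path: str) -> Route:
    """Assign one file to a processing route based on its extension."""
    ext = Path(path).suffix.lower()
    if ext in RECOGNIZED_EXTS:
        return Route.RECOGNIZED
    if ext in SUPPORTABLE_EXTS:
        return Route.NEEDS_SUPPORT
    # Anything unknown (including files with no extension) goes to manual handling.
    return Route.NEEDS_CONVERTER


def route_files(paths):
    """Group file paths by route, preserving input order within each group."""
    batches = {route: [] for route in Route}
    for p in paths:
        batches[classify(p)].append(p)
    return batches
```

A real classifier would more likely inspect file headers or encodings (the role the source gives to the encoding recognizer) than trust extensions; the extension lookup only keeps the sketch short.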

Beneficial effects

Compared with the prior art, the invention provides an electronic file metadata acquisition tool and method with good compatibility, which has the following beneficial effects:

1. All electronic files are classified before processing. For source files with higher compatibility, the corresponding software or hardware is installed on the operator PC terminal and corresponding data extraction rules are then established to extract the target data. For source files with poorer compatibility, manual intervention is triggered and an engineer writes a corresponding conversion tool to read them. The extracted data is saved in a common format and transmitted to the metadata database for storage, which improves the compatibility of all metadata and makes it easier to read.

2. The data processor sorts the data into accurate data, redundant data, and erroneous data. Accurate data is written directly to the new database. For redundant data, the data processor re-marks the data, filters out the items marked as redundant, and writes the remaining data to the new database. For erroneous data, an administrator intervenes to determine which items are correct and which are wrong, corrects the erroneous data, and then writes it to the new database. This effectively improves the accuracy of all metadata.

Description of the drawings

To explain the technical solutions in the embodiments of the invention more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flowchart of the metadata acquisition method of the invention;

Fig. 2 is a flowchart of establishing the extraction rules of the invention;

Fig. 3 is a flowchart of data sorting within the metadata database of the invention;

Fig. 4 is a flowchart of the data encryption analysis of the invention.

Detailed description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the invention.

In the invention, unless otherwise expressly specified and limited, the terms "installed", "coupled", "connected", "fixed", and the like are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the specific circumstances.

Referring to Figs. 1 to 4, the invention provides an electronic file metadata acquisition tool with good compatibility, comprising an original electronic file, a microcontroller, a data processing center, and a metadata database. The output of the microcontroller is electrically connected to the input of the data processing center, and the output of the data processing center is electrically connected to the input of the metadata database. An encoding recognizer is provided inside the data processing center, and the data processing center is connected to an operator PC terminal. An automatic data collector inside the data processing center reads the data in the electronic file according to the corresponding file reading rules. A data processor is provided inside the metadata database, the metadata database is connected to the administrator's PC terminal, and the output of the metadata database is connected to a new database. The output of the new database is connected to an OA server, an EDS encryption server is connected in parallel with the OA server, and the new database is connected to the Internet over a 4G signal.

In an environment where encryption has been installed, any laptop that needs to be used off-site can be given an offline authorization that controls how long the laptop remains usable after it leaves the environment. Once that time is exceeded, files open as garbled text and cannot be used. With encryption installed, the system is intended to guard against leakage of enterprise information by controlling leaks of the sensitive data the enterprise produces and by encrypting its core data. In addition, according to the enterprise's actual needs, files can be backed up after they are encrypted, and the backups can be kept as plaintext or ciphertext as the enterprise requires.
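
The offline authorization described above amounts to a time-window check. The sketch below is an assumption about how such a check might look; the function name offline_access_allowed and its parameters are hypothetical, and the source does not describe how the EDS encryption server actually enforces the limit.

```python
from datetime import datetime, timedelta


def offline_access_allowed(left_at: datetime, allowed: timedelta, now: datetime) -> bool:
    """Return True while the laptop is still inside its offline authorization window.

    left_at: when the laptop left the managed environment (hypothetical input)
    allowed: the offline period granted by the administrator
    now:     the current time
    """
    return now - left_at <= allowed
```

For example, with a 48-hour authorization granted at 09:00 on 10 January 2022, the check passes 24 hours later and fails after 72 hours, at which point a real client would refuse to decrypt the files.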

An electronic file metadata acquisition method with good compatibility comprises the following steps:

S1: Determine the format and attributes of the original electronic files. Using the data processing capability of the microcontroller, classify files of different formats into two classes, recognizable source files and unrecognizable source files; the microcontroller then transmits both classes to the data processing center.

S2: For recognizable source files, first obtain the layout structure of the target data, then locate and read the target data according to the established, generally accepted file reading rules.

S3: For unrecognizable source files, divide them by compatibility into source files with higher compatibility and source files with poorer compatibility. For source files with higher compatibility, install the corresponding software or hardware on the operator PC terminal, then establish corresponding data extraction rules to extract the target data. For source files with poorer compatibility, trigger manual intervention: an engineer writes a corresponding conversion tool to read them.

S4: The data processing center saves the data read in S2 and S3 in a common format and transmits it to the metadata database for storage.

S5: The data processor in the metadata database sorts the data, first dividing it into three kinds: accurate data, redundant data, and erroneous data. Accurate data is written directly to the new database. For redundant data, the data processor re-marks the data, filters out the items marked as redundant, and writes the remaining data to the new database. For erroneous data, an administrator intervenes to determine which items are correct and which are wrong, corrects the erroneous data, and then writes it to the new database.

S6: The EDS encryption server encrypts and protects the data in the new database. Departments within the enterprise can download all data in the new database in encrypted form from the OA server, and clients can download all data in the new database in encrypted form over the Internet.
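
Step S4 (save in a common format, then store in the metadata database) can be illustrated with the sketch below. It assumes JSON as the common format and an SQLite table as the metadata database; the source names neither, and the functions to_common_record and store, the table layout, and the sample field names are all hypothetical.

```python
import json
import sqlite3


def to_common_record(source_path, fields):
    """Serialize extracted fields into one JSON document (the assumed common format)."""
    return json.dumps(
        {"source": source_path, "fields": fields},
        ensure_ascii=False,  # keep non-ASCII metadata readable
        sort_keys=True,      # stable output makes records easy to compare
    )


def store(conn, record_json):
    """Insert one record into the metadata table and return its row id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "id INTEGER PRIMARY KEY, record TEXT NOT NULL)"
    )
    cur = conn.execute("INSERT INTO metadata (record) VALUES (?)", (record_json,))
    conn.commit()
    return cur.lastrowid


# Usage: an in-memory database stands in for the metadata database.
conn = sqlite3.connect(":memory:")
rec = to_common_record(
    "contracts/2021-03.csv",
    {"title": "Supply contract", "created": "2021-03-02"},
)
row_id = store(conn, rec)
```

Keeping the source path inside every record preserves the link between a metadata entry and the file it came from, which the provenance principle discussed in this description relies on.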

The data extraction rules in S2 and S3 are based on the basic principles of archives management and on the rules of electronic file management, specifically the principle of provenance, the principle of organic association, and the principle of front-end control. The redundant data in S5 may contain items whose indicators look similar but whose meanings differ. When marking redundant data, it is therefore necessary to determine from the original electronic file which data is base data and which data has been excerpted from elsewhere; the latter can be marked as redundant.

Metadata acquisition collects, refines, analyzes, and organizes electronic file information in order to reveal the content of files and archives and the patterns by which they are produced, but it still starts from respect for the essential attributes and regularities of archives. Acquisition therefore emphasizes the provenance of electronic files: electronic files of the same provenance within an organization are reflected together through metadata acquisition, and metadata is tied to the provenance of the archives. In this way metadata reveals the various relationships among archives and files of the same provenance and supplies provenance-related context for understanding and using them.

The principle of organic association is likewise a basic principle of archives management. It means that the files in a system, and the elements that make up the system, must remain related to one another in time and space. Because electronic files are stored in scattered form in computers as binary code, maintaining the organic association between files is particularly important, and maintaining it depends on metadata.

As for front-end control, in an organization that has already established an electronic file management system, electronic files are created and processed within that system. The front end of metadata acquisition moves forward to the system design stage, and part of front-end control moves into the design of system functions. That is, the metadata requirements of each stage of the file life cycle are designed into the system as far as possible, and an OA system with sound functionality is treated as a precondition for managing electronic files well.

Understand the attributes of the raw data and the exact meaning of the corresponding indicators. This is the basis for collecting raw data. Some data indicators have been adjusted repeatedly over time, so it is necessary first to understand the attributes, structure, exact meaning, and scope of the raw data, and how indicators were adjusted between earlier and later periods, and then to determine the required data items and the data extraction principles.

Data sorting is the most time-consuming step in data preprocessing, and also the most critical. Raw data as acquired generally has various problems or defects and must be sorted before further processing. Data redundancy appears when, within one time period or one data series, there are data items with the same indicator meaning and the same data, or data items with different indicator names but the same meaning and the same data. Particular care is needed because redundant data may contain items whose indicators look similar but whose meanings differ. When marking redundant data, it is generally necessary to determine, within a set of reports or the original database, which data is base data and which data has been excerpted; the latter can be marked as redundant.
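
The redundancy rule above can be sketched as a grouping problem. This is an illustration under stated assumptions, not the source's algorithm: the Item fields (in particular meaning, a normalized label assumed to be assigned during review, and is_base) and the function mark_redundant are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str      # indicator name as it appears in the source
    meaning: str   # normalized meaning (hypothetical field, assigned during review)
    period: str    # time period the value belongs to
    value: float
    is_base: bool  # True for base data, False for data excerpted from elsewhere


def mark_redundant(items):
    """Split items into (kept, redundant).

    Items are grouped by (meaning, period, value), so items with different
    names but the same meaning and data fall into one group, while items
    that merely look similar but differ in meaning stay apart. Within a
    group, one base item is kept and the other copies are marked redundant.
    """
    groups = {}
    for it in items:
        groups.setdefault((it.meaning, it.period, it.value), []).append(it)

    kept, redundant = [], []
    for group in groups.values():
        # Prefer a base item as the one to keep; fall back to the first item.
        keeper = next((it for it in group if it.is_base), group[0])
        for it in group:
            (kept if it is keeper else redundant).append(it)
    return kept, redundant
```

Grouping on meaning rather than on name is the point of the caution in the text: two indicators with the same value and similar names are only duplicates if a reviewer has confirmed they mean the same thing.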

For data errors, the key is to find the erroneous data item within a balance relationship and correct it. In general, other reports in the same set, or other data from the same period, are used as a reference. First, the positions of the correct and erroneous data items are determined, for example whether the total or one of the sub-items is wrong. Then the value the erroneous item should have is derived by working backwards through the balance relationship, and the item is corrected. In practice this must be done with great caution: every step should leave a record so that the previous state can be restored at any time and a larger error is not introduced. This is why administrator intervention is required.
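
A minimal sketch of the back-calculation and step-by-step record keeping described above, assuming the simplest balance relationship (a total equal to the sum of its sub-items). The names check_balance, back_calculate, and Ledger are hypothetical. Deciding which item is wrong remains a human judgment, as the text requires; the code only computes the value that restores the balance once that decision has been made.

```python
def check_balance(total, parts, tol=1e-9):
    """True if the sub-items sum to the total within a small tolerance."""
    return abs(total - sum(parts.values())) <= tol


def back_calculate(total, parts, wrong_key):
    """Value the sub-item wrong_key must take for the parts to sum to total."""
    return total - sum(v for k, v in parts.items() if k != wrong_key)


class Ledger:
    """Holds a total and its sub-items, recording every change so it can be undone."""

    def __init__(self, total, parts):
        self.total = total
        self.parts = dict(parts)
        self.history = []  # one (key, old_value, new_value) entry per correction

    def correct(self, key, new_value):
        self.history.append((key, self.parts[key], new_value))
        self.parts[key] = new_value

    def undo(self):
        """Restore the state before the most recent correction."""
        key, old_value, _ = self.history.pop()
        self.parts[key] = old_value
```

For instance, with a total of 100 and sub-items 30, 50, and 25, the balance fails; if the administrator judges the third item to be wrong, back-calculation gives 20, and the correction can still be rolled back if a later check shows the total was the real error.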

The original software environment usually contains a large number of logical audit formulas. If that environment can still be rebuilt, missing electronic data should be supplemented and errors corrected within the original system, which reduces the workload and improves data accuracy. For paper data, spreadsheet software can be used to set up the corresponding logical audit relationships, so that the paper data can be entered into the computer for checking and correction, which greatly improves efficiency. If the data is wrong, the resulting archive data loses its value or even does harm. Any data that has been deleted, modified, or estimated must be annotated, and a data modification log must be kept for later reference.
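
Logical audit relationships and a modification log can be sketched as below. The rule set, the record fields (total, part_a, part_b), and the functions audit and modify are hypothetical examples; the source does not give concrete audit formulas.

```python
from datetime import datetime, timezone

# Hypothetical audit rules: rule name -> predicate that a valid record satisfies.
RULES = {
    "total_equals_parts": lambda r: r["total"] == r["part_a"] + r["part_b"],
    "non_negative": lambda r: all(r[k] >= 0 for k in ("total", "part_a", "part_b")),
}


def audit(record):
    """Return the names of all audit rules the record violates."""
    return [name for name, rule in RULES.items() if not rule(record)]


def modify(record, field, new_value, reason, log):
    """Change one field and append an annotated entry to the modification log."""
    log.append({
        "field": field,
        "old": record[field],
        "new": new_value,
        "reason": reason,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = new_value
```

Routing every change through modify means no value is altered without an annotation, which matches the requirement that deleted, modified, or estimated data be documented.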

It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.

Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (7)

1. The utility model provides a compatible good electronic file metadata acquisition instrument, includes original electronic file, singlechip, data processing center and metadatabase, its characterized in that: the output end of the single chip microcomputer is electrically connected with the input end of the data processing center, and the output end of the data processing center is electrically connected with the input end of the metadata database.
2. The tool of claim 1, wherein the tool comprises: the data processing system is characterized in that a code recognizer is arranged inside the data processing center, the data processing center is connected with the manual PC end, an automatic data collector is arranged inside the data processing center, and the automatic data collector reads data information in the electronic file according to corresponding file reading rules.
3. The tool of claim 1, wherein the tool comprises: the metadata database is internally provided with a data processor and is connected with a PC (personal computer) end of an administrator, and the output end of the metadata database is connected with a new database.
4. The tool of claim 1, wherein the tool comprises: the output end of the new database is connected with an OA server, the OA server is connected with an EDS encryption server in parallel, and the new database is in signal connection with the Internet through 4G signals.
5. A method for collecting metadata of an electronic file with good compatibility, characterized by comprising the following steps:
S1: determine the format and attributes of the original electronic file; using the data processing capability of the single-chip microcomputer, divide files of different formats into identifiable source files and unidentifiable source files, and have the single-chip microcomputer transmit both types of files to the data processing center;
S2: for the data of an identifiable source file, first acquire the layout structure of the target data, then locate and read the target data according to the confirmed file reading rule;
S3: divide the unidentifiable source files by compatibility into source files with higher compatibility and source files with poorer compatibility; for source files with higher compatibility, install the corresponding software or hardware at the manual PC end and then establish corresponding data extraction rules to extract the target data; for source files with poorer compatibility, trigger manual intervention, in which an engineer writes a corresponding conversion tool to read the source file;
S4: the data processing center stores the data read in S2 and S3 in a universal format and transmits it to the metadata database for storage;
S5: the data processor in the metadata database sorts the data, first dividing it into accurate data, redundant data and erroneous data; accurate data is entered directly into the new database; for redundant data, the data processor re-calibrates the data, the items calibrated as redundant are screened out, and the remaining data is entered into the new database; for erroneous data, an administrator intervenes to determine which items are correct and which are erroneous, corrects the erroneous items, and the corrected data is then entered into the new database;
S6: the EDS encryption server encrypts and protects the data in the new database; departments within the enterprise can download all data in the new database in encrypted form from the OA server, and clients can download all data in the new database in encrypted form over the Internet.
6. The method for collecting metadata of an electronic file according to claim 5, characterized in that: the data extraction rules in S2 and S3 are based on the basic principles of archive management and on the management rules for electronic files, and specifically include the principle of provenance, the principle of organic association and the principle of front-end control.
7. The method for collecting metadata of an electronic file according to claim 5, characterized in that: the redundant data in S5 may contain data items whose indexes are similar but whose meanings differ; when calibrating the redundant data, it is necessary to determine which data in the original electronic file are basic data and which are extracted data, and the latter can be calibrated as redundant data.
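
The routing in steps S1 to S3 and the sorting in step S5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the format tables, the `Item` fields and the function names are hypothetical, and the two-stage handling of redundant data in S5 is collapsed into a single rule taken from claim 7 (extracted data is treated as redundant).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Route(Enum):
    AUTO_READ = "S2: read with a confirmed file reading rule"
    EXTRACTION_RULE = "S3: install software or hardware, then apply an extraction rule"
    MANUAL_TOOL = "S3: an engineer writes a conversion tool"


# Hypothetical format tables; the claims do not name concrete formats.
IDENTIFIABLE = {"pdf", "docx", "xml"}
HIGHER_COMPATIBILITY = {"dwg", "wps"}


def route_file(extension: str) -> Route:
    """Choose the reading path for a source file (steps S1 to S3)."""
    ext = extension.lower().lstrip(".")
    if ext in IDENTIFIABLE:
        return Route.AUTO_READ
    if ext in HIGHER_COMPATIBILITY:
        return Route.EXTRACTION_RULE
    return Route.MANUAL_TOOL


@dataclass
class Item:
    name: str
    value: Optional[str]
    derived: bool = False  # True for "extracted" data computed from basic data
    valid: bool = True     # False when an (assumed) validation check failed


@dataclass
class Triage:
    accurate: List[Item] = field(default_factory=list)
    redundant: List[Item] = field(default_factory=list)
    erroneous: List[Item] = field(default_factory=list)


def triage(items: List[Item]) -> Triage:
    """Sort items into the three groups named in step S5."""
    result = Triage()
    for item in items:
        if not item.valid or item.value is None:
            result.erroneous.append(item)   # waits for administrator review
        elif item.derived:
            result.redundant.append(item)   # screened out, per claim 7
        else:
            result.accurate.append(item)    # entered into the new database
    return result
```

For example, `route_file(".PDF")` returns `Route.AUTO_READ` under the assumed tables, while an unknown extension falls through to `Route.MANUAL_TOOL`. In a fuller version the erroneous group would be re-submitted to `triage` after the administrator's corrections.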
CN202210021367.6A 2022-01-10 2022-01-10 Electronic file metadata acquisition tool and method with good compatibility Pending CN114372104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210021367.6A CN114372104A (en) 2022-01-10 2022-01-10 Electronic file metadata acquisition tool and method with good compatibility

Publications (1)

Publication Number Publication Date
CN114372104A true CN114372104A (en) 2022-04-19

Family

ID=81187750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210021367.6A Pending CN114372104A (en) 2022-01-10 2022-01-10 Electronic file metadata acquisition tool and method with good compatibility

Country Status (1)

Country Link
CN (1) CN114372104A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050256908A1 (en) * 2004-05-14 2005-11-17 Wanli Yang Transportable database
US20080235217A1 (en) * 2007-03-16 2008-09-25 Sharma Yugal K System and method for creating, verifying and integrating metadata for audio/video files
US20090089315A1 (en) * 2007-09-28 2009-04-02 Tractmanager, Inc. System and method for associating metadata with electronic documents
KR20100058445A (en) * 2010-05-24 2010-06-03 (주)위세아이텍 Automatic extracting method of heterogeneous metadata by using rule-based technology and system thereof
US20120310875A1 (en) * 2011-06-03 2012-12-06 Prashanth Prahlad Method and system of generating a data lineage repository with lineage visibility, snapshot comparison and version control in a cloud-computing platform
CN103399961A (en) * 2013-08-23 2013-11-20 北京中科嘉和科技发展有限公司 Electronic literature management system capable of supporting multiple formats
CN105719454A (en) * 2016-01-28 2016-06-29 无锡南理工科技发展有限公司 Extensible ZigBee data transmission device and method
CN108549659A (en) * 2018-03-12 2018-09-18 中城泰信(苏州)科技发展股份有限公司 A kind of data warehouse management system and management method
CN110275861A (en) * 2019-06-25 2019-09-24 北京明略软件系统有限公司 Date storage method and device, storage medium, electronic device
CN110347650A (en) * 2019-07-16 2019-10-18 北京明略软件系统有限公司 A kind of metadata acquisition method and device
CN110737629A (en) * 2019-08-30 2020-01-31 华迪计算机集团有限公司 method and system for archiving electronic files
CN111723218A (en) * 2020-06-22 2020-09-29 程浩 Data processing method and server for courseware with multi-source content
CN113704337A (en) * 2021-08-26 2021-11-26 上海德拓信息技术股份有限公司 Metadata acquisition method and system based on dynamic loading of driver

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
奇点云: "浅谈元数据采集 | StartDT Tech Lab 10", Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/408241690> *
用友: "终于有人把元数据讲明白了", pages 1 - 4, Retrieved from the Internet <URL:https://www.51cto.com/article/706360.html> *
用户1278550: "数据资产治理-元数据采集那点事", Retrieved from the Internet <URL:https://cloud.tencent.com/developer/article/1767731> *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117407428A (en) * 2023-11-21 2024-01-16 杭州沃趣科技股份有限公司 Data processing system for acquiring target configuration file of target database
CN117407428B (en) * 2023-11-21 2024-04-19 杭州沃趣科技股份有限公司 Data processing system for acquiring target configuration file of target database

Similar Documents

Publication Publication Date Title
CN109522746B (en) A data processing method, electronic device and computer storage medium
CN104077665B (en) Electricity power engineering analysis of prices data gathering system and method
US8156092B2 (en) Document de-duplication and modification detection
US8832148B2 (en) Enterprise evidence repository
US8566903B2 (en) Enterprise evidence repository providing access control to collected artifacts
CN110147357A (en) The multi-source data polymerization methods of sampling and system under a kind of environment based on big data
CN103761173A (en) Log based computer system fault diagnosis method and device
CN108846102B (en) Laboratory data management system and computer program for quality inspection center of medium-stored grains
CN105405069B (en) Electricity purchase operation decision analysis and data processing method
CN113806170B (en) Method, system, medium and terminal for automatically generating supervision log of engineering industry
CN111353005A (en) Drug research and development reporting document management method and system
CN114780370A (en) Data correction method and device based on log, electronic equipment and storage medium
CN117909392B (en) Intelligent data asset inventory method and system
CN114880405A (en) Data lake-based data processing method and system
CN114416638A (en) Automatic electronic file filing method and system
CN112817958A (en) Electric power planning data acquisition method and device and intelligent terminal
CN114372104A (en) Electronic file metadata acquisition tool and method with good compatibility
CN114490882B (en) Heterogeneous database data synchronization analysis method
CN116362443A (en) A data governance method and device for an enterprise information platform
CN118071304B (en) Engineering project data management method, device, equipment and readable storage medium
US20050204191A1 (en) Systems and methods automatically classifying electronic data
CN114860932A (en) Log information acquisition and monitoring method
CN108898355A (en) Word reports visual configuration method in a kind of ocean Lims system
CN104216986A (en) Device and method for improving data query efficiency through pre-operation according to data update period
CN106407396A (en) A standardization management system and method for forest ecology observation station big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220419