CN114372104A - Electronic file metadata acquisition tool and method with good compatibility - Google Patents
- Publication number
- CN114372104A
- Authority
- CN
- China
- Prior art keywords
- data
- metadata
- electronic file
- files
- processing center
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present invention relates to the technical field of data collection, and in particular to an electronic file metadata acquisition tool and method with good compatibility.
Background
Metadata is data that describes the context, content, and structure of a document and its entire management process. Unlike archival description in the traditional sense, metadata is richer in meaning, more comprehensive in function, and subject to stricter requirements. It cannot be recorded by archivists after documents have been filed, and still less can it be entered by hand by the records staff or business staff of the originating organization. Metadata must be planned across the whole process, embedded in the system, and collected automatically in real time, so that it faithfully and dynamically reproduces the background and process information of electronic file management. Automatic metadata collection is therefore both a requirement of metadata management itself and a need arising from the originating organization's actual business.
On a data sharing platform, the accuracy of metadata directly affects how quickly users can locate the data they need, and letting users locate that data quickly is the platform's most basic service requirement, so the accuracy requirements for metadata are high. Traditional metadata extraction methods are mostly bottom-up: they generally start from the web pages where the data resides (for example, the large body of scientific and technical literature published online) and form the final metadata through syntactic analysis, extraction, aggregation, statistics, mining, or machine learning applied to the data. Such methods cannot guarantee the accuracy of the extracted metadata, and the resulting metadata format is not compatible with data readers.
Summary of the Invention
In view of the deficiencies of the prior art, the present invention provides an electronic file metadata acquisition tool and method whose advantage is good compatibility.
Technical Solution
To achieve the above purpose, the present invention provides the following technical solution: an electronic file metadata acquisition tool with good compatibility, comprising original electronic files, a microcontroller, a data processing center, and a metadata database, wherein the output of the microcontroller is electrically connected to the input of the data processing center, and the output of the data processing center is electrically connected to the input of the metadata database.
Preferably, an encoding recognizer is provided inside the data processing center, the data processing center is connected to an operator PC terminal, and an automatic data collector is provided inside the data processing center; the automatic data collector reads the data in the electronic files according to the corresponding file reading rules.
Preferably, a data processor is provided inside the metadata database, the metadata database is connected to an administrator's PC terminal, and the output of the metadata database is connected to a new database.
Preferably, the output of the new database is connected to an OA server, an EDS encryption server is connected in parallel with the OA server, and the new database is connected to the Internet over a 4G signal.
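The patent does not specify what a "file reading rule" looks like. As a minimal sketch, assuming plain-text files whose fields appear as labelled lines, the automatic data collector described above could apply one pattern per metadata field. The field names, the `READING_RULES` table, and the `read_metadata` helper are all hypothetical and not taken from the patent.

```python
import re

# Hypothetical reading rules: one regular expression per metadata field.
# The patent only says that rules exist; this layout is an assumption.
READING_RULES = {
    "title": re.compile(r"^Title:\s*(.+)$", re.MULTILINE),
    "author": re.compile(r"^Author:\s*(.+)$", re.MULTILINE),
    "created": re.compile(r"^Created:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
}

def read_metadata(text, rules=READING_RULES):
    """Apply each rule to the file text; fields with no match are left out."""
    found = {}
    for field_name, pattern in rules.items():
        match = pattern.search(text)
        if match:
            found[field_name] = match.group(1).strip()
    return found

sample = "Title: Annual report\nAuthor: Records Office\nCreated: 2022-01-10\n"
metadata = read_metadata(sample)
```

Keeping the rules in a table rather than in code means a new file layout only needs a new rule set, which is one plausible reading of "corresponding file reading rules".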
Another technical problem to be solved by the present invention is to provide an electronic file metadata acquisition method with good compatibility, comprising the following steps:
S1: Determine the format and attributes of the original electronic files, and use the data processing capability of the microcontroller to divide files of different formats into two classes, recognizable source files and unrecognizable source files; the microcontroller transmits both classes to the data processing center.
S2: For recognizable source files, first obtain the layout structure of the target data, then locate and read the target data according to generally accepted, established file reading rules.
S3: Unrecognizable source files are divided by compatibility into source files with higher compatibility and source files with poorer compatibility. For source files with higher compatibility, the corresponding software or hardware is installed on the operator PC terminal, and data extraction rules are then established to extract the target data. For source files with poorer compatibility, manual intervention is triggered and an engineer writes a conversion tool to read them.
S4: The data processing center saves the data read in S2 and S3 in a common format and then transmits it to the metadata database for storage.
S5: The data processor in the metadata database cleans the data, first sorting it into three categories: accurate data, redundant data, and erroneous data. Accurate data is entered directly into the new database. For redundant data, the data processor re-tags it, filters out the tagged items, and enters the remaining data into the new database. For erroneous data, an administrator must intervene to determine which items are correct and which are wrong; the administrator then corrects the erroneous data, and it is entered into the new database.
S6: The EDS encryption server encrypts and protects the data in the new database. Departments within the enterprise can download all data in the new database in encrypted form from the OA server, and all data in the new database can also be downloaded in encrypted form by clients over the Internet.
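Steps S1, S3, and S4 above amount to a routing decision followed by serialization. The sketch below is one way to picture that flow, under stated assumptions: the patent names no concrete file formats, so the extension tables are invented for illustration, and JSON stands in for the unspecified "common format". `route_file`, `to_common_format`, and the route labels are hypothetical names.

```python
import json
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical format tables; the patent does not list concrete formats.
RECOGNIZED = {".txt", ".csv", ".json"}
NEEDS_SOFTWARE = {".dwg", ".psd"}  # assumed readable once extra software is installed

@dataclass
class Routed:
    name: str
    route: str  # "rule_read", "install_then_extract", or "manual_converter"

def route_file(name):
    """S1/S3: classify a file by extension and pick a reading path."""
    ext = PurePath(name).suffix.lower()
    if ext in RECOGNIZED:
        return Routed(name, "rule_read")             # S2 path
    if ext in NEEDS_SOFTWARE:
        return Routed(name, "install_then_extract")  # S3, higher compatibility
    return Routed(name, "manual_converter")          # S3, engineer writes a converter

def to_common_format(name, fields):
    """S4: serialize extracted fields in a common format (JSON is an assumption)."""
    return json.dumps({"source": name, "metadata": fields},
                      ensure_ascii=False, sort_keys=True)
```

A real classifier would likely inspect file contents rather than trust the extension; the extension check here only keeps the sketch short.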
Beneficial Effects
Compared with the prior art, the present invention provides an electronic file metadata acquisition tool and method with good compatibility, which has the following beneficial effects:
1. The tool and method classify and process all electronic files. For source files with higher compatibility, the corresponding software or hardware is installed on the operator PC terminal and data extraction rules are then established to extract the target data. For source files with poorer compatibility, manual intervention is triggered and an engineer writes a conversion tool to read them. The data is saved in a common format and then transmitted to the metadata database for storage, which improves the compatibility of all metadata and makes it easy to read.
2. The tool and method use the data processor to clean the data, sorting it into accurate data, redundant data, and erroneous data. Accurate data is entered directly into the new database. For redundant data, the data processor re-tags it, filters out the tagged items, and enters the remaining data into the new database. For erroneous data, an administrator intervenes to determine the correct and erroneous items, corrects the erroneous data, and enters it into the new database. This effectively improves the accuracy of all metadata.
Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the metadata acquisition method of the present invention;
Fig. 2 is a flowchart of establishing the extraction rules of the present invention;
Fig. 3 is a flowchart of data cleaning within the metadata database of the present invention;
Fig. 4 is a flowchart of the data encryption analysis of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In the present invention, unless otherwise expressly specified and limited, the terms "installed", "linked", "connected", "fixed", and the like are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be an internal communication between two elements or an interaction between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
Referring to Figs. 1-4, the present invention provides an electronic file metadata acquisition tool with good compatibility, comprising original electronic files, a microcontroller, a data processing center, and a metadata database. The output of the microcontroller is electrically connected to the input of the data processing center, and the output of the data processing center is electrically connected to the input of the metadata database. An encoding recognizer is provided inside the data processing center, the data processing center is connected to an operator PC terminal, and an automatic data collector is provided inside the data processing center; the automatic data collector reads the data in the electronic files according to the corresponding file reading rules. A data processor is provided inside the metadata database, the metadata database is connected to an administrator's PC terminal, and the output of the metadata database is connected to a new database. The output of the new database is connected to an OA server, an EDS encryption server is connected in parallel with the OA server, and the new database is connected to the Internet over a 4G signal.
In an environment where encryption has been installed, any laptop that must be taken off-site can be given an offline authorization that controls how long the laptop remains usable after leaving the environment. Once that period is exceeded, files open as garbled text and cannot be used. With encryption installed, leakage of enterprise information can be comprehensively prevented: leakage of the sensitive data the enterprise produces is controlled and core data is protected by encryption. In addition, according to the enterprise's actual needs, files can be backed up after they are encrypted, and the backup copies can be stored as plaintext or ciphertext as the enterprise requires.
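The offline authorization described above is essentially a time-window check. The patent does not say how the window is stored or enforced, so the following is only an illustrative sketch: `offline_access_allowed` and its parameters are hypothetical, and a real product would also need tamper-resistant time keeping, which is out of scope here.

```python
from datetime import datetime, timedelta

def offline_access_allowed(granted_at, allowed, now):
    """Return True while the laptop is still inside its offline window."""
    return granted_at <= now <= granted_at + allowed

# Example values (assumptions): a 48-hour grant issued on the filing date.
granted = datetime(2022, 1, 10, 9, 0)
window = timedelta(hours=48)
```

When the check fails, the behavior described in the text is that files no longer decrypt and open as garbled text.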
An electronic file metadata acquisition method with good compatibility comprises the following steps:
S1: Determine the format and attributes of the original electronic files, and use the data processing capability of the microcontroller to divide files of different formats into two classes, recognizable source files and unrecognizable source files; the microcontroller transmits both classes to the data processing center.
S2: For recognizable source files, first obtain the layout structure of the target data, then locate and read the target data according to generally accepted, established file reading rules.
S3: Unrecognizable source files are divided by compatibility into source files with higher compatibility and source files with poorer compatibility. For source files with higher compatibility, the corresponding software or hardware is installed on the operator PC terminal, and data extraction rules are then established to extract the target data. For source files with poorer compatibility, manual intervention is triggered and an engineer writes a conversion tool to read them.
S4: The data processing center saves the data read in S2 and S3 in a common format and then transmits it to the metadata database for storage.
S5: The data processor in the metadata database cleans the data, first sorting it into three categories: accurate data, redundant data, and erroneous data. Accurate data is entered directly into the new database. For redundant data, the data processor re-tags it, filters out the tagged items, and enters the remaining data into the new database. For erroneous data, an administrator must intervene to determine which items are correct and which are wrong; the administrator then corrects the erroneous data, and it is entered into the new database.
S6: The EDS encryption server encrypts and protects the data in the new database. Departments within the enterprise can download all data in the new database in encrypted form from the OA server, and all data in the new database can also be downloaded in encrypted form by clients over the Internet.
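Step S5 above can be pictured as a three-way dispatch. In this sketch the category of each record is assumed to be known already (how the data processor decides it is not specified in the patent), and `Record`, `SortResult`, and `sort_records` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    key: str
    value: str
    status: str           # "accurate", "redundant", or "erroneous" (assumed labels)
    tagged: bool = False  # True for redundant records marked for removal

@dataclass
class SortResult:
    new_database: list = field(default_factory=list)
    needs_admin: list = field(default_factory=list)

def sort_records(records):
    """Route each record as S5 describes."""
    result = SortResult()
    for rec in records:
        if rec.status == "accurate":
            result.new_database.append(rec)       # entered directly
        elif rec.status == "redundant":
            if not rec.tagged:                    # tagged items are filtered out
                result.new_database.append(rec)
        elif rec.status == "erroneous":
            result.needs_admin.append(rec)        # administrator must correct first
        else:
            raise ValueError(f"unknown status: {rec.status!r}")
    return result
```

Erroneous records are held in a separate queue rather than written anywhere, which mirrors the requirement that they reach the new database only after an administrator has corrected them.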
The data extraction rules in S2 and S3 are based on the basic principles of archives management and on the rules of electronic records management, specifically the principle of provenance, the principle of organic association, and the principle of front-end control. The redundant data in S5 may include data whose indicators look similar but mean different things. When tagging redundant data, it must be determined from the original electronic files which data are base data and which are excerpted; the latter can be tagged as redundant.
Although metadata acquisition reveals the content of documents and archives, and the patterns by which they are produced, by collecting, refining, analyzing, and organizing electronic file information, it still proceeds on the premise of respecting the essential attributes and regularities of archives. Acquisition emphasizes the provenance of electronic files: electronic files from the same source within an organization are reflected together through metadata acquisition, and metadata is tied to the provenance of the archives. Metadata thereby reveals the various links among archives and documents of the same provenance and supplies provenance-related background information for understanding and using them.
The principle of organic association is likewise a basic principle of archives management: the documents in a system, and the elements that make up the system, must remain linked to one another in time and space. Because electronic files are stored in a computer in scattered form as binary code, maintaining the organic links between files is especially important, and maintaining those links depends on metadata.
As for front-end control, in an organization that has already established an electronic records management system, electronic files are created and circulated within the system. The front end of metadata acquisition is moved forward to the system design stage, and front-end control is partly carried over into the design of system functions. That is, the metadata requirements of every stage of the file life cycle are designed into the system as far as possible, with a functionally sound OA system as the precondition for managing electronic files well.
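To illustrate the provenance idea in the passage above, the sketch below groups file metadata by originating unit so that files from the same source are presented together. The record layout (`id`, `source`, `related_to`) and the department names are assumptions made for this example; the patent does not define a metadata schema.

```python
from collections import defaultdict

def group_by_source(files):
    """Map each originating unit to the identifiers of the files it produced."""
    groups = defaultdict(list)
    for meta in files:
        groups[meta["source"]].append(meta["id"])
    return dict(groups)

# Hypothetical metadata records; "related_to" stands in for organic links.
files = [
    {"id": "F-001", "source": "Finance Dept", "related_to": []},
    {"id": "F-002", "source": "Finance Dept", "related_to": ["F-001"]},
    {"id": "H-001", "source": "HR Dept", "related_to": []},
]
by_source = group_by_source(files)
```

The `related_to` field is one simple way to record the organic links between files; because the files themselves are scattered binary objects, such links can only live in the metadata.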
Understand the attributes of the raw data and the exact meaning of the corresponding indicators. This is the basis for collecting raw data. Some data indicators have been adjusted repeatedly over time, so it is necessary first to understand the attributes, structure, exact meaning, and scope of the raw data, as well as how the indicators were adjusted between earlier and later periods, and then to determine the required data items and the data extraction principles.
Data cleaning is the most time-consuming step in data preprocessing and also the most critical one. The raw data obtained generally has various problems or defects and must be cleaned before the next processing step. Data redundancy appears when, within one time period or one data series, there are data items with the same indicator meaning and the same data, or data items whose indicator names differ but whose meaning and data are the same. Note in particular, however, that apparently redundant data may include items whose indicators look similar but mean different things. When tagging redundant data, it is generally necessary to determine, within a set of reports or the original database, which data are base data and which are excerpted; the latter can be tagged as redundant.
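The redundancy test described above can be sketched as follows, assuming each item already carries an `origin` label ("base" or "excerpt") and that a lookup table maps indicator names to a canonical meaning. Both the `CANONICAL` table and the item layout are hypothetical; in practice that table is exactly the domain knowledge the passage says must be established first.

```python
# Hypothetical alias table: indicator name -> canonical meaning.
CANONICAL = {
    "headcount": "headcount",
    "total staff": "headcount",   # different name, same meaning
    "staff cost": "staff_cost",   # similar name, different meaning
}

def tag_redundant(items):
    """Tag excerpted items that repeat a base item's meaning, period and value."""
    base_seen = set()
    for item in items:
        if item["origin"] == "base":
            base_seen.add((CANONICAL[item["name"]], item["period"], item["value"]))
    tagged = []
    for item in items:
        key = (CANONICAL[item["name"]], item["period"], item["value"])
        is_dup = item["origin"] == "excerpt" and key in base_seen
        tagged.append({**item, "redundant": is_dup})
    return tagged

items = [
    {"name": "headcount", "period": "2021", "value": 120, "origin": "base"},
    {"name": "total staff", "period": "2021", "value": 120, "origin": "excerpt"},
    {"name": "staff cost", "period": "2021", "value": 120, "origin": "excerpt"},
]
tagged = tag_redundant(items)
```

The third item has the same value as the base item but a different meaning, so it is not tagged; comparing values alone would have removed it wrongly, which is the pitfall the text warns about.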
Data errors. For errors of this kind, the key is to find the erroneous data item in a balance relationship and correct it. Generally, other reports in the same set, or other data from the same period, are used as a reference. First, the positions of the correct and erroneous data items are determined, for example whether the total or a component item is wrong. Then, by working backward, the value the erroneous item should have in the balance relationship is determined and the item is corrected. In practice this must be done with great caution: every step should leave a record so that the previous state can be restored at any time, to avoid introducing larger errors. This is why administrator intervention is required.
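As a small illustration of the back-calculation and the "keep a record at every step" advice above, the sketch below checks a sum-to-total balance, derives the value a suspect item must have, and stores each prior state so a correction can be rolled back. Which item is actually wrong still has to be decided by a person from reference reports; the code does not decide that. All names here are hypothetical.

```python
def check_balance(items, total):
    """Return True when the component items add up to the reported total."""
    return sum(items.values()) == total

def back_calculate(items, total, wrong_item):
    """Value the suspect item must have for the balance to hold."""
    others = sum(v for k, v in items.items() if k != wrong_item)
    return total - others

class CorrectionLog:
    """Keeps every prior state so any correction can be undone."""

    def __init__(self, items):
        self.items = dict(items)
        self.history = []  # list of (previous state, note)

    def correct(self, name, value, note):
        self.history.append((dict(self.items), note))
        self.items[name] = value

    def undo(self):
        self.items, _ = self.history.pop()
```

For example, with items `{"a": 40, "b": 35, "c": 30}` and a trusted total of 100, `back_calculate(..., "c")` gives 25; applying it through `CorrectionLog.correct` leaves the earlier state available to `undo`.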
The software environment contains a large number of logical audit formulas. If the original software environment can still be rebuilt, missing electronic data should be supplemented and errors corrected in the original system, which effectively reduces the workload and improves data accuracy. For paper-based data, spreadsheet software can be used to set up the corresponding logical audit relationships, and the paper data can be entered into the computer for auditing and correction, which greatly improves efficiency. If the data is wrong, the compiled archival data loses its value or even has negative effects. Deleted, modified, and estimated data must be annotated with remarks, and a data modification file must be kept for later reference.
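The "logical audit formulas" mentioned above are not spelled out in the patent. One plausible reading is a set of named predicates run against each row, combined with a change log that records a remark for every deleted, modified, or estimated value. The rule names, row layout, and helper functions below are assumptions for illustration only.

```python
# Hypothetical audit rules: each maps a name to a predicate over one row.
AUDIT_RULES = {
    "total equals sum of parts": lambda r: r["total"] == r["part_a"] + r["part_b"],
    "no negative values": lambda r: all(v >= 0 for v in r.values()),
}

def audit(row):
    """Return the names of the rules that the row fails."""
    return [name for name, rule in AUDIT_RULES.items() if not rule(row)]

def log_change(log, field_name, old, new, kind, remark):
    """Append a note for a deleted, modified or estimated value."""
    if kind not in {"deleted", "modified", "estimated"}:
        raise ValueError(f"unknown change kind: {kind!r}")
    log.append({"field": field_name, "old": old, "new": new,
                "kind": kind, "remark": remark})

row = {"total": 10, "part_a": 4, "part_b": 5}
failures = audit(row)
```

The same predicates could be expressed as spreadsheet formulas for paper-based data that has been keyed in, which is the approach the text suggests.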
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device comprising a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises the element.
Although embodiments of the present invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.
Claims (7)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210021367.6A CN114372104A (en) | 2022-01-10 | 2022-01-10 | Electronic file metadata acquisition tool and method with good compatibility |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210021367.6A CN114372104A (en) | 2022-01-10 | 2022-01-10 | Electronic file metadata acquisition tool and method with good compatibility |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114372104A true CN114372104A (en) | 2022-04-19 |
Family
ID=81187750
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210021367.6A Pending CN114372104A (en) | 2022-01-10 | 2022-01-10 | Electronic file metadata acquisition tool and method with good compatibility |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114372104A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117407428A (en) * | 2023-11-21 | 2024-01-16 | 杭州沃趣科技股份有限公司 | Data processing system for acquiring target configuration file of target database |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050256908A1 (en) * | 2004-05-14 | 2005-11-17 | Wanli Yang | Transportable database |
US20080235217A1 (en) * | 2007-03-16 | 2008-09-25 | Sharma Yugal K | System and method for creating, verifying and integrating metadata for audio/video files |
US20090089315A1 (en) * | 2007-09-28 | 2009-04-02 | Tractmanager, Inc. | System and method for associating metadata with electronic documents |
KR20100058445A (en) * | 2010-05-24 | 2010-06-03 | (주)위세아이텍 | Automatic extracting method of heterogeneous metadata by using rule-based technology and system thereof |
US20120310875A1 (en) * | 2011-06-03 | 2012-12-06 | Prashanth Prahlad | Method and system of generating a data lineage repository with lineage visibility, snapshot comparison and version control in a cloud-computing platform |
CN103399961A (en) * | 2013-08-23 | 2013-11-20 | 北京中科嘉和科技发展有限公司 | Electronic literature management system capable of supporting multiple formats |
CN105719454A (en) * | 2016-01-28 | 2016-06-29 | 无锡南理工科技发展有限公司 | Extensible ZigBee data transmission device and method |
CN108549659A (en) * | 2018-03-12 | 2018-09-18 | 中城泰信(苏州)科技发展股份有限公司 | A kind of data warehouse management system and management method |
CN110275861A (en) * | 2019-06-25 | 2019-09-24 | 北京明略软件系统有限公司 | Date storage method and device, storage medium, electronic device |
CN110347650A (en) * | 2019-07-16 | 2019-10-18 | 北京明略软件系统有限公司 | A kind of metadata acquisition method and device |
CN110737629A (en) * | 2019-08-30 | 2020-01-31 | 华迪计算机集团有限公司 | method and system for archiving electronic files |
CN111723218A (en) * | 2020-06-22 | 2020-09-29 | 程浩 | Data processing method and server for courseware with multi-source content |
CN113704337A (en) * | 2021-08-26 | 2021-11-26 | 上海德拓信息技术股份有限公司 | Metadata acquisition method and system based on dynamic loading of driver |
- 2022-01-10: CN application CN202210021367.6A, published as CN114372104A (status: Pending)
Non-Patent Citations (3)
Title |
---|
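奇点云 (StartDT): "浅谈元数据采集 \| StartDT Tech Lab 10" [A brief discussion of metadata collection], Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/408241690> * |
用友 (Yonyou): "终于有人把元数据讲明白了" [Finally, someone explains metadata clearly], pages 1-4, Retrieved from the Internet <URL:https://www.51cto.com/article/706360.html> * |
用户1278550 (User 1278550): "数据资产治理-元数据采集那点事" [Data asset governance: notes on metadata collection], Retrieved from the Internet <URL:https://cloud.tencent.com/developer/article/1767731> * |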
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117407428A (en) * | 2023-11-21 | 2024-01-16 | 杭州沃趣科技股份有限公司 | Data processing system for acquiring target configuration file of target database |
CN117407428B (en) * | 2023-11-21 | 2024-04-19 | 杭州沃趣科技股份有限公司 | Data processing system for acquiring target configuration file of target database |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109522746B (en) | | A data processing method, electronic device and computer storage medium |
CN104077665B (en) | | Electricity power engineering analysis of prices data gathering system and method |
US8156092B2 (en) | | Document de-duplication and modification detection |
US8832148B2 (en) | | Enterprise evidence repository |
US8566903B2 (en) | | Enterprise evidence repository providing access control to collected artifacts |
CN110147357A (en) | | The multi-source data polymerization methods of sampling and system under a kind of environment based on big data |
CN103761173A (en) | | Log based computer system fault diagnosis method and device |
CN108846102B (en) | | Laboratory data management system and computer program for quality inspection center of medium-stored grains |
CN105405069B (en) | | Electricity purchase operation decision analysis and data processing method |
CN113806170B (en) | | Method, system, medium and terminal for automatically generating supervision log of engineering industry |
CN111353005A (en) | | Drug research and development reporting document management method and system |
CN114780370A (en) | | Data correction method and device based on log, electronic equipment and storage medium |
CN117909392B (en) | | Intelligent data asset inventory method and system |
CN114880405A (en) | | Data lake-based data processing method and system |
CN114416638A (en) | | Automatic electronic file filing method and system |
CN112817958A (en) | | Electric power planning data acquisition method and device and intelligent terminal |
CN114372104A (en) | | Electronic file metadata acquisition tool and method with good compatibility |
CN114490882B (en) | | Heterogeneous database data synchronization analysis method |
CN116362443A (en) | | A data governance method and device for an enterprise information platform |
CN118071304B (en) | | Engineering project data management method, device, equipment and readable storage medium |
US20050204191A1 (en) | | Systems and methods automatically classifying electronic data |
CN114860932A (en) | | Log information acquisition and monitoring method |
CN108898355A (en) | | Word reports visual configuration method in a kind of ocean Lims system |
CN104216986A (en) | | Device and method for improving data query efficiency through pre-operation according to data update period |
CN106407396A (en) | | A standardization management system and method for forest ecology observation station big data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | RJ01 | Rejection of invention patent application after publication | Application publication date: 20220419 |