CN114036926A - Automatic precious metal material data file extraction system and method - Google Patents


Info

Publication number
CN114036926A
Authority
CN
China
Prior art keywords
metal material
noble metal
data
data file
analysis
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111247547.8A
Other languages
Chinese (zh)
Inventor
陈力
张爱敏
崔浩
陈家林
王建强
郭俊梅
王卓
王者
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Caizhi Technology Co ltd
Kunming Guiyan New Material Technology Co ltd
Original Assignee
Chengdu Caizhi Technology Co ltd
Kunming Guiyan New Material Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Caizhi Technology Co ltd and Kunming Guiyan New Material Technology Co ltd
Priority to CN202111247547.8A
Publication of CN114036926A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G06F40/186 Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a system and method for automatically extracting noble metal material data files. The system comprises an analysis plug-in module, an analysis template module and a file formatting module. The analysis plug-in module searches for an analysis plug-in capable of parsing an electronic document data file generated in experimental production in the noble metal industry, and parses the data file with that plug-in. The analysis template module searches for an analysis template capable of mapping the parsed data to a noble metal material database, and processes the noble metal material data file through that template. The file formatting module checks the format of the parsed noble metal material data file and extracts structured noble metal material data from data files whose format meets the standard. The invention can fully automatically extract and archive documents in known formats, improves the efficiency and accuracy of digitizing noble metal material data, and reduces the work difficulty and cost of data entry personnel.

Description

Automatic precious metal material data file extraction system and method
Technical Field
The invention belongs to the technical field of organizing noble metal material data, and particularly relates to a system and method for automatically extracting noble metal material data files.
Background
In the past, data generated in experimental production in the precious metal industry was usually recorded in electronic documents (e.g., Word, Excel, TXT, PDF). Each laboratory and research group has its own way of recording data and its own data format, and consults the documents whenever needed. However, with the continuous progress of computer technology, more and more industries choose to extract the information in electronic documents and to store and analyze it.
Because the noble metal industry involves a great number of material designs and a large number of material performance indices, data formats are complex and differ from department to department. A large precious metal enterprise often has many laboratories, many research groups and many material data formats at the same time, and entering dozens or even hundreds of kinds of data into a database with a single structure during data file extraction becomes a major problem.
Although some systems already apply automatic extraction to precious metal material data files, they tend to focus on files of a single format and lack generality: an extraction technique developed by one laboratory may perform well on that laboratory's data but produce many errors on another laboratory's data.
Meanwhile, existing automatic extraction techniques for precious metal material data files that do generalize well often require manual intervention throughout the process, so extraction is not truly automatic, and the extraction personnel must have some professional knowledge of precious metal materials. This takes up a great deal of the time of specialists in the precious metal field, and because of the manual intervention, extraction efficiency is low and the error rate is noticeably higher than that of machine extraction. Such techniques also cannot cope with data types that may be added in the future: the automatic extraction process must be redesigned, which wastes a large amount of time on developing extraction technology, leaves the researchers who produce precious metal material data files feeling constrained, and limits their freedom in designing data formats.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a system and method for automatically extracting noble metal material data files, which solve the above technical problems of the prior art.
The purpose of the invention can be achieved by the following technical solution:
An automatic extraction system for noble metal material data files comprises an analysis plug-in module, an analysis template module and a file formatting module.
The analysis plug-in module: searches for an analysis plug-in capable of parsing an electronic document data file generated in experimental production in the noble metal industry, and parses the data file with that plug-in.
The analysis template module: searches for an analysis template capable of mapping the parsed data to a noble metal material database, and processes the noble metal material data file through that template.
The file formatting module: checks the format of the parsed noble metal material data file, and extracts structured noble metal material data from data files whose format meets the standard.
Further, each analysis plug-in is responsible for parsing one class of noble metal material data files, and a plug-in can apply processing logic specific to noble metal material data.
Further, the processing logic of an analysis plug-in is: read the specific noble metal material data format, then regroup the noble metal performance data and perform simple calculations and statistics on it.
Further, when the analysis plug-in module has no analysis plug-in capable of parsing the data file, an analysis plug-in adapted to the file data is imported manually.
Further, each analysis template is responsible for a one-to-one mapping between the parsing result and the data of the precious metal material database; when the parsing result has multiple mappings, multiple precious metal material data file analysis templates are created.
Further, the file formatting module extracts different data files in different ways to obtain the data information in the data text.
Further, in the file formatting module, when a data file cannot be matched with the corresponding template, the scanning position and the scanned item are fed back directly, and, based on manual judgment, either the precious metal material data file or its analysis plug-in is modified.
The method using the automatic extraction system for noble metal material data files comprises the following steps:
S1, collecting electronic document data generated in experimental production in the precious metal industry;
S2, reading the collected noble metal material data file and searching for an analysis plug-in capable of parsing it; if no such plug-in exists, manually importing a new analysis plug-in for the noble metal material data file; if one exists, parsing the file with the plug-in to obtain semi-structured data;
S3, searching for a precious metal material data file analysis template capable of mapping the parsed semi-structured data to a precious metal material database, and using the existing analysis template to map the data parsed by the analysis plug-in one-to-one to the database;
S4, scanning the format of the parsed noble metal material data file to obtain the noble metal material structured data, and storing it in a noble metal material database.
Further, in S3, when the parsed semi-structured data cannot be matched with a corresponding analysis template, the precious metal material data file must be reselected or the analysis plug-in for the precious metal material must be set up again.
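For illustration only, the control flow of steps S2 to S4 can be sketched in Python. The class name `ExtractionPipeline`, the status strings and the choice to key plug-ins and templates by a `file_class` label are assumptions made for this sketch; the patent does not prescribe data structures, and storage in the database is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class ExtractionPipeline:
    # Hypothetical containers: one plug-in and one template per class of file.
    plugins: Dict[str, Callable[[str], dict]] = field(default_factory=dict)
    templates: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def run(self, file_class: str, raw_text: str) -> dict:
        # S2: look for a plug-in that can parse this class of file.
        plugin = self.plugins.get(file_class)
        if plugin is None:
            # No plug-in exists: a new one has to be imported manually.
            return {"status": "need_plugin"}
        semi_structured = plugin(raw_text)

        # S3: look for a template that maps the parsed keys to database columns.
        template = self.templates.get(file_class)
        if template is None or set(template) != set(semi_structured):
            # No match: reselect the file or set up the plug-in again.
            return {"status": "need_template_or_new_plugin"}

        # S4 (simplified): produce the structured record to be stored.
        row = {column: semi_structured[key] for key, column in template.items()}
        return {"status": "ok", "row": row}
```

A caller would register a plug-in and a template once per file class and then feed collected files (S1) through `run`; the two non-`ok` statuses correspond to the manual fallbacks described above.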
The invention has the beneficial effects that:
1. Through a complete and rigorous set of logic for automatically extracting precious metal material data files, the system provided by the invention can process the informatization and digitization of all data in the field of precious metal materials with a high degree of automation and high efficiency while requiring little manual work. Besides fully automatically extracting and archiving documents in known formats, it can extend its parsing capability and continuously improve itself.
2. The method provided by the invention improves the efficiency and accuracy of digitizing noble metal material data, reduces the work difficulty and cost of data entry personnel, can accommodate most noble metal material data files produced in the course of work at a noble metal materials organization, and has good extensibility and error-correction capability.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is an overall system block diagram of an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides an automated precious metal material data file extraction system, which includes an analysis plug-in module, an analysis template module, and a file formatting module.
The analysis plug-in module: searches for an analysis plug-in capable of parsing an electronic document data file generated in experimental production in the noble metal industry, and parses the data file with that plug-in. Each analysis plug-in is responsible for parsing one class of noble metal material data files. When no analysis plug-in capable of parsing the data file exists, an analysis plug-in adapted to the file data is imported manually (plug-in development is unconstrained: read the file, read its content, locate marker positions, then cut out, obtain and process the required data; the development cycle is short and flexible). An imported analysis plug-in can be stored on disk and imported into the extraction system when required.
The analysis template module: searches for an analysis template capable of mapping the parsed data to a noble metal material database, and processes the noble metal material data file through that template. An analysis plug-in can apply processing logic specific to noble metal material data, namely: read the specific noble metal material data format, then regroup the noble metal performance data and perform simple calculations and statistics on it.
Meanwhile, each template is responsible for a one-to-one mapping between the parsing result and the data of the noble metal material database, and gives the parsed data complete structural attributes. When the parsing result has multiple mappings, multiple precious metal material data file analysis templates are created.
The file formatting module: checks the format of the parsed noble metal material data file and extracts different data files in different ways to obtain the data information in the data text. When a data file cannot be matched with the corresponding template, the scanning position and the scanned item are fed back directly, and, based on manual judgment, either the noble metal material data file or its analysis plug-in is modified. Structured noble metal material data is extracted from data files whose format meets the standard.
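One possible way to organize the plug-in lookup is a small registry in which each plug-in declares which files it can parse. The sketch below is an assumption, not the patent's implementation: the `AnalysisPlugin` interface, the `PluginRegistry` class, the `TensileTxtPlugin` example and the "TENSILE REPORT" file header are all hypothetical.

```python
from typing import List, Optional, Protocol


class AnalysisPlugin(Protocol):
    """Hypothetical interface: one plug-in per class of data file."""
    name: str
    description: str  # remarks that help an operator judge when the plug-in applies

    def can_parse(self, filename: str, text: str) -> bool: ...
    def parse(self, text: str) -> dict: ...


class TensileTxtPlugin:
    name = "tensile-txt"
    description = "Plain-text tensile reports whose first line is 'TENSILE REPORT'"

    def can_parse(self, filename: str, text: str) -> bool:
        return filename.lower().endswith(".txt") and text.startswith("TENSILE REPORT")

    def parse(self, text: str) -> dict:
        # Read 'key: value' lines after the header into semi-structured data.
        fields = {}
        for line in text.splitlines()[1:]:
            if ":" in line:
                key, value = line.split(":", 1)
                fields[key.strip()] = value.strip()
        return fields


class PluginRegistry:
    def __init__(self) -> None:
        self._plugins: List[AnalysisPlugin] = []

    def register(self, plugin: AnalysisPlugin) -> None:
        # Corresponds to manually importing a newly developed plug-in.
        self._plugins.append(plugin)

    def find(self, filename: str, text: str) -> Optional[AnalysisPlugin]:
        # Return the first plug-in that claims the file, or None if there is none.
        return next((p for p in self._plugins if p.can_parse(filename, text)), None)
```

When `find` returns `None`, the system falls back to the manual path described above: a new plug-in is written and registered.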
As shown in FIG. 2, the method using the automatic extraction system for precious metal material data files includes the following steps:
S1, collecting electronic document data (such as Word, Excel, TXT and PDF) generated in experimental production in the precious metal industry.
S2, reading the collected noble metal material data file and searching for an analysis plug-in capable of parsing it (each plug-in should carry reasonable remarks and explanations so that an operator can easily judge which use cases it fits). If no such plug-in exists, a new analysis plug-in for the noble metal material data file is imported manually (it is self-developed; plug-in development is unconstrained: read the file, read its content, locate marker positions, then cut out, obtain and process the required data; the development cycle is short and flexible). The newly developed analysis plug-in can be stored on disk and imported into the extraction system when needed.
If an adapted analysis plug-in exists, execution continues to S3. Each analysis plug-in is responsible for parsing one class of noble metal material data files. The plug-in applies special processing logic to the noble metal material data (for example, reading a specific noble metal material data format, then regrouping the noble metal performance data and performing simple calculations and statistics on it) to ensure that the extracted data is accurate and meets the noble metal material data standard. The data parsed by the analysis plug-in is semi-structured data.
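The plug-in processing described in S2 (locate marker positions, cut out the required data, regroup it and compute simple statistics) might look like the following sketch. The file layout, the `[READINGS]` and `[END]` markers and the output field names are invented for this example and are not taken from the patent.

```python
from statistics import mean


def parse_hardness_report(text: str) -> dict:
    """Parse a hypothetical hardness log into semi-structured data."""
    start_marker, end_marker = "[READINGS]", "[END]"

    # Locate the marker positions; str.index raises ValueError if one is missing.
    start = text.index(start_marker) + len(start_marker)
    end = text.index(end_marker, start)

    # Cut out the block between the markers.
    block = text[start:end]

    # Regroup the 'sample, value' lines by sample.
    readings = {}
    for line in block.strip().splitlines():
        sample, value = line.split(",")
        readings.setdefault(sample.strip(), []).append(float(value))

    # Simple statistics per sample.
    return {
        sample: {"n": len(values), "mean_hv": mean(values), "max_hv": max(values)}
        for sample, values in readings.items()
    }
```

For a log containing two readings of 25 and 27 for one sample, the function reports `n` of 2 and a mean of 26.0 for that sample; text outside the markers is ignored.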
S3, searching for a precious metal material data file analysis template capable of mapping the parsed semi-structured data to a precious metal material database, and using the existing analysis template to map the data parsed by the analysis plug-in one-to-one to the database. When the parsing result has multiple mappings, multiple precious metal material data file analysis templates are created. When the parsed semi-structured data cannot be matched with a corresponding analysis template, the precious metal material data file is reselected or the analysis plug-in for the precious metal material is set up again.
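A minimal sketch of the template lookup in S3, assuming a template is simply a target table plus a one-to-one map from parsed keys to database columns. The template names, table names and column names below are hypothetical, and requiring the key sets to match exactly is one possible matching rule among several.

```python
from typing import List, Optional, Tuple

# Hypothetical templates: several exist because parsed results can map in several ways.
TEMPLATES = [
    {"name": "hardness-v1", "table": "hardness_test",
     "fields": {"sample": "sample_id", "mean_hv": "hardness_hv"}},
    {"name": "tensile-v1", "table": "tensile_test",
     "fields": {"sample": "sample_id", "uts_mpa": "tensile_strength_mpa"}},
]


def find_template(record: dict, templates: List[dict]) -> Optional[dict]:
    # A template matches when it covers exactly the keys of the parsed record.
    for template in templates:
        if set(template["fields"]) == set(record):
            return template
    return None


def apply_template(record: dict, template: dict) -> Tuple[str, dict]:
    # Rename each parsed key to its database column (one key, one column).
    row = {column: record[key] for key, column in template["fields"].items()}
    return template["table"], row
```

If `find_template` returns `None`, the record has no corresponding template, which is the case where the data file is reselected or the plug-in is set up again.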
S4, scanning the format of the parsed noble metal material data file, using several methods to ensure that the input file conforms to the technical specification for automatic extraction. For an Excel file, a pre-scanning technique scans the file layout and the characteristic cells row by row and cell by cell. For a Word file, a pre-scanning technique scans the file layout and the characteristic fields line by line; if the characteristic fields of the noble metal material data Word file use unique fonts or colors, pre-scanning can also locate them quickly. For a PDF file, a pre-parsing technique parses the picture objects, text objects, table objects and so on in the file; if the parsing result does not meet the template requirements, OCR technology from ABBYY is used to scan and regenerate the PDF, which is then parsed again. The resulting noble metal material structured data is stored in a noble metal material database, and can also be used directly by the database.
If no matching technical specification is found when the data file format is scanned, the places where the noble metal material data file does not correspond to the template are fed back. While the noble metal material data file format is scanned, the scanning position and the item currently being scanned are recorded, so that if a requirement is not satisfied during the process, the scanning position and the scanned item can be fed back directly. Based on this feedback, staff can decide whether to modify the noble metal material data file or the noble metal material data file analysis plug-in.
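The pre-scan with position feedback can be illustrated on a sheet that has already been loaded as a list of rows. This is only a sketch: reading real Excel, Word or PDF files is out of scope here, and the `ScanFeedback` record and the `expected` layout description are assumptions of this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ScanFeedback:
    row: int    # scanning position (zero-based row)
    col: int    # scanning position (zero-based column)
    item: str   # the characteristic cell the scan was looking for
    found: str  # what the file actually contained at that position


def prescan(grid: List[List[str]],
            expected: Dict[Tuple[int, int], str]) -> Optional[ScanFeedback]:
    """Check characteristic cells row by row; return feedback on the first mismatch."""
    for (row, col), label in sorted(expected.items()):
        in_range = row < len(grid) and col < len(grid[row])
        found = grid[row][col] if in_range else ""
        if found.strip() != label:
            # Report where the scan stopped and which item it was checking.
            return ScanFeedback(row, col, label, found)
    return None  # the layout conforms to the expected specification
```

A return value of `None` means the file can go on to extraction; otherwise the feedback tells staff exactly which cell to fix in the file, or which expectation to change in the plug-in.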
The key structural information of the precious metal material database (database, data table, field name, data type, primary key, etc.) can be obtained from the precious metal material database itself and used to determine a precious metal material data file analysis template. For this template, and for any analysis template that had to be determined anew in S3, mapping rules are set up on a one-to-one basis, a template file is generated, and the template is imported into the system for use.
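As a sketch of how the structural information could be read from a database and turned into a template, the example below uses SQLite from the Python standard library. The patent does not name a database product; the table, the column names and the template layout here are assumptions.

```python
import sqlite3
from typing import Dict, List


def describe_table(conn: sqlite3.Connection, table: str) -> List[dict]:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so it must come from a trusted source.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [
        {"field": name, "type": col_type, "primary_key": bool(pk)}
        for _cid, name, col_type, _notnull, _default, pk in rows
    ]


def build_template(conn: sqlite3.Connection, table: str,
                   mapping: Dict[str, str]) -> dict:
    """Validate a parsed-key -> column mapping against the schema and return a template."""
    columns = {column["field"] for column in describe_table(conn, table)}
    unknown = sorted(set(mapping.values()) - columns)
    if unknown:
        raise ValueError(f"columns not in {table}: {unknown}")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping is not one-to-one")
    return {"table": table, "fields": dict(mapping)}
```

The returned dictionary could then be serialized to a template file and imported into the system; the serialization format is left open here.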
Through a complete and rigorous set of logic for automatically extracting noble metal material data files, the invention can process the informatization and digitization of all data in the field of noble metal materials with a high degree of automation and high efficiency while requiring only a little manual work (simple configuration and development of analysis plug-ins). The invention not only fully automatically extracts and archives documents in known formats, but can also extend its parsing capability and continuously improve itself.
The method improves the efficiency and accuracy of digitizing noble metal material data, reduces the work difficulty and cost of data entry personnel, can accommodate most noble metal material data files produced in the course of work at a noble metal materials organization, and has good extensibility and error-correction capability.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed.

Claims (9)

1. An automatic extraction system of noble metal material data files is characterized by comprising an analysis plug-in module, an analysis template module and a file formatting module,
the analysis plug-in module: searching an analysis plug-in capable of analyzing an electronic document data file generated in the experiment production of the noble metal industry, and analyzing the data file through the analysis plug-in;
the analysis template module: searching an analysis template capable of mapping the analyzed data to a noble metal material database, and analyzing the noble metal material data file through the analysis template;
the file formatting module: and judging the format of the analyzed noble metal material data file, and extracting the noble metal material structured data of the data file format meeting the standard to obtain the noble metal material structured data.
2. The system of claim 1, wherein each of the parsing plug-ins is responsible for parsing a class of precious metal material data files, and the parsing plug-ins perform processing logic for the precious metal material data files.
3. The automated precious metal material data file extraction system of claim 2, wherein the processing logic of the parsing plug-in is: reading the specific noble metal material data format, then regrouping the noble metal performance data and performing simple calculations and statistics on it.
4. The system for automatically extracting the precious metal material data file according to claim 1, wherein, when the parsing plug-in module has no parsing plug-in capable of parsing the data file, a parsing plug-in adapted to the data file is manually imported.
5. The automated precious metal material data file extraction system according to claim 1, wherein each of the parsing templates is responsible for mapping the parsing result with data of the precious metal material database one by one, and when there are multiple mappings in the parsing result, multiple precious metal material data file parsing templates are created.
6. The automated precious metal material data file extraction system of claim 1, wherein the file formatting module extracts different data files in different extraction manners and obtains data information in the data text.
7. The automatic extraction system of noble metal material data files according to claim 6, wherein in the file formatting module, when the data files cannot be matched with the corresponding template, the scanning position and the scanning item are directly fed back, and the fed back noble metal material data files are modified through manual judgment or the noble metal material data file parsing plug-in is modified.
8. A method using the automated precious metal material data file extraction system of any one of claims 1 to 7, comprising the steps of:
S1, collecting electronic document data generated in experimental production in the precious metal industry;
S2, reading the collected noble metal material data file and searching for an analysis plug-in capable of parsing it; if no such plug-in exists, manually importing a new analysis plug-in for the noble metal material data file; if one exists, parsing the file with the plug-in to obtain semi-structured data;
S3, searching for a precious metal material data file analysis template capable of mapping the parsed semi-structured data to a precious metal material database, and using the existing analysis template to map the data parsed by the analysis plug-in one-to-one to the database;
S4, scanning the format of the parsed noble metal material data file to obtain the noble metal material structured data, and storing it in a noble metal material database.
9. The method of claim 8, wherein in step S3, when the parsed semi-structured data cannot be matched with a corresponding parsing template, the precious metal material data file needs to be reselected or the parsing plug-in of the precious metal material needs to be reset.
CN202111247547.8A 2021-10-26 2021-10-26 Automatic precious metal material data file extraction system and method Withdrawn CN114036926A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111247547.8A CN114036926A (en) 2021-10-26 2021-10-26 Automatic precious metal material data file extraction system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111247547.8A CN114036926A (en) 2021-10-26 2021-10-26 Automatic precious metal material data file extraction system and method

Publications (1)

Publication Number Publication Date
CN114036926A (en) 2022-02-11

Family

ID=80141929

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111247547.8A Withdrawn CN114036926A (en) 2021-10-26 2021-10-26 Automatic precious metal material data file extraction system and method

Country Status (1)

Country Link
CN (1) CN114036926A (en)

Similar Documents

Publication Publication Date Title
CN102999524B (en) A kind of document associations search method and system
WO2006002009A2 (en) Document management system with enhanced intelligent document recognition capabilities
CN106528684A (en) Method and system for establishing engineering material database
CN111666747A (en) Method for generating WORD document into description class data module conforming to S1000D standard
CN112668622A (en) Analysis method and analysis and calculation device for coal geological composition data
CN110765402A (en) Visual acquisition system and method based on network resources
CN114330284A (en) Rule model-based automatic insurance clause analysis method
CN116451665A (en) Method for intelligently generating design BOM based on drawing
CN111369133A (en) Big data risk monitoring system
CN101452383B (en) Interface antetype design method and design system
CN112733345A (en) Automatic three-dimensional marking method and system for aviation bolt
CN114036926A (en) Automatic precious metal material data file extraction system and method
CN111522815A (en) Method for warehousing enterprise basic information
CN116881512A (en) Cross-system metadata blood-edge automatic analysis method
US20230409531A1 (en) Method for real-time extraction of on-chip simulation information
CN115454964A (en) Data migration method and system
CN115757479A (en) Database query optimization method, machine-readable storage medium and computer device
CN111275409A (en) Power grid overhaul audit data processing system and processing method
CN112559609A (en) Method for reading transformer data from Word and storing transformer data into Excel
CN114495138A (en) Intelligent document identification and feature extraction method, device platform and storage medium
CN112612841A (en) Knowledge extraction construction method, device, equipment and storage medium
CN107368472B (en) Storage method of document analysis result capable of being iteratively optimized
CN112463728A (en) Bibliographic data extraction method of scientific and technological literature
CN111460786A (en) Technical method for analyzing traditional document structure
CN117350643B (en) Scientific research data modification integration system

Legal Events

Date Code Title Description
PB01 Publication
WW01 Invention patent application withdrawn after publication

Application publication date: 20220211
