CN113254518A - Information resource management and analysis method based on particle data - Google Patents
- Publication number
- CN113254518A
- Authority
- CN
- China
- Prior art keywords
- data
- particle data
- particle
- analysis
- resource management
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/604—Tools and structures for managing or administering access control systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/02—Knowledge representation; Symbolic representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2141—Access rights, e.g. capability lists, access control lists, access tables, access matrices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Bioethics (AREA)
- Health & Medical Sciences (AREA)
- Automation & Control Theory (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses an information resource management and analysis method based on particle data, comprising the following steps: 1. constructing a label ontology for the particle data; 2. preprocessing an input multi-source data set, assigning multi-dimensional labels to each unit, and generating a particle data set; 3. constructing a particle data logical storage adapter that maps the particle data set to physical storage; 4. constructing a particle data loading component; 5. defining a particle data analysis description language and constructing a particle data analysis component; 6. implementing a unified access control decider for sharing particle data within and between systems, and determining the access permission of each particle data item from the decider's result; 7. providing formatted and visualized output of the resulting particle data set. The invention manages and analyzes information resources flexibly and efficiently without losing generality, and addresses the weak generality and flexibility of existing information resource management and analysis technology.
Description
Technical Field
The invention belongs to the field of information resource management and analysis, and particularly relates to an information resource analysis method based on particle data.
Background
With the informatization and networking of human activities, massive data, and the information resources it carries, are now distributed across cyberspace, so effective management and analysis of these diverse, heterogeneous information resources is necessary. As computing power grows and new artificial intelligence technologies are applied, the management and analysis of information resources faces new opportunities and challenges.
Current information resource management and analysis technologies generally fall into the following three categories:
1. Data mining technology based on data warehouses
A data warehouse is a subject-oriented data collection. A multidimensional data model is built from information resources through three processes, data extraction, transformation and loading (the ETL process), and data analysis and decision support are provided through Online Analytical Processing (OLAP). A representative system is Oracle Warehouse, which supports features such as subject-oriented complex queries and data snapshots.
Although this method offers high query and analysis efficiency, data extraction and transformation require extensive data cleaning such as standardization and normalization; otherwise data quality is difficult to guarantee. In addition, a data warehouse manages and analyzes data only by its subject attributes, a single dimension that cannot be adjusted dynamically, which greatly limits its data mining and analysis capability.
2. Data analysis technology based on knowledge graph
A knowledge graph is a semantic network that describes the relationships between entities. By extracting, representing and fusing the knowledge in information, it mines and predicts the internal associations of data, enabling in-depth analysis and application of the data. A representative system is the Neo4j graph database, which supports knowledge graph construction and powerful queries.
This method has strong data mining and knowledge reasoning capabilities and, by means of link prediction algorithms, can achieve a degree of self-learning. However, because knowledge extraction relies on a schema based on expert knowledge, it cannot be applied to data sets whose logical relationships are weak or uncertain, and it suits only limited application fields such as search engine recommendation and intelligent question answering.
3. Machine learning technology based on big data
Whether supervised or unsupervised, machine learning first trains on a large-scale data set, then classifies and predicts data with the resulting model, mining potential associations from multi-source massive data. Representative works include convolutional neural networks (CNN) and recurrent neural networks (RNN), which support learning data features and predicting data sequences.
This method is notably effective for analyzing data sets whose associations are ambiguous or for which expert knowledge is lacking, but it depends excessively on training data, so precision and recall cannot be guaranteed. In addition, the analysis results of machine learning models are poorly interpretable, and the quality of the data analysis cannot be quantified.
In summary, current information resource management and analysis technologies are poor in generality, flexibility and usability.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide an information resource management and analysis method based on particle data, which decomposes multi-source massive data into particle data sets, assigns each particle data item a group of labels, and achieves highly general and flexible information resource management and analysis by means of a complex semantic query mechanism.
A method for managing and analyzing information resources based on particle data comprises the following steps:
step S01: constructing a label ontology for the particle data according to national standards and industry best practice;
step S02: preprocessing the input multi-source data set based on the label ontology constructed in step S01, extracting the minimum processing units that have complete logical meaning, and assigning multi-dimensional labels to each unit to generate a particle data set;
step S03: constructing a flexible particle data logical storage adapter for the different physical storage architectures, and mapping the particle data set generated in step S02 to physical storage;
step S04: constructing a particle data loading component that supports conventional retrieval and semantic computation based on particle data labels, and extracting the particle data set stored in step S03 according to application requirements for further analysis and processing;
step S05: defining a particle data analysis description language, constructing a particle data analysis component, analyzing the data set output by step S04, and generating an analysis result particle data set;
step S06: implementing a unified access control decider for sharing particle data within and between systems, and determining the permission of each particle data item in the particle data sets generated in steps S04 and S05 according to the decider's result;
step S07: providing report data and visualized output for the resulting particle data set of step S06.
Further, the "label ontology of the particle data" described in step S01 refers to the attribute set of the particle data, including indicators, groups, time, space, units of measurement and subjects.
Further, "preprocessing the input multi-source data set" in step S02 is performed as follows: (1) semi-structured data is converted into structured data by decoupling the multi-layer nested attributes in its schema; (2) unstructured data is converted into structured data in <Key, Value> form by calculating the hash value of the data. All multi-source heterogeneous data is thus unified into structured data for further processing.
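The two conversions described for step S02 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `flatten` and `to_key_value`, the dotted path convention for nested attributes, and the choice of SHA-256 as the hash function are all assumptions made for illustration, since the text does not specify them.

```python
import hashlib
from typing import Any


def flatten(record: dict, prefix: str = "") -> dict:
    """Decouple nested attributes into a flat, single-level mapping."""
    flat: dict = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested layer, carrying the attribute path along.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_key_value(blob: bytes) -> tuple:
    """Use the content hash (SHA-256 here, an assumption) as the record key."""
    return hashlib.sha256(blob).hexdigest(), blob


# Hypothetical semi-structured input with one nested layer.
semi_structured = {"region": {"province": "Hebei", "city": "Tangshan"}, "output": 12.5}
flat = flatten(semi_structured)

# Hypothetical unstructured input.
key, value = to_key_value(b"free-text report")
```

Here `flat` contains the keys `region.province`, `region.city` and `output`, and `key` is a 64-character hexadecimal digest that identifies the unstructured payload.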
Further, the "physical storage architecture" in step S03 includes a relational database, a NoSQL database, and a graph database.
Further, the "logical storage adapter" described in step S03 refers to middleware that maps the particle data to physical storage according to the physical storage architecture.
Further, the "conventional retrieval" of labels in step S04 refers to querying the value range of a label and statistical indicators including the maximum, minimum and average values.
Further, the "semantic computation" of labels in step S04 refers to matching labels that have similar logical meanings to a given subject term.
Further, the "particle data analysis description language" described in step S05 provides descriptors for complex semantic relationship operations, including arithmetic operations, logical operations, custom complex operation scripts and predicate logic.
Further, the "access control decider" in step S06 decides whether to allow or deny access to particle data according to an access control policy, and supports coarse-grained access control based on the Traffic Light Protocol (TLP) and fine-grained role-based access control.
Through the above steps, the information resource management and analysis method based on particle data is realized: information resources are managed and analyzed flexibly and efficiently without losing generality, which addresses the poor generality and flexibility of existing information resource management and analysis technology.
With this technical scheme, an information resource management and analysis system based on particle data is constructed. Data is decomposed into minimum units that carry logical meaning and given multi-dimensional labels, so that it is managed and analyzed at a finer granularity. This preserves generality and flexibility, reduces the cost of manually labeling data and training on it, and effectively addresses the high cost and poor generality of data management.
Drawings
Fig. 1 is a schematic diagram illustrating an embodiment of a method for managing and analyzing information resources based on particle data according to the present invention.
Detailed Description
To describe the technical solution of the method of the invention more clearly, specific embodiments are described in detail below.
Step 101: a particle data label ontology is constructed according to the national economic industry classification; it comprises the labels and value ranges of numerical indicators, non-numerical groups, units of measurement, administrative divisions, time periods and the like.
Step 102: multi-source heterogeneous data published by the national statistics department and the administrative departments of each industry is input and automatically decomposed, based on the label ontology constructed in step 101, to generate a particle data set; each particle data item is assigned multi-dimensional labels such as industry, administrative division, time period and statistical scope.
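As an illustration of step 102, the following Python sketch decomposes one statistical table row into particle data items, one per measured indicator, each carrying the row's shared labels. The `Particle` class, the `decompose` function, the `indicator` label name and the sample row are hypothetical; the text does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Particle:
    """One minimum unit of data plus its multi-dimensional labels (hypothetical shape)."""
    value: float
    labels: tuple  # sorted (dimension, label) pairs

    def label(self, dimension: str) -> Optional[str]:
        return dict(self.labels).get(dimension)


def decompose(row: dict, measures: set) -> list:
    """Split one row into one particle per measured indicator."""
    # Every non-measure column becomes a label shared by all particles of the row.
    shared = {k: str(v) for k, v in row.items() if k not in measures}
    particles = []
    for name in sorted(measures):
        labels = {**shared, "indicator": name}
        particles.append(Particle(float(row[name]), tuple(sorted(labels.items()))))
    return particles


# Hypothetical input row; the figures are made up for illustration.
row = {"industry": "steel", "division": "Hebei", "period": "2020",
       "output": 1200.0, "employees": 35000}
particles = decompose(row, {"output", "employees"})
```

Each resulting particle can then be retrieved by any of its labels independently of the table it came from, which is the property the later steps rely on.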
Step 103: a particle data logical storage adapter is implemented to provide compatibility with a specific physical storage architecture. For a relational database such as Oracle, the particle data set and its labels output in step 102 are converted into a two-dimensional table for storage; for a Key-Value database such as Elasticsearch, the particle data output in step 102 and all of its labels are converted into a number of <Key, Value> records; for a graph database such as Neo4j, the particle data and labels output in step 102 are stored as graph nodes, and relationships among the labels, such as Create, Inherit and Include, form the edges of the graph.
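The adapter idea in step 103 can be sketched as a common interface with one implementation per storage architecture. The sketch below is an assumption-laden illustration: the class names are hypothetical, an in-memory SQLite database stands in for a relational system such as Oracle, a plain dictionary stands in for a Key-Value store, and the graph case is omitted for brevity.

```python
import sqlite3
from abc import ABC, abstractmethod


class StorageAdapter(ABC):
    """Hypothetical interface: map (value, labels) particles to physical storage."""

    @abstractmethod
    def store(self, particles: list) -> None: ...


class RelationalAdapter(StorageAdapter):
    """Maps particles to a two-dimensional table, one column per label dimension."""

    def __init__(self, columns: list) -> None:
        self.columns = columns
        self.conn = sqlite3.connect(":memory:")
        defs = ["value REAL"] + [f'"{c}" TEXT' for c in columns]
        self.conn.execute(f"CREATE TABLE particle ({', '.join(defs)})")

    def store(self, particles: list) -> None:
        marks = ", ".join("?" for _ in range(len(self.columns) + 1))
        rows = [(v, *(labels.get(c) for c in self.columns)) for v, labels in particles]
        self.conn.executemany(f"INSERT INTO particle VALUES ({marks})", rows)


class KeyValueAdapter(StorageAdapter):
    """Maps each particle and each of its labels to separate <Key, Value> records."""

    def __init__(self) -> None:
        self.records: dict = {}

    def store(self, particles: list) -> None:
        for i, (value, labels) in enumerate(particles):
            self.records[f"particle:{i}:value"] = str(value)
            for dimension, label in labels.items():
                self.records[f"particle:{i}:label:{dimension}"] = label


# Hypothetical particle: a value with its label mapping.
data = [(1200.0, {"industry": "steel", "period": "2020"})]
relational = RelationalAdapter(["industry", "period"])
relational.store(data)
key_value = KeyValueAdapter()
key_value.store(data)
```

Because both adapters accept the same particle list, the components of the later steps do not need to know which physical store is in use.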
Step 104: conventional retrieval over the particle data set is supported, statistical indicators of numerical data such as the maximum, minimum, mean, variance and median are calculated, and a query result subset is output according to the retrieval range.
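A minimal sketch of the conventional retrieval in step 104, using Python's standard `statistics` module. The function names `in_range` and `summarize`, the `(value, labels)` particle shape and the sample figures are assumptions; the text also does not say whether the variance is a population or sample variance, so population variance is used here.

```python
import statistics


def in_range(particles: list, dimension: str, low, high) -> list:
    """Return the query result subset whose label falls inside [low, high]."""
    return [p for p in particles if low <= p[1][dimension] <= high]


def summarize(values: list) -> dict:
    """Compute the statistical indicators named in step 104."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.fmean(values),
        "variance": statistics.pvariance(values),  # population variance (assumption)
        "median": statistics.median(values),
    }


# Hypothetical particles: (value, labels).
particles = [
    (10.0, {"period": 2018}),
    (20.0, {"period": 2019}),
    (60.0, {"period": 2020}),
]
subset = in_range(particles, "period", 2019, 2020)
summary = summarize([value for value, _ in subset])
```

For this input the subset holds the 2019 and 2020 particles, and the summary reports a maximum of 60.0, a minimum of 20.0, a mean of 40.0, a variance of 400.0 and a median of 40.0.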
Step 105: semantic computation over the particle data set is supported: the input keywords are compared for similarity with the labels of the particle data, the labels whose similarity exceeds the threshold and the data they mark are taken as hits, and a query result subset is output.
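The threshold-based matching in step 105 can be sketched as follows. The text does not name a similarity measure, so this example substitutes the purely lexical ratio from Python's `difflib.SequenceMatcher` as a stand-in; a real system would presumably use a semantic measure. The function names, the default threshold of 0.6 and the sample particles are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Lexical similarity in [0, 1]; a stand-in for a semantic measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def semantic_hits(particles: list, keyword: str, threshold: float = 0.6) -> list:
    """Return particles that have at least one label whose similarity exceeds the threshold."""
    hits = []
    for value, labels in particles:
        if any(similarity(keyword, label) > threshold for label in labels.values()):
            hits.append((value, labels))
    return hits


# Hypothetical particles: (value, labels).
particles = [
    (1200.0, {"subject": "Steel"}),
    (800.0, {"subject": "steels"}),
    (7.5, {"subject": "population"}),
]
hits = semantic_hits(particles, "steel")
```

With this input the first two particles are hits, while the particle labeled "population" falls below the threshold and is excluded.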
Step 106: a particle data analysis description language is defined to perform calculations on the query result subsets output in steps 104 and 105, supporting set operations, arithmetic operations, logical operations, predicate logic and custom complex scripts.
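The patent does not give a grammar for the analysis description language, so the following Python sketch only illustrates the kinds of operation it is said to support: set operations on two query result subsets, a predicate filter, and an arithmetic aggregate. All names (`particle`, `has_label`, `where`, `total`) and the sample data are hypothetical, and particles are modelled as hashable `(value, labels)` pairs so that they can be placed in sets; identical particles would therefore collapse into one.

```python
def particle(value: float, **labels: str) -> tuple:
    """Build a hashable particle: (value, frozenset of (dimension, label) pairs)."""
    return (float(value), frozenset(labels.items()))


def has_label(dimension: str, label: str):
    """Predicate factory: true when the particle carries the given label."""
    return lambda p: (dimension, label) in p[1]


def where(subset: set, predicate) -> set:
    """Predicate logic: keep the particles for which the predicate holds."""
    return {p for p in subset if predicate(p)}


def total(subset: set) -> float:
    """Arithmetic operation: sum of the particle values."""
    return sum(value for value, _ in subset)


a = particle(10, industry="steel", period="2019")
b = particle(20, industry="steel", period="2020")
c = particle(5, industry="coal", period="2020")

by_retrieval = {a, b}   # e.g. a result subset from step 104
by_semantics = {b, c}   # e.g. a result subset from step 105

both = by_retrieval & by_semantics     # set intersection
either = by_retrieval | by_semantics   # set union
steel_2020 = where(
    either,
    lambda p: has_label("industry", "steel")(p) and has_label("period", "2020")(p),
)
```

In this example `both` and `steel_2020` each contain only particle `b`, and `total(either)` sums the values of all three particles.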
Step 107: access control is applied to each particle data item. Data access level labels based on the Traffic Light Protocol are implemented: red, orange, yellow and green; role-based data access control is also implemented, so that data is accessible only to authorized principals.
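A minimal sketch of an access control decider that combines the coarse-grained level check with the fine-grained role check of step 107. The four level names follow the text; their ordering, the notion of a subject "clearance", and the rule that both checks must pass are assumptions made for illustration, as the patent does not spell out the policy semantics.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Access levels named in the text; the ordering is an assumption."""
    GREEN = 0
    YELLOW = 1
    ORANGE = 2
    RED = 3


@dataclass(frozen=True)
class Subject:
    clearance: Level   # highest level this principal may read (hypothetical)
    roles: frozenset


@dataclass(frozen=True)
class ProtectedParticle:
    value: float
    level: Level
    allowed_roles: frozenset


def decide(subject: Subject, item: ProtectedParticle) -> bool:
    """Allow access only if both the coarse and the fine check pass."""
    coarse = item.level <= subject.clearance             # level-based check
    fine = bool(subject.roles & item.allowed_roles)      # role-based check
    return coarse and fine


analyst = Subject(Level.YELLOW, frozenset({"analyst"}))
auditor = Subject(Level.RED, frozenset({"auditor"}))
item = ProtectedParticle(1200.0, Level.ORANGE, frozenset({"analyst", "auditor"}))
```

Here `decide(analyst, item)` denies access because the analyst's clearance is below the item's level, while `decide(auditor, item)` allows it.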
Step 108: provided that step 107 allows access, the data output in steps 104, 105 and 106 is formatted and visualized.
Although specific embodiments of the invention have been disclosed for purposes of illustration and to aid in the understanding of the contents of the invention and the manner in which it may be practiced, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims.
Claims (7)
1. An information resource management and analysis method based on particle data is characterized by comprising the following steps:
step S01: constructing a label ontology for the particle data according to national standards and industry best practice;
step S02: preprocessing the input multi-source data set based on the label ontology constructed in step S01, extracting the minimum processing units that have complete logical meaning, and assigning multi-dimensional labels to each unit to generate a particle data set;
step S03: constructing a flexible particle data logical storage adapter for the different physical storage architectures, and mapping the particle data set generated in step S02 to physical storage;
step S04: constructing a particle data loading component that supports conventional retrieval and semantic computation based on particle data labels, and extracting the particle data set stored in step S03 according to application requirements for further analysis and processing;
step S05: defining a particle data analysis description language, constructing a particle data analysis component, analyzing the data set output by step S04, and generating an analysis result particle data set;
step S06: implementing a unified access control decider for sharing particle data within and between systems, and determining the permission of each particle data item in the particle data sets generated in steps S04 and S05 according to the decider's result;
step S07: providing report data and visualized output for the resulting particle data set of step S06.
2. The method for information resource management and analysis based on particle data as claimed in claim 1, wherein: the "label ontology of the particle data" described in step S01 refers to the attribute set of the particle data, including indicators, groups, time, space, units of measurement and subjects.
3. The method for information resource management and analysis based on particle data as claimed in claim 1, wherein: "preprocessing the input multi-source data set" in step S02 is performed as follows: (1) semi-structured data is converted into structured data by decoupling the multi-layer nested attributes in its schema; (2) unstructured data is converted into structured data in <Key, Value> form by calculating the hash value of the data; all multi-source heterogeneous data is thus unified into structured data for further processing.
4. The method for information resource management and analysis based on particle data as claimed in claim 1, wherein:
the "physical storage architecture" described in step S03 includes a relational database, a NoSQL database and a graph database;
the "logical storage adapter" described in step S03 refers to middleware that maps the particle data to physical storage according to the physical storage architecture.
5. The method for information resource management and analysis based on particle data as claimed in claim 1, wherein:
the "conventional retrieval" of labels in step S04 refers to querying the value range of a label and statistical indicators including the maximum, minimum and average values;
the "semantic computation" of labels in step S04 refers to matching labels that have similar logical meanings to a given subject term.
6. The method for information resource management and analysis based on particle data as claimed in claim 1, wherein: the "particle data analysis description language" described in step S05 provides descriptors for complex semantic relationship operations, including arithmetic operations, logical operations, custom complex operation scripts and predicate logic.
7. The method for information resource management and analysis based on particle data as claimed in claim 1, wherein: the "access control decider" described in step S06 decides whether to allow or deny access to particle data according to an access control policy, and supports coarse-grained access control based on the Traffic Light Protocol (TLP) and fine-grained role-based access control.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110563420.0A CN113254518A (en) | 2021-05-21 | 2021-05-21 | Information resource management and analysis method based on particle data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113254518A (en) | 2021-08-13 |
Family
ID=77184044
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110563420.0A Pending CN113254518A (en) | 2021-05-21 | 2021-05-21 | Information resource management and analysis method based on particle data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254518A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102609512A (en) * | 2012-02-07 | 2012-07-25 | 北京中机科海科技发展有限公司 | System and method for heterogeneous information mining and visual analysis |
CN108021592A (en) * | 2016-11-04 | 2018-05-11 | 上海大学 | A kind of Unstructured Data Management for ARTBEATS DESKTOP TECHNOLOGY NTSC field |
CN110415831A (en) * | 2019-07-18 | 2019-11-05 | 天宜(天津)信息科技有限公司 | A kind of medical treatment big data cloud service analysis platform |
CN110750686A (en) * | 2019-10-12 | 2020-02-04 | 河海大学 | Fusion system and fusion method of global heterogeneous data |
CN110874414A (en) * | 2020-01-19 | 2020-03-10 | 北京同方软件有限公司 | Policy interpretation method based on data joint service |
CN111221887A (en) * | 2018-11-27 | 2020-06-02 | 中云开源数据技术(上海)有限公司 | Method for managing and accessing data in data lake server |
CN111680041A (en) * | 2020-05-31 | 2020-09-18 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Safe and efficient access method for heterogeneous data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||