CN113254518A - Information resource management and analysis method based on particle data - Google Patents

Info

Publication number
CN113254518A
Authority
CN
China
Prior art keywords
data
particle data
particle
analysis
resource management
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110563420.0A
Other languages
Chinese (zh)
Inventor
黄德会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingruan Weiye Information Technology Beijing Co ltd
Original Assignee
Jingruan Weiye Information Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingruan Weiye Information Technology Beijing Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Automation & Control Theory (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an information resource management and analysis method based on particle data, comprising the following steps: 1. constructing a label ontology for the particle data; 2. preprocessing an input multi-source data set, assigning multi-dimensional labels to each unit, and generating a particle data set; 3. constructing a particle data logical storage adapter that maps the particle data set to physical storage; 4. constructing a particle data loading component; 5. defining a particle data analysis description language and constructing a particle data analysis component; 6. implementing a unified access control decider for sharing particle data within and between systems, and determining the access permission of each particle data item according to the decider's decision; 7. providing formatted and visualized output of the resulting particle data set. The invention manages and analyzes information resources flexibly and efficiently without losing generality, and addresses the weak generality and flexibility of existing information resource management and analysis techniques.

Description

Information resource management and analysis method based on particle data
Technical Field
The invention belongs to the field of information resource management and analysis, and in particular relates to an information resource management and analysis method based on particle data.
Background
With the informatization and networking of human activity, massive volumes of data, and the information resources they carry, are distributed across cyberspace, so these multi-source heterogeneous information resources must be managed and analyzed effectively. Growing computing power and new artificial intelligence techniques bring both new opportunities and new challenges to information resource management and analysis.
Current information resource management and analysis techniques generally fall into the following three categories:
1. Data mining technology based on a data warehouse
A data warehouse is a subject-oriented collection of data. A multidimensional data model is built from information resources through three processes, namely data extraction, transformation and loading (the ETL process), and data analysis and decision support are provided through Online Analytical Processing (OLAP). A representative system is Oracle Warehouse, which supports subject-oriented features such as complex queries and data snapshots.
Although this method offers high query and analysis efficiency, the extraction and transformation process requires extensive data cleaning, such as standardization and normalization; otherwise data quality is difficult to guarantee. In addition, a data warehouse manages and analyzes data only by its subject attributes, a single dimension that cannot be adjusted dynamically, which greatly limits data mining and analysis capability.
2. Data analysis technology based on knowledge graph
A knowledge graph is a semantic network that describes the relationships between entities. By extracting, representing and fusing the knowledge contained in information, it mines and predicts the intrinsic associations in data, enabling in-depth analysis and application of the data. A representative system is the Neo4j graph database, which supports knowledge graph construction and powerful queries.
This method has strong data mining and knowledge reasoning capabilities and, by means of link prediction algorithms, can achieve a degree of self-learning. However, because knowledge extraction relies on a schema built from expert knowledge, the method cannot be applied to data sets whose logical relationships are weak or uncertain, and it suits only a limited range of applications such as search engine recommendation and intelligent question answering.
3. Machine learning technology based on big data
Whether supervised or unsupervised, machine learning first trains on a large-scale data set, then classifies and predicts data with the resulting model, mining potential associations from massive multi-source data. Representative work includes convolutional neural networks (CNN) and recurrent neural networks (RNN), which support learning data features and predicting data sequences.
This method is notably effective for data sets whose associations are unclear or for which expert knowledge is lacking, but it is overly dependent on the training data, so precision and recall cannot be guaranteed. In addition, the results of machine learning models are poorly interpretable, and the quality of the analysis cannot be quantified.
In summary, current information resource management and analysis techniques are weak in generality, flexibility and usability.
Disclosure of Invention
In view of the problems in the prior art, the invention aims to provide an information resource management and analysis method based on particle data. The method decomposes massive multi-source data into particle data sets, assigns a group of labels to each particle data item, and achieves highly general and flexible information resource management and analysis by means of a complex semantic query mechanism.
An information resource management and analysis method based on particle data comprises the following steps:
step S01: constructing a label ontology for the particle data according to national standards and industry best practice;
step S02: preprocessing the input multi-source data set based on the label ontology constructed in step S01, extracting the minimum processing units that carry complete logical meaning, and assigning multi-dimensional labels to each unit to generate a particle data set;
step S03: constructing a flexible particle data logical storage adapter for each physical storage architecture, and mapping the particle data set generated in step S02 to physical storage;
step S04: constructing a particle data loading component that supports conventional retrieval and semantic computation based on particle data labels, and extracting the particle data set stored in step S03 according to application requirements for further analysis and processing;
step S05: defining a particle data analysis description language, constructing a particle data analysis component, analyzing the data set output by step S04, and generating an analysis result particle data set;
step S06: implementing a unified access control decider for sharing particle data within and between systems, and determining the permission of each particle data item in the particle data sets generated in steps S04 and S05 according to the decider's decision;
step S07: providing report data and visualized output for the resulting particle data set of step S06.
Further, the "particle data label ontology" in step S01 refers to the attribute set of the particle data, including indicators, groups, time, space, units of measure and subjects.
Further, "preprocessing the input multi-source data set" in step S02 is performed as follows: (1) semi-structured data is converted into structured data by decoupling the multi-layer nested attributes in its schema; (2) unstructured data is converted into structured data in <Key, Value> form by computing the hash value of the data. All multi-source heterogeneous data is thus unified into structured data for further processing.
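The two preprocessing rules can be illustrated with a minimal Python sketch. It is not taken from the patent: the function names `flatten` and `to_key_value`, the dotted key convention and the choice of SHA-256 as the hash are all assumptions made for illustration, since the text only says that nested attributes are decoupled and that a hash value is computed.

```python
import hashlib

def flatten(record, prefix=""):
    """Decouple nested attributes into one flat level with dotted keys."""
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))  # recurse into nested layers
        else:
            flat[path] = value
    return flat

def to_key_value(blob):
    """Wrap unstructured bytes as a <Key, Value> pair keyed by their hash.

    SHA-256 is an assumed choice; the source does not name a hash function.
    """
    return hashlib.sha256(blob).hexdigest(), blob

# Hypothetical semi-structured input with one nested attribute.
semi_structured = {"region": {"code": "110000", "name": "Beijing"}, "output": 12.5}
structured = flatten(semi_structured)

# Hypothetical unstructured input.
key, value = to_key_value(b"free-form report text")
```

After both conversions every record is a flat mapping, which is the structured form the later steps operate on.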
Further, the "physical storage architecture" in step S03 includes a relational database, a NoSQL database, and a graph database.
Further, the "logical storage adapter" in step S03 refers to the middleware that maps the particle data into physical storage according to the physical storage architecture.
Further, the "conventional retrieval of labels" in step S04 refers to querying the value range of a label and statistical indicators including the maximum, minimum and average values.
Further, the "semantic computation of labels" in step S04 refers to matching labels with similar logical meanings according to a given subject term.
Further, the "particle data analysis description language" in step S05 provides descriptors for complex semantic relationship operations, including arithmetic operations, logical operations, custom complex operation scripts and predicate logic.
Further, the "access control decider" in step S06 decides whether to allow or deny access to the particle data according to an access control policy, and supports coarse-grained access control based on the Traffic Light Protocol (TLP) and fine-grained access control based on roles.
Through the above steps, the information resource management and analysis method based on particle data is realized. It manages and analyzes information resources flexibly and efficiently without losing generality, and solves the poor generality and flexibility of existing information resource management and analysis techniques.
With this technical scheme, an information resource management and analysis system based on particle data is constructed. Data is decomposed into the minimum units that carry logical meaning and given multi-dimensional labels, so it can be managed and analyzed at a finer granularity. This preserves generality and flexibility, reduces the cost of manually annotating data and of training on data, and effectively addresses high data management cost and poor generality.
Drawings
Fig. 1 is a schematic diagram illustrating an embodiment of a method for managing and analyzing information resources based on particle data according to the present invention.
Detailed Description
To make the technical solution of the invention clearer, specific embodiments are described in detail below.
Step 101: constructing a particle data label ontology according to the national economic industry classification, the ontology comprising the labels and value ranges of numerical indicators, non-numerical groups, units of measure, administrative divisions, time periods and the like.
Step 102: inputting multi-source heterogeneous data published by the national statistical department and the administrative departments of each industry, automatically decomposing the data based on the label ontology constructed in step 101 to generate a particle data set, and assigning each particle data item multi-dimensional labels such as industry, administrative division, time period and statistical scope.
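As a rough illustration of steps 101 and 102, the sketch below splits a small statistics table into particles, one per cell, and attaches labels to each. The `Particle` class, the `decompose` function, the label dimension names and the sample figures are hypothetical; the patent does not specify a data model or a decomposition algorithm.

```python
from dataclasses import dataclass, field

@dataclass
class Particle:
    """Smallest unit with complete logical meaning: one value plus its labels."""
    value: float
    labels: dict = field(default_factory=dict)

def decompose(rows, shared_labels):
    """Turn every non-period cell of every row into one labelled particle."""
    particles = []
    for row in rows:
        for indicator, value in row.items():
            if indicator == "period":
                continue  # the period becomes a label, not a particle
            labels = dict(shared_labels, time=row["period"], indicator=indicator)
            particles.append(Particle(value, labels))
    return particles

# Hypothetical published table: two periods, two indicators.
rows = [
    {"period": "2020", "output": 12.5, "employees": 300},
    {"period": "2021", "output": 14.0, "employees": 320},
]
particles = decompose(rows, {"industry": "manufacturing", "space": "Beijing"})
```

Each of the four resulting particles carries the shared industry and space labels together with its own time and indicator labels, so it can be retrieved along any of those dimensions independently.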
Step 103: implementing a granular data logical storage adapter obtains compatibility for a particular physical storage architecture. For a relational database such as Oracle, converting the particle data set machine label output in the step 102 into a two-dimensional table for storage; for a Key-Value database such as elastic search, converting the grain data output in step 102 and all tags thereof into a plurality of < Key, Value > records; for a graph database such as Neo4j, the particle data and the tags output in step 102 are stored as graph nodes, and the relationships among the tags such as Create, Inherit, Include and the like form edges of the graph.
Step 104: supporting conventional retrieval over the particle data set, computing statistical indicators of numerical data such as the maximum, minimum, mean, variance and median, and outputting the query result subset according to the retrieval range.
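A minimal sketch of the conventional retrieval in step 104, using Python's standard `statistics` module. The `retrieve` function, the dictionary layout of a particle and the sample values are assumptions; the use of population variance (`pvariance`) rather than sample variance is also an assumed choice, since the source does not say which is meant.

```python
import statistics

def retrieve(particles, **label_filter):
    """Return the particles whose labels match the filter, plus summary statistics."""
    hits = [
        p for p in particles
        if all(p["labels"].get(dim) == val for dim, val in label_filter.items())
    ]
    values = [p["value"] for p in hits]
    if not values:
        return hits, {}
    stats = {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "variance": statistics.pvariance(values),  # population variance (assumed)
        "median": statistics.median(values),
    }
    return hits, stats

# Hypothetical particle data set.
particles = [
    {"value": 10.0, "labels": {"indicator": "output", "time": "2019"}},
    {"value": 12.0, "labels": {"indicator": "output", "time": "2020"}},
    {"value": 14.0, "labels": {"indicator": "output", "time": "2021"}},
    {"value": 300, "labels": {"indicator": "employees", "time": "2021"}},
]
hits, stats = retrieve(particles, indicator="output")
```

The returned `hits` list is the query result subset, and `stats` holds the indicators computed over its values.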
Step 105: semantic calculation is supported to be carried out on the particle data set, similarity comparison is carried out on the input keywords and the labels of the particle data, the labels exceeding the threshold value and the marked data are used as hit data, and a query result subset is output.
Step 106: defining a particle data analysis description language that performs computations on the query result subsets output in steps 104 and 105 and supports set operations, arithmetic operations, logical operations, predicate logic and custom complex scripts.
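The patent gives no grammar for the analysis description language, so the sketch below only illustrates the operator categories listed in step 106 as plain Python functions over result subsets keyed by a particle id. All names, the subset layout and the sample data are hypothetical.

```python
def union(a, b):
    """Set operation: particles present in either subset."""
    return {**a, **b}

def intersection(a, b):
    """Set operation: particles present in both subsets."""
    return {pid: a[pid] for pid in a.keys() & b.keys()}

def where(subset, predicate):
    """Predicate logic: keep the particles for which the predicate holds."""
    return {pid: p for pid, p in subset.items() if predicate(p)}

def total(subset):
    """Arithmetic operation: sum of the particle values."""
    return sum(p["value"] for p in subset.values())

# Hypothetical outputs of conventional retrieval and semantic computation.
p1 = {"value": 10.0, "labels": {"time": "2020"}}
p2 = {"value": 14.0, "labels": {"time": "2021"}}
p3 = {"value": 6.0, "labels": {"time": "2021"}}
retrieved = {"p1": p1, "p2": p2}
matched = {"p2": p2, "p3": p3}

both = intersection(retrieved, matched)
# Logical operation (and) inside a predicate.
recent = where(union(retrieved, matched),
               lambda p: p["labels"]["time"] == "2021" and p["value"] > 5)
# A small custom script combining the operators above.
share = total(both) / total(recent)
```

A description language would express the same composition declaratively; the functions here show only what each operator category computes.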
Step 107: access control is realized on each particle data, and a data access level label based on protocols such as traffic is realized: red, orange, yellow and green; role-based data access control is implemented, accessible only to authorized principals.
Step 108: provided that step 107 allows access, formatting and visualizing the data output in steps 104, 105 and 106.
Although specific embodiments of the invention have been disclosed for purposes of illustration and to aid in the understanding of the contents of the invention and the manner in which it may be practiced, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (7)

1. An information resource management and analysis method based on particle data, characterized by comprising the following steps:
step S01: constructing a label ontology for the particle data according to national standards and industry best practice;
step S02: preprocessing the input multi-source data set based on the label ontology constructed in step S01, extracting the minimum processing units that carry complete logical meaning, and assigning multi-dimensional labels to each unit to generate a particle data set;
step S03: constructing a flexible particle data logical storage adapter for each physical storage architecture, and mapping the particle data set generated in step S02 to physical storage;
step S04: constructing a particle data loading component that supports conventional retrieval and semantic computation based on particle data labels, and extracting the particle data set stored in step S03 according to application requirements for further analysis and processing;
step S05: defining a particle data analysis description language, constructing a particle data analysis component, analyzing the data set output by step S04, and generating an analysis result particle data set;
step S06: implementing a unified access control decider for sharing particle data within and between systems, and determining the permission of each particle data item in the particle data sets generated in steps S04 and S05 according to the decider's decision;
step S07: providing report data and visualized output for the resulting particle data set of step S06.
2. The information resource management and analysis method based on particle data as claimed in claim 1, wherein: the "particle data label ontology" in step S01 refers to the attribute set of the particle data, including indicators, groups, time, space, units of measure and subjects.
3. The information resource management and analysis method based on particle data as claimed in claim 1, wherein: "preprocessing the input multi-source data set" in step S02 is performed as follows: (1) semi-structured data is converted into structured data by decoupling the multi-layer nested attributes in its schema; (2) unstructured data is converted into structured data in <Key, Value> form by computing the hash value of the data, so that all multi-source heterogeneous data is finally unified into structured data for further processing.
4. The information resource management and analysis method based on particle data as claimed in claim 1, wherein:
the "physical storage architecture" in step S03 includes a relational database, a NoSQL database and a graph database;
the "logical storage adapter" in step S03 refers to the middleware that maps the particle data into physical storage according to the physical storage architecture.
5. The information resource management and analysis method based on particle data as claimed in claim 1, wherein:
the "conventional retrieval of labels" in step S04 refers to querying the value range of a label and statistical indicators including the maximum, minimum and average values;
the "semantic computation of labels" in step S04 refers to matching labels with similar logical meanings according to a given subject term.
6. The information resource management and analysis method based on particle data as claimed in claim 1, wherein: the "particle data analysis description language" in step S05 provides descriptors for complex semantic relationship operations, including arithmetic operations, logical operations, custom complex operation scripts and predicate logic.
7. The information resource management and analysis method based on particle data as claimed in claim 1, wherein: the "access control decider" in step S06 decides whether to allow or deny access to the particle data according to an access control policy, and supports coarse-grained access control based on the Traffic Light Protocol (TLP) and fine-grained access control based on roles.
CN202110563420.0A 2021-05-21 2021-05-21 Information resource management and analysis method based on particle data Pending CN113254518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110563420.0A CN113254518A (en) 2021-05-21 2021-05-21 Information resource management and analysis method based on particle data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110563420.0A CN113254518A (en) 2021-05-21 2021-05-21 Information resource management and analysis method based on particle data

Publications (1)

Publication Number Publication Date
CN113254518A (en) 2021-08-13

Family

ID=77184044

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110563420.0A Pending CN113254518A (en) 2021-05-21 2021-05-21 Information resource management and analysis method based on particle data

Country Status (1)

Country Link
CN (1) CN113254518A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102609512A (en) * 2012-02-07 2012-07-25 北京中机科海科技发展有限公司 System and method for heterogeneous information mining and visual analysis
CN108021592A (en) * 2016-11-04 2018-05-11 上海大学 A kind of Unstructured Data Management for ARTBEATS DESKTOP TECHNOLOGY NTSC field
CN110415831A (en) * 2019-07-18 2019-11-05 天宜(天津)信息科技有限公司 A kind of medical treatment big data cloud service analysis platform
CN110750686A (en) * 2019-10-12 2020-02-04 河海大学 Fusion system and fusion method of global heterogeneous data
CN110874414A (en) * 2020-01-19 2020-03-10 北京同方软件有限公司 Policy interpretation method based on data joint service
CN111221887A (en) * 2018-11-27 2020-06-02 中云开源数据技术(上海)有限公司 Method for managing and accessing data in data lake server
CN111680041A (en) * 2020-05-31 2020-09-18 西南电子技术研究所(中国电子科技集团公司第十研究所) Safe and efficient access method for heterogeneous data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination