CN110688421A - Intelligent customizable data management and analysis method - Google Patents

Publication number
CN110688421A
Authority
CN
China
Legal status
Pending
Application number
CN201810633877.2A
Other languages
Chinese (zh)
Inventor
孟涛
李佳静
Current Assignee
Nanjing Network Sense To Inspect Mdt Infotech Ltd
Original Assignee
Nanjing Network Sense To Inspect Mdt Infotech Ltd
Priority date
2018-06-20
Filing date
2018-06-20
Publication date
2020-01-14
Application filed by Nanjing Network Sense To Inspect Mdt Infotech Ltd
Priority to CN201810633877.2A
Publication of CN110688421A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an intelligent, customizable data governance and analysis method comprising the following steps. Step 1: construct global master data. Step 2: structure the unstructured data in the application system. Step 3: fuse data from multiple sources in the application system, based on the master data, to obtain standard data. Step 4: customize fields in the standard data to serve as types and labels of the data. Step 5: customize the analysis conditions, analysis range, and chart format. The method governs application system data intelligently: it structures unstructured application system data and, for multi-source heterogeneous data, aligns the data and completes missing values. It also lets the user customize the analysis conditions, define the analysis range, and choose the form in which data is displayed, which makes the analysis flexible and customizable.

Description

Intelligent customizable data management and analysis method
Technical Field
The invention relates to the field of information extraction and text analysis, in particular to an intelligent customizable data governance and analysis method.
Background
Master data is data that describes the core business entities of an enterprise, such as customers, partners, employees, products, bills of materials, and accounts. It has high business value, can be reused across business departments within the enterprise, and exists in multiple heterogeneous application systems. Master data is hard to manage for three reasons. In definition, it has no uniform standard, no clear definition, and no defined scope. In process, the management flows for creating and maintaining data are inconsistent. In quality, the data lacks integrity, consistency, and accuracy, and contains many duplicates. Master data is also hard to share, because ownership of the data is unclear, sharing channels are poor, and access is difficult to control.
In multi-source heterogeneous data, the same concept can have different names because of aliases, abbreviations, translations, natural-language expressions, and differences in written language, so the data must be aligned. Data may also be missing and must be completed. In addition, application systems hold a large amount of unstructured data, such as case records, written judgments, and official documents, which cannot be analyzed directly. All of these problems call for intelligent data governance methods.
In addition, most current analysis tools produce fixed results for given data and make flexible, customizable analysis difficult. For example, they do not let the user customize the analysis conditions, define the analysis range, or customize the form in which data is presented.
Disclosure of Invention
1. Technical problem to be solved:
In view of these problems, the invention provides an intelligent, customizable data governance and analysis method. The method first constructs global master data and extracts information from unstructured data to structure it, then completes data governance based on the global master data to obtain standard data. The user can customize fields in the standard data for classification or labeling. Finally, results are displayed according to the analysis conditions, analysis range, and display mode defined by the user.
2. Technical solution:
An intelligent, customizable data governance and analysis method, characterized in that the method comprises the following steps:
Step 1: Construct global master data.
Step 2: Structure the unstructured data in the application system.
Step 3: Fuse data from multiple sources in the application system, based on the master data, to obtain standard data.
Step 4: Customize fields in the standard data to serve as types and labels of the data.
Step 5: Customize the analysis conditions, analysis range, and chart format.
Step 6: Generate the data analysis result according to the customization in step 5.
Further, the global master data in step 1 is constructed as follows: the databases of the application system and website data from related fields serve as the primary sources of master data, and a series of conversion rules is designed to derive master data from these sources. The conversion rules include, but are not limited to: converting a table name in the relational schema into a concept name in the master data, converting a relationship between tables into a relationship between concepts in the master data, and converting a field name in the relational schema into an attribute name in the master data. Master data can also be defined manually.
Further, in step 2 the unstructured data in the application system is structured by information extraction; the extracted information includes, but is not limited to, master data.
Further, fusing data from multiple sources based on the master data in step 3 comprises data alignment and missing-data completion.
Data alignment performs knowledge fusion on the master data of multiple heterogeneous data sources. Similarity detection rules are applied to master data from different fields to detect identical or similar concepts and attributes; the rules comprise semantic similarity detection, concept similarity detection, attribute similarity detection, and data format similarity detection. After similarity detection, identical and similar master data across the heterogeneous data sources can be unified.
Missing-data completion distinguishes external missing data from internal missing data. External missing data is acquired from external websites with a web crawler. Internal missing data is completed by association rule mining: mining uncovers the rules hidden among the attributes of the data set, and these rules infer an unknown attribute value from the known values of the condition attributes, thereby filling in the data set.
Data that has undergone alignment and missing-data completion becomes standard data, on which statistics and analysis can be performed.
Further, in step 4, after the user customizes a new field in the standard data to serve as a type or label, the data classes and labels are generated by either a rule-based method or a machine learning-based method.
Further, customizing the analysis conditions, analysis range, and report format in step 5 specifically comprises:
Customizing the analysis conditions: specifying the fields or attributes from which data is extracted.
Customizing the analysis range: setting a value range for a specified field, so that only data within the range is extracted.
Customizing the presentation form of the data: the forms include list, pie chart, trend chart, histogram, and relationship chart.
Further, the data analysis result in step 6 is generated as follows:
The corresponding SQL statement is generated automatically from the report format customized by the user, the database is queried, and the query result is rendered in the corresponding form, such as a trend chart, and displayed to the user.
3. Beneficial effects:
The method governs application system data intelligently: it structures unstructured application system data and, for multi-source heterogeneous data, aligns the data and completes missing values. It also lets the user customize the analysis conditions, define the analysis range, and choose the form in which data is displayed, which makes the analysis flexible and customizable.
Drawings
FIG. 1 is a flow diagram of an intelligent, customizable data governance and analysis method.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
Fig. 1 shows an intelligent, customizable data governance and analysis method comprising the following steps:
Step 1: Construct global master data.
Step 2: Structure the unstructured data in the application system.
Step 3: Fuse data from multiple sources in the application system, based on the master data, to obtain standard data.
Step 4: Customize fields in the standard data to serve as types and labels of the data.
Step 5: Customize the analysis conditions, analysis range, and chart format.
Step 6: Generate the data analysis result according to the customization in step 5.
The global master data in step 1 is constructed as follows: the databases of the application system and website data from related fields serve as the primary sources of master data, and a series of conversion rules is designed to derive master data from these sources. The conversion rules include, but are not limited to: converting a table name in the relational schema into a concept name in the master data, converting a relationship between tables into a relationship between concepts in the master data, and converting a field name in the relational schema into an attribute name in the master data. For example, in a hospital information system the database tables include patient records, bed records, and patient sign records. Patient master data is established from the patient record, whose fields become attribute names, including patient identification number, patient name, bed number, admission date, main diagnosis, condition, and so on. The patient and the sign record are linked by a "presents" relationship, that is, "patient presents signs", which is converted into a relationship between concepts.
A relational database may have a complete data schema, including complete table structures and integrity constraints. The relation names in the database can therefore be converted into concepts in the master data, and some of the field names into attributes of the master data. Master data can also be defined manually.
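The three conversion rules above can be sketched as follows. This is a minimal illustration rather than the method as implemented by the applicant: the `Concept` class, the function `schema_to_master_data`, the column names, and the `relation_names` mapping are all hypothetical. The relationship label in particular is supplied by hand, because a schema records that two tables are linked but not what the link means.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A master-data concept derived from one relational table."""
    name: str
    attributes: list[str] = field(default_factory=list)


def schema_to_master_data(tables, foreign_keys, relation_names=None):
    """Apply the three conversion rules to a relational schema.

    tables:         {table_name: [field_name, ...]}
    foreign_keys:   [(child_table, parent_table), ...]
    relation_names: optional {(child, parent): label}; hypothetical input,
                    since a schema alone does not name a relationship.
    """
    relation_names = relation_names or {}
    # Rules 1 and 3: table name -> concept name, field name -> attribute name.
    concepts = {name: Concept(name, list(cols)) for name, cols in tables.items()}
    # Rule 2: table-to-table relationship -> concept-to-concept relationship.
    relations = [
        (parent, relation_names.get((child, parent), "related_to"), child)
        for child, parent in foreign_keys
        if child in concepts and parent in concepts
    ]
    return concepts, relations


# Invented schema loosely following the hospital example in the text.
tables = {
    "patient": ["patient_id", "name", "bed_number", "admission_date", "main_diagnosis"],
    "sign_record": ["record_id", "patient_id", "temperature", "pulse"],
}
foreign_keys = [("sign_record", "patient")]
concepts, relations = schema_to_master_data(
    tables, foreign_keys, {("sign_record", "patient"): "presents"}
)
```

Here `relations` holds one triple that reads "patient presents sign_record", which mirrors the "patient presents signs" relationship in the example.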
In step 2, the unstructured data in the application system is structured by information extraction. Unstructured business data, such as case records, documents, and official papers, is converted into structured data in this way. The extracted information includes, but is not limited to, master data. For example, a case record contains master data such as "patient", "symptom", "test result", "treatment method", and "medicine"; the values of these items are extracted from the case record and converted into structured data.
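As a rough illustration of this step, the sketch below pulls the five items named above out of a labeled free-text case record. The patent does not specify an extraction technique, so the regular-expression approach, the `FIELD_PATTERNS` table, the function `extract_case`, and the sample text are all assumptions; real case records would need their own patterns or a trained extractor.

```python
import re

# Hypothetical label patterns; each captures the text up to the next ';' or '.'.
FIELD_PATTERNS = {
    "patient": re.compile(r"Patient:\s*([^;.]+)"),
    "symptom": re.compile(r"Symptoms?:\s*([^;.]+)"),
    "test_result": re.compile(r"Test result:\s*([^;.]+)"),
    "treatment": re.compile(r"Treatment:\s*([^;.]+)"),
    "medicine": re.compile(r"Medicine:\s*([^;.]+)"),
}


def extract_case(text):
    """Turn one free-text case record into a structured dict.

    Items that cannot be found are set to None so that they can be
    handled later as missing data.
    """
    record = {}
    for field_name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field_name] = match.group(1).strip() if match else None
    return record


# Invented sample text for illustration only.
case_text = (
    "Patient: Zhang San; Symptoms: fever and cough; "
    "Test result: elevated white cell count; Treatment: rest; Medicine: ibuprofen."
)
record = extract_case(case_text)
```

Returning `None` for items that are absent keeps the output shape fixed, so the later completion step can treat them like any other missing value.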
In step 3, fusing data from multiple sources based on the master data comprises data alignment and missing-data completion.
Data alignment: in multi-source heterogeneous data, the same concept may have different names because of aliases, abbreviations, translations, natural-language expressions, and differences in written language. For example, "NS" is an abbreviation of "physiological saline"; "Hospital 301" is an alias of the "General Hospital of the Chinese People's Liberation Army"; and common English expressions for "indications" include "Indications", "Indications and Uses", "Major (Principal) Indications", "Uses", and "Actions and Use".
Knowledge fusion of the master data of multiple heterogeneous data sources is therefore required. Similarity detection rules are applied to master data from different fields to detect identical or similar concepts and attributes; the rules comprise semantic similarity detection, concept similarity detection, attribute similarity detection, and data format similarity detection. After similarity detection, identical and similar master data across the heterogeneous data sources can be unified.
For example, explicitly defined synonyms can be found on the basis of word vectors, or concepts that are synonymous with multiple instances of a concept can be identified as new concepts of the same type. One option is to train Google's word2vec model to learn synonyms and related words; this involves data processing, model training, and parameter tuning.
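A simplified sketch of alignment is given below. It does not use word2vec: to stay self-contained it swaps in an explicit alias table plus character-level string similarity from the Python standard library, which covers abbreviations and spelling variants but not true semantic similarity. The `ALIASES` table, the `align` function, and the 0.8 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: each known surface form maps to one canonical name.
ALIASES = {
    "ns": "physiological saline",
    "normal saline": "physiological saline",
    "uses": "indications",
    "indications and uses": "indications",
}


def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def align(term, canonical_names, threshold=0.8):
    """Return the canonical name for term, or None when nothing is close enough."""
    key = normalize(term)
    # Explicitly defined synonyms take priority over fuzzy matching.
    if key in ALIASES:
        return ALIASES[key]
    best_name, best_score = None, 0.0
    for name in canonical_names:
        score = SequenceMatcher(None, key, normalize(name)).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


canonical = ["physiological saline", "indications"]
aligned_ns = align("NS", canonical)
aligned_variant = align("Indication", canonical)
unmatched = align("bed number", canonical)
```

A word-vector model would replace the `SequenceMatcher` score with a cosine similarity between embeddings; the surrounding lookup-then-threshold structure could stay the same.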
Missing-data completion: to extend and refine the master data, non-relational data must be collected and filled in. Missing-data completion distinguishes external missing data from internal missing data. External missing data is acquired from external websites with a web crawler. For example, for sales data that records only "Hanyang District", the Baidu Encyclopedia website is crawled and the value is completed to "Hanyang District, Wuhan City, Hubei Province".
Internal missing data is completed by association rule mining: mining uncovers the rules hidden among the attributes of the data set, and these rules infer an unknown attribute value from the known values of the condition attributes, thereby filling in the data set. For example, if a user's age is missing, the year and month of birth can be obtained from the identification number and used to fill it in.
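Internal completion can be sketched in two parts. The first helper reads the year and month of birth from an identification number, assuming the 18-digit resident ID layout in which characters 7 to 14 hold the birth date as YYYYMMDD. The second part is a deliberately small form of association rule mining, limited to single-condition rules of the form "attribute = value implies target = value" with support and confidence thresholds. All function names, thresholds, and sample records are hypothetical.

```python
from collections import Counter, defaultdict


def birth_year_month(id_number):
    """Read (year, month) from an 18-digit resident ID; None if malformed."""
    if len(id_number) != 18 or not id_number[6:14].isdigit():
        return None
    return int(id_number[6:10]), int(id_number[10:12])


def mine_rules(records, target, min_support=2, min_confidence=0.8):
    """Mine rules (attribute, value) -> target value from complete records."""
    outcomes_by_condition = defaultdict(Counter)
    for rec in records:
        if rec.get(target) is None:
            continue  # only records with a known target can support a rule
        for attr, value in rec.items():
            if attr != target and value is not None:
                outcomes_by_condition[(attr, value)][rec[target]] += 1
    rules = {}
    for condition, outcomes in outcomes_by_condition.items():
        total = sum(outcomes.values())
        value, count = outcomes.most_common(1)[0]
        if count >= min_support and count / total >= min_confidence:
            rules[condition] = (value, count / total)
    return rules


def fill_missing(records, target, rules):
    """Return copies of records with missing target values inferred by rule."""
    filled = []
    for rec in records:
        rec = dict(rec)
        if rec.get(target) is None:
            candidates = [rules[(a, v)] for a, v in rec.items() if (a, v) in rules]
            if candidates:
                # Prefer the matching rule with the highest confidence.
                rec[target] = max(candidates, key=lambda c: c[1])[0]
        filled.append(rec)
    return filled


# Invented records for illustration only.
records = [
    {"diagnosis": "hypertension", "department": "cardiology"},
    {"diagnosis": "hypertension", "department": "cardiology"},
    {"diagnosis": "fracture", "department": "orthopedics"},
    {"diagnosis": "fracture", "department": "orthopedics"},
    {"diagnosis": "hypertension", "department": None},
]
rules = mine_rules(records, "department")
completed = fill_missing(records, "department", rules)
born = birth_year_month("420105199003071234")  # made-up number
```

A full association rule miner such as Apriori would also consider conditions that combine several attributes; the single-condition version is used here only to keep the example short.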
Data that has undergone alignment and missing-data completion becomes standard data, on which statistics and analysis can be performed.
In step 4, fields are customized in the standard data to serve as types and labels of the data, as follows:
After the user customizes a new field, the data classes and labels are generated by either a rule-based method or a machine learning-based method. Rule-based method: for example, the rule "systolic pressure > 140, diastolic pressure > 90 is hypertension; otherwise normal" classifies a patient's blood pressure from the systolic and diastolic readings and yields a class label. Machine learning-based method: part of the labeled data is used as a training set to train a machine learning model, which then classifies newly entered data automatically and produces class labels. For example, clinical staging of heart failure is difficult because no simple and effective model exists. In this case a model such as an SVM can be trained on previously standardized case records to diagnose and stage heart failure, and the staging label of a new case can then be generated automatically.
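The rule-based path can be sketched as below, using the blood-pressure thresholds from the example. The text does not say whether both thresholds must be exceeded or only one, so the sketch assumes that exceeding either is enough; replace `or` with `and` if the stricter reading is intended. The function names and the field name `bp_class` are hypothetical.

```python
def blood_pressure_label(systolic, diastolic):
    """Label a reading with the thresholds quoted in the text.

    Assumption: exceeding either threshold counts as hypertension.
    """
    if systolic > 140 or diastolic > 90:
        return "hypertension"
    return "normal"


def add_custom_field(records, field_name, rule):
    """Return copies of records extended with a user-defined label field."""
    return [{**rec, field_name: rule(rec)} for rec in records]


# Invented readings for illustration only.
patients = [
    {"id": 1, "systolic": 150, "diastolic": 85},
    {"id": 2, "systolic": 120, "diastolic": 80},
]
labeled = add_custom_field(
    patients,
    "bp_class",
    lambda rec: blood_pressure_label(rec["systolic"], rec["diastolic"]),
)
```

Because `add_custom_field` accepts any callable, the machine learning path could plug in a trained classifier's prediction function in place of the rule without changing how the new field is attached.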
Customizing the analysis conditions, analysis range, and report format in step 5 specifically comprises:
Customizing the analysis conditions: specifying the fields or attributes from which data is extracted.
Customizing the analysis range: setting a value range for a specified field, so that only data within the range is extracted.
Customizing the presentation form of the data: the forms include list, pie chart, trend chart, histogram, and relationship chart.
For example, the user sets the target place to "Wuhan", the organization to "kindergarten", the disease name to "hand-foot-mouth disease", the time range to "the last three months", and the result format to "trend chart".
The data analysis result in step 6 is generated as follows: the corresponding SQL statement is generated automatically from the report format customized by the user, the database is queried, and the query result is rendered in the corresponding form, such as a trend chart, and displayed to the user.
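Steps 5 and 6 can be sketched together: the user's conditions and range are turned into a parameterized SQL statement, and the query returns monthly counts that a trend chart could plot. The table name `case_report`, its columns, the sample rows, and the function `build_query` are hypothetical, and chart rendering is left out. Field names are checked against a whitelist because they are interpolated into the SQL text, while the values are passed as bound parameters.

```python
import sqlite3

TABLE = "case_report"  # hypothetical table of standard data
ALLOWED_FIELDS = {"place", "organization", "disease", "report_date"}


def build_query(conditions, date_field, date_range):
    """Build a parameterized monthly-count query suitable for a trend chart.

    conditions: {field_name: required_value}   (analysis conditions)
    date_range: (start, end) as ISO date text  (analysis range)
    """
    for name in list(conditions) + [date_field]:
        if name not in ALLOWED_FIELDS:
            raise ValueError(f"unknown field: {name}")
    where = [f"{name} = ?" for name in conditions]
    where.append(f"{date_field} BETWEEN ? AND ?")
    sql = (
        f"SELECT substr({date_field}, 1, 7) AS month, COUNT(*) AS cases "
        f"FROM {TABLE} WHERE {' AND '.join(where)} "
        "GROUP BY month ORDER BY month"
    )
    return sql, list(conditions.values()) + list(date_range)


# Invented rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_report "
    "(place TEXT, organization TEXT, disease TEXT, report_date TEXT)"
)
conn.executemany(
    "INSERT INTO case_report VALUES (?, ?, ?, ?)",
    [
        ("Wuhan", "kindergarten", "hand-foot-mouth", "2018-04-10"),
        ("Wuhan", "kindergarten", "hand-foot-mouth", "2018-05-02"),
        ("Wuhan", "kindergarten", "hand-foot-mouth", "2018-05-20"),
        ("Wuhan", "primary school", "hand-foot-mouth", "2018-05-21"),
        ("Nanjing", "kindergarten", "hand-foot-mouth", "2018-06-01"),
    ],
)
sql, params = build_query(
    {"place": "Wuhan", "organization": "kindergarten", "disease": "hand-foot-mouth"},
    "report_date",
    ("2018-04-01", "2018-06-30"),
)
trend = conn.execute(sql, params).fetchall()
```

Each row of `trend` is a `(month, cases)` pair; other presentation forms, such as a pie chart, would group by a category column instead of by month.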

Claims (7)

1. An intelligent, customizable data governance and analysis method, characterized in that the method comprises the following steps:
Step 1: constructing global master data;
Step 2: structuring the unstructured data in an application system;
Step 3: fusing data from multiple sources in the application system, based on the master data, to obtain standard data;
Step 4: customizing fields in the standard data to serve as types and labels of the data;
Step 5: customizing analysis conditions, an analysis range, and a chart format;
Step 6: generating a data analysis result according to the customization in step 5.
2. The intelligent, customizable data governance and analysis method of claim 1, wherein the global master data in step 1 is constructed as follows: the databases of the application system and website data from related fields serve as the primary sources of master data; a series of conversion rules is designed to derive master data from these sources; the conversion rules include, but are not limited to: converting a table name in the relational schema into a concept name in the master data, converting a relationship between tables into a relationship between concepts in the master data, and converting a field name in the relational schema into an attribute name in the master data; master data can also be defined manually.
3. The intelligent, customizable data governance and analysis method of claim 1, wherein in step 2 the unstructured data in the application system is structured by information extraction, and the extracted information includes, but is not limited to, master data.
4. The intelligent, customizable data governance and analysis method of claim 1, wherein fusing data from multiple sources based on the master data in step 3 comprises data alignment and missing-data completion;
the data alignment performs knowledge fusion on the master data of multiple heterogeneous data sources; similarity detection rules are applied to master data from different fields to detect identical or similar concepts and attributes; the rules comprise semantic similarity detection, concept similarity detection, attribute similarity detection, and data format similarity detection; after similarity detection, identical and similar master data across the heterogeneous data sources can be unified;
the missing-data completion distinguishes external missing data from internal missing data; external missing data is acquired from external websites with a web crawler; internal missing data is completed by association rule mining, which uncovers the rules hidden among the attributes of the data set and uses them to infer an unknown attribute value from the known values of the condition attributes, thereby filling in the data set;
data that has undergone alignment and missing-data completion becomes standard data, on which statistics and analysis can be performed.
5. The intelligent, customizable data governance and analysis method of claim 1, wherein in step 4 fields are customized in the standard data to serve as types and labels of the data as follows:
after the user customizes a new field, the data classes and labels are generated by either a rule-based method or a machine learning-based method.
6. The intelligent, customizable data governance and analysis method of claim 1, wherein customizing the analysis conditions, analysis range, and report format in step 5 specifically comprises:
customizing the analysis conditions: specifying the fields or attributes from which data is extracted;
customizing the analysis range: setting a value range for a specified field, so that only data within the range is extracted;
customizing the presentation form of the data: the forms include list, pie chart, trend chart, histogram, and relationship chart.
7. The intelligent, customizable data governance and analysis method of claim 1, wherein the data analysis result in step 6 is generated as follows: the corresponding SQL statement is generated automatically from the report format customized by the user, the database is queried, and the query result is rendered in the corresponding form, such as a trend chart, and displayed to the user.
CN201810633877.2A 2018-06-20 2018-06-20 Intelligent customizable data management and analysis method Pending CN110688421A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810633877.2A CN110688421A (en) 2018-06-20 2018-06-20 Intelligent customizable data management and analysis method

Publications (1)

Publication Number Publication Date
CN110688421A (en) 2020-01-14

Family

ID=69106210

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810633877.2A Pending CN110688421A (en) 2018-06-20 2018-06-20 Intelligent customizable data management and analysis method

Country Status (1)

Country Link
CN (1) CN110688421A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347112A (en) * 2020-09-16 2021-02-09 北京中兵数字科技集团有限公司 Aviation data management method, aviation data management device and storage medium
CN112365939A (en) * 2020-10-14 2021-02-12 山东大学 Data management method and system based on medical health big data
CN114897516A (en) * 2022-07-12 2022-08-12 山东乐习信息科技有限公司 Method for overall process data management of special equipment
CN116226786A (en) * 2023-03-22 2023-06-06 中国人民解放军军事科学院系统工程研究院 Data processing method and device for information system data fusion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1757375A (en) * 2005-10-18 2006-04-12 浙江大学 Lamination estimate method for cardiovascular danger of hyperpietic based artificial nervous network
CN106354786A (en) * 2016-08-23 2017-01-25 冯村 Visual analysis method and system
CN107357933A (en) * 2017-08-04 2017-11-17 刘应波 A kind of label for multi-source heterogeneous science and technology information resource describes method and apparatus
CN108010573A (en) * 2017-11-24 2018-05-08 苏州市环亚数据技术有限公司 A kind of hospital data emerging system, method, electronic equipment and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112347112A (en) * 2020-09-16 2021-02-09 北京中兵数字科技集团有限公司 Aviation data management method, aviation data management device and storage medium
CN112347112B (en) * 2020-09-16 2022-03-15 北京中兵数字科技集团有限公司 Aviation data management method, aviation data management device and storage medium
CN112365939A (en) * 2020-10-14 2021-02-12 山东大学 Data management method and system based on medical health big data
CN114897516A (en) * 2022-07-12 2022-08-12 山东乐习信息科技有限公司 Method for overall process data management of special equipment
CN116226786A (en) * 2023-03-22 2023-06-06 中国人民解放军军事科学院系统工程研究院 Data processing method and device for information system data fusion
CN116226786B (en) * 2023-03-22 2023-08-22 中国人民解放军军事科学院系统工程研究院 Data processing method and device for information system data fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination