CN112768059A - Method for standardizing grade data in medical data - Google Patents

Method for standardizing grade data in medical data

Info

Publication number
CN112768059A
CN112768059A CN202110097944.5A CN202110097944A CN112768059A CN 112768059 A CN112768059 A CN 112768059A CN 202110097944 A CN202110097944 A CN 202110097944A CN 112768059 A CN112768059 A CN 112768059A
Authority
CN
China
Prior art keywords
data
grade
standard
column
mapping rule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110097944.5A
Other languages
Chinese (zh)
Other versions
CN112768059B (en)
Inventor
李红良
秦娟娟
张晓晶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN202110097944.5A
Publication of CN112768059A
Application granted
Publication of CN112768059B
Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70: ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a method for standardizing grade data in medical data, comprising the following steps: acquiring original physical examination data columns from different data source units, standardizing the column names through a standard glossary, and determining the grading rule; dividing the data into two types according to whether they are purely numerical; automatically converting purely numerical content in a data column into the corresponding grade form according to the index reference range (class A mapping rule replacement); replacing non-purely-numerical content in a data column with the corresponding grade form through a standard mapping library (class B mapping rule replacement); merging the class A and class B cleaning results, generating frequency statistics of the cleaned grade data, and performing quality control on the grade cleaning results; and correcting conflict items after the results are merged. The invention converts grade data recorded in different forms into a regular grade form, which facilitates subsequent mining and analysis.

Description

Method for standardizing grade data in medical data
Technical Field
The invention relates to the technical field of medical big data, in particular to a method for standardizing grade data in medical data.
Background
In recent years, China has developed rapidly in the field of big data science. However, many technical bottlenecks remain in the field of medical and health big data. One urgent problem is how to manage massive health data effectively so that useful information can be mined to benefit human health. Physical examination data are a very important source of medical and health data and cover a very wide population. Managing and mining physical examination data effectively provides an important scientific reference for fields such as chronic disease prevention and control in China.
Physical examination data mainly comprise three data types: text data, measurement data and grade data. Grade data are data with ordered levels, for example clinical efficacy divided into cured, markedly effective, improved and ineffective; clinical test results divided into -, ±, +, ++, +++ and ++++; and the severity of a symptom such as pain divided into 0 (no pain), 1 (mild), 2 (moderate) and 3 (severe). Because different units use different standards and description styles, grade data are very cluttered. For example, the same grade indicator may be recorded as "-, ±, +, ++, +++", as "negative, weak positive, strong positive", or as "0.00 (-), 10 (weak positive), 500 (+), >10000", which makes the data difficult to convert into valuable information through analysis. The present invention addresses these problems.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a method for standardizing grade data in medical data, aiming at the defects in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows:
The invention provides a method for standardizing grade data in medical data, which comprises the following steps:
Step 1: acquiring original physical examination data columns from different data source units, and standardizing the column names through a standard glossary to obtain standardized grade data columns;
Step 2: determining the grade data column to be cleaned and its grading rule;
Step 3: dividing the data in the grade data column into two types according to whether they are purely numerical, cleaning the purely numerical data according to a class A mapping rule, and cleaning the non-purely-numerical data according to a class B mapping rule;
Step 4: class A mapping rule: automatically converting the purely numerical content of the data column into the corresponding grade form according to the index reference range;
Step 5: class B mapping rule: replacing the non-purely-numerical content of the data column with the corresponding grade form through a standard mapping library;
Step 6: after cleaning with the class A and class B mapping rules, merging the cleaning results, computing frequency statistics of the grade forms, and performing quality control on the cleaning results;
Step 7: merging the grade replacement results, correcting conflict items after the merge, and outputting the corrected standardized data.
Further, the column name standardization in step 1 of the present invention is specifically:
Column name standardization matches each data column with a corresponding standard term; the data types of the standard terms comprise text data standard terms, measurement data standard terms and grade data standard terms.
Further, step 2 of the present invention is specifically:
A data column standardized to a grade data term enters the grade data cleaning process. The standard glossary sets the grading standard corresponding to each grade data term, and the grading standard expresses the content of the grade data as numbers, so that grade data in various forms can be standardized with one unified numerical standard.
Further, step 4 of the present invention is specifically:
The class A mapping rule uses an algorithm to automatically convert the normal reference range [a, b] of the index, as given by the data source unit, into a uniform interval form: grade form 1: (-∞, a); grade form 2: [a, b]; grade form 3: (b, +∞). Based on the class A mapping rule, the purely numerical content in the grade data column is replaced with grades by the class A mapping rule algorithm.
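As an illustration of the three-interval form, the following minimal Python sketch grades one numeric value against a reference range [a, b]. The function name `grade_by_reference_range` is hypothetical and does not come from the patent, and the sketch assumes the range is supplied as two numbers with a <= b.

```python
def grade_by_reference_range(value: float, a: float, b: float) -> int:
    """Return grade form 1 for (-inf, a), 2 for [a, b], 3 for (b, +inf)."""
    if a > b:
        raise ValueError("reference range must satisfy a <= b")
    if value < a:
        return 1  # below the normal reference range
    if value <= b:
        return 2  # inside the closed normal range [a, b]
    return 3      # above the normal reference range
```

Both endpoints a and b fall in grade form 2 because the middle interval is closed.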
Further, step 5 of the present invention is specifically:
The class B mapping rule is a professional database built according to the national clinical laboratory guidelines; its basic structure is: standard term name, grading rule, original form, corresponding grade replacement form. Based on the class B mapping rule, non-purely-numerical content is replaced with grades by the class B mapping rule algorithm.
Further, step 6 of the present invention is specifically:
An algorithm computes the grade form frequency of each data column under each standard term and generates a grade form frequency statistics table of the form: standard term name, data source unit/data column, grade form frequency, grade form percentage. Quality control of the grade cleaning results is achieved by checking whether the grade form distribution of each data column under the same standard term is abnormal.
Further, step 7 of the present invention is specifically:
All data columns under the same standard term are merged; where the same patient has different grade forms in two or more columns under the same standard term, the case is marked as a merge conflict, and finally the single correct grade form is selected from the conflicting values.
The invention has the following beneficial effects: the method standardizes grade-type physical examination data to obtain orderly, uniform numerical examination results, thereby greatly improving the orderliness and minability of grade-type physical examination data.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of grade data cleaning according to an embodiment of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in FIG. 1, purely numerical content in the grade data column is automatically converted into the corresponding grade form according to the index reference range (class A mapping rule replacement), non-purely-numerical content in the data column is replaced with the corresponding grade form through the standard mapping library (class B mapping rule replacement), the grade replacement results are merged, and quality control and correction of conflict items are then performed on the merged results.
In the example shown in FIG. 2, a grade data column has the standard term stool analysis-red blood cells, and its original forms include: -, ±, +, ++, +++, ++++, negative, weak positive, strong positive, 0, 2, 4, 12 and 18. Through the grade cleaning process, these original forms are finally replaced by the numerical standard grade forms.
The method comprises the following steps:
Step 1: The column names of the original data columns are standardized through a standard glossary. Column name standardization matches each data column with the most appropriate standard term. The data types of the standard terms comprise text data standard terms, measurement data standard terms and grade data standard terms.
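The patent does not state how a column is matched to its standard term. The sketch below assumes the simplest case, an exact lookup after normalizing case and whitespace. The glossary entries, the name `STANDARD_GLOSSARY` and the function `standardize_column_name` are all hypothetical and serve only as an illustration.

```python
from typing import Optional, Tuple

# Hypothetical glossary: normalized raw column name -> (standard term, data type).
STANDARD_GLOSSARY = {
    "stool rbc": ("stool analysis-red blood cells", "grade"),
    "fecal red blood cells": ("stool analysis-red blood cells", "grade"),
    "height (cm)": ("height", "measurement"),
    "ultrasound findings": ("abdominal ultrasound", "text"),
}


def standardize_column_name(raw_name: str) -> Optional[Tuple[str, str]]:
    """Return (standard term, data type) for a raw column name, or None if unmatched."""
    # Lowercase and collapse runs of whitespace before the lookup.
    key = " ".join(raw_name.strip().lower().split())
    return STANDARD_GLOSSARY.get(key)
```

Only columns whose data type comes back as "grade" would go on to the grade cleaning steps. A real system would likely need fuzzier matching than this exact lookup.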
Step 2: The standard glossary also defines the grading standard for each grade data term. The grading standard expresses the content of the grade data as numbers, so that grade data in various forms can be standardized with one unified numerical standard. The standard term for this data column is stool analysis-red blood cells, a grade data term, and its grading standard is: 1: negative (-); 2: weak positive (±); 3: positive (+); 4: strong positive (++); 5: strong positive (+++); 6: strong positive (++++).
Step 3: The data column is divided into two categories according to whether the content is purely numerical: (1) purely numerical: 0, 2, 4, 12 and 18, which are replaced with grades under the class A rule; (2) non-purely numerical: -, ±, +, ++, +++, ++++, negative, weak positive and strong positive, which are replaced with grades under the class B rule. Purely numerical and non-purely-numerical content each have their own characteristics, so cleaning them under different rules improves cleaning efficiency and accuracy.
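A minimal Python sketch of this split is given below. It assumes that "purely numerical" means the cell parses as a finite number, which is one possible reading of the text. The helper names `is_pure_numeric` and `split_column` are hypothetical.

```python
import math
from typing import List, Tuple


def is_pure_numeric(cell: str) -> bool:
    """True if the cell parses as a finite number such as '12' or '0.5'."""
    try:
        value = float(cell.strip())
    except ValueError:
        # Symbols such as '-', '+', '±' and words such as 'negative' end up here.
        return False
    return math.isfinite(value)


def split_column(cells: List[str]) -> Tuple[List[str], List[str]]:
    """Split cells into (purely numerical, non-purely-numerical), keeping order."""
    numeric: List[str] = []
    non_numeric: List[str] = []
    for cell in cells:
        (numeric if is_pure_numeric(cell) else non_numeric).append(cell)
    return numeric, non_numeric
```

The first list would then be cleaned under the class A rule and the second under the class B rule.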
Step 4: The computer program generates the class A mapping rule from the index reference ranges given by the data source unit (usually a hospital examination center). The class A mapping rule converts the reference ranges into a uniform interval form; the purely numerical content in the grade data column is then replaced according to this rule, which is expressed in a computer-readable form. If the data source unit gives the following reference ranges for the data column: -: 0-3; ±: 3-5; +: 5-10; ++: 10-15; +++: 15-20; ++++: 20-∞, then the automatically generated class A mapping table records the rule as 1: [0, 3); 2: [3, 5); 3: [5, 10); 4: [10, 15); 5: [15, 20); 6: [20, +∞). Under this rule, the values 0 and 2 are replaced with 1, 4 with 2, 12 with 4, and 18 with 5.
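The worked example in this step can be checked with a short Python sketch. It assumes half-open intervals whose lower bounds are 0, 3, 5, 10, 15 and 20, and uses the standard-library `bisect_right` to find the interval. The names `LOWER_BOUNDS` and `class_a_grade` are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of [0, 3), [3, 5), [5, 10), [10, 15), [15, 20), [20, +inf),
# which correspond to grades 1 through 6.
LOWER_BOUNDS = [0, 3, 5, 10, 15, 20]


def class_a_grade(value: float) -> int:
    """Map a numeric result to its grade via the half-open reference intervals."""
    if value < LOWER_BOUNDS[0]:
        raise ValueError("value is below the lowest reference bound")
    # bisect_right counts how many lower bounds are <= value, which is the grade.
    return bisect_right(LOWER_BOUNDS, value)
```

Applied to the values 0, 2, 4, 12 and 18, the function returns 1, 1, 2, 4 and 5, matching the replacements described in this step.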
Step 5: The non-purely-numerical content in the grade data column is replaced under the class B mapping rule. The class B mapping rule is a professional database prepared according to the national clinical laboratory guidelines; its basic structure is: standard term name, grading rule, original form, grade replacement form. For example, for the standard term stool analysis-red blood cells, the class B mapping rule records the grading standard and the correspondence between each original form and its grade replacement form: forms such as "-", "negative" and "(-)" correspond to grade form "1"; forms such as "±", "weak positive" and "(±)" correspond to grade form "2"; and so on. Using the class B mapping table, the program identifies each original form in the data to be cleaned and converts it into the corresponding grade form. In this example, the class B mapping table replaces the six original forms with 1, 2, 3, 4, 5 and 6 respectively, following the grading standard of step 2: negative forms with 1, weak positive forms with 2, and so on up to the strongest positive form with 6.
Table 1. Class B mapping table
[Table provided as an image in the original publication (Figure BDA0002915097680000051); contents not reproduced here.]
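Since Table 1 survives only as an image, the sketch below does not reproduce it. It builds a small illustrative class B lookup from the forms named in the text and the grading standard of step 2. The entries, the names `CLASS_B_MAPPING` and `class_b_grade`, and the choice to return None for an unknown form are all assumptions.

```python
from typing import Optional

TERM = "stool analysis-red blood cells"

# Illustrative entries only: standard term -> {original form -> grade form}.
CLASS_B_MAPPING = {
    TERM: {
        "-": 1, "(-)": 1, "negative": 1,
        "±": 2, "(±)": 2, "weak positive": 2,
        "+": 3, "positive": 3,
        "++": 4,
        "+++": 5,
        "++++": 6,
    }
}


def class_b_grade(term: str, original: str) -> Optional[int]:
    """Look up the grade for an original form; None means no rule matched."""
    forms = CLASS_B_MAPPING.get(term, {})
    return forms.get(original.strip().lower())
```

Returning None rather than guessing leaves unmatched forms visible, so they can be reviewed and added to the mapping library.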
Step 6: After the class A and class B mapping replacements are completed, the program merges the data cleaned under both rules and computes the grade form frequency of each data column under each standard term, generating a grade form frequency statistics table. Using this table, quality control of the grade replacement results is achieved by checking whether the grade form distribution of each data column under the same standard term is abnormal.
Table 2. Grade form frequency statistics table
[Table provided as an image in the original publication (Figure BDA0002915097680000061); contents not reproduced here.]
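The frequency table can be sketched in Python with `collections.Counter`. The row layout follows the structure given in the text (standard term name, data source unit/data column, grade form, frequency, percentage). The function name `grade_frequency_table` and the sample data in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Row = Tuple[str, str, int, int, float]


def grade_frequency_table(term: str, columns: Dict[str, List[int]]) -> List[Row]:
    """Build rows of (term, source column, grade, count, percentage).

    `columns` maps a data source/column name to its cleaned grades.
    """
    rows: List[Row] = []
    for source, grades in columns.items():
        counts = Counter(grades)
        total = sum(counts.values())
        for grade in sorted(counts):
            percentage = round(100 * counts[grade] / total, 1)
            rows.append((term, source, grade, counts[grade], percentage))
    return rows
```

For example, a column `"hospital_A"` holding the grades `[1, 1, 1, 2]` yields one row for grade 1 (count 3, 75.0 percent) and one for grade 2 (count 1, 25.0 percent). A column whose distribution differs sharply from other columns under the same term would be a candidate for review.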
Step 7: There may be multiple data columns under the same standard term (here, stool analysis-red blood cells), and data columns standardized to the same standard term are merged. If, after merging, the grade forms recorded for the same patient ID under the same standard term are inconsistent, the case is marked as a merge conflict, and finally the single correct grade form is selected from the conflicting values.
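A minimal Python sketch of the merge and conflict marking follows. It only flags conflicts; the patent does not say how the single correct grade form is chosen, so that step is left out. The function name `merge_columns` and the record layout (patient ID, standard term, grade) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (patient ID, standard term)


def merge_columns(
    records: Iterable[Tuple[str, str, int]],
) -> Tuple[Dict[Key, int], Dict[Key, List[int]]]:
    """Merge (patient_id, term, grade) records into (merged, conflicts)."""
    seen = defaultdict(set)
    for patient_id, term, grade in records:
        seen[(patient_id, term)].add(grade)
    # Keys with exactly one distinct grade merge cleanly.
    merged = {key: next(iter(g)) for key, g in seen.items() if len(g) == 1}
    # Keys with several distinct grades are marked as merge conflicts.
    conflicts = {key: sorted(g) for key, g in seen.items() if len(g) > 1}
    return merged, conflicts
```

Keeping conflicts in a separate dictionary means they can be corrected before the standardized data are output.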
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (7)

1. A method for normalizing grade data in medical data, the method comprising the steps of:
Step 1: acquiring original physical examination data columns from different data source units, and standardizing the column names through a standard glossary to obtain standardized grade data columns;
Step 2: determining the grade data column to be cleaned and its grading rule;
Step 3: dividing the data in the grade data column into two types according to whether they are purely numerical, cleaning the purely numerical data according to a class A mapping rule, and cleaning the non-purely-numerical data according to a class B mapping rule;
Step 4: class A mapping rule: automatically converting the purely numerical content of the data column into the corresponding grade form according to the index reference range;
Step 5: class B mapping rule: replacing the non-purely-numerical content of the data column with the corresponding grade form through a standard mapping library;
Step 6: after cleaning with the class A and class B mapping rules, merging the cleaning results to generate a grade form frequency table, and performing quality control on the cleaning results;
Step 7: merging the grade replacement results, correcting conflict items after the merge, and outputting the corrected standardized data.
2. The method for standardizing grade data in medical data according to claim 1, wherein the column name standardization in step 1 is specifically:
column name standardization matches each data column with a corresponding standard term, and the data types of the standard terms comprise text data standard terms, measurement data standard terms and grade data standard terms.
3. The method for standardizing grade data in medical data according to claim 1, wherein step 2 is specifically:
a data column standardized to a grade data term enters the grade data cleaning process; the standard glossary sets the grading standard corresponding to each grade data term, and the grading standard expresses the content of the grade data as numbers, so that grade data in various forms can be standardized with one unified numerical standard.
4. The method for standardizing grade data in medical data according to claim 1, wherein step 4 is specifically:
the class A mapping rule uses an algorithm to automatically convert the normal reference range [a, b] of the index, as given by the data source unit, into a uniform interval form: grade form 1: (-∞, a); grade form 2: [a, b]; grade form 3: (b, +∞); based on the class A mapping rule, the purely numerical content in the grade data column is replaced with grades by the class A mapping rule algorithm.
5. The method for standardizing grade data in medical data according to claim 1, wherein step 5 is specifically:
the class B mapping rule is a professional database built according to the national clinical laboratory guidelines, and its basic structure is: standard term name, grading rule, original form, corresponding grade replacement form; based on the class B mapping rule, non-purely-numerical content is replaced with grades by the class B mapping rule algorithm.
6. The method for standardizing grade data in medical data according to claim 1, wherein step 6 is specifically:
an algorithm computes the grade form frequency of each data column under each standard term and generates a grade form frequency statistics table of the form: standard term name, data source unit/data column, grade form frequency, grade form percentage; and whether the grade form distribution of each data column under the same standard term is abnormal is judged by an algorithm, based on the grading standard of the standard glossary and on manual labeling, thereby achieving quality control of the grade cleaning results.
7. The method for standardizing grade data in medical data according to claim 1, wherein step 7 is specifically:
all data columns under the same standard term are merged by an algorithm; where the same patient has different grade forms in two or more columns under the same standard term, the case is marked as a merge conflict, and finally the single correct grade form is selected from the conflicting values.
CN202110097944.5A 2021-01-25 2021-01-25 Method for standardizing grade data in medical data Active CN112768059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110097944.5A CN112768059B (en) 2021-01-25 2021-01-25 Method for standardizing grade data in medical data

Publications (2)

Publication Number Publication Date
CN112768059A true CN112768059A (en) 2021-05-07
CN112768059B CN112768059B (en) 2022-09-09

Family

ID=75707141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110097944.5A Active CN112768059B (en) 2021-01-25 2021-01-25 Method for standardizing grade data in medical data

Country Status (1)

Country Link
CN (1) CN112768059B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant