CN110502615B - Health information data element standard data generation method and system - Google Patents

Health information data element standard data generation method and system

Info

Publication number
CN110502615B
Authority
CN
China
Prior art keywords
data
health information
standard
elements
data element
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910801606.8A
Other languages
Chinese (zh)
Other versions
CN110502615A (en)
Inventor
孙海霞
沈柳
李姣
邓盼盼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Medical Information CAMS
Original Assignee
Institute of Medical Information CAMS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-08-28
Filing date
2019-08-28
Publication date
2022-07-05
Application filed by Institute of Medical Information CAMS
Priority to CN201910801606.8A
Publication of CN110502615A
Application granted
Publication of CN110502615B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Abstract

The invention discloses a method and a system for generating health information data element standard data. A health information data element is acquired from a target document and parsed to obtain its basic elements. The data element and its basic elements are then each matched against health information standards to obtain target data, which records the correspondence between the data element, its basic elements and the health information standards. Because the data element is the smallest unit of granularity in health information data, parsing at this level makes the analysis of health information standards finer-grained. Associating both the data elements and their basic elements in the target document with the health information standards saves the time users spend searching across different standards.

Description

Health information data element standard data generation method and system
Technical Field
The invention relates to the technical field of data processing, and in particular to a health information data element standard data generation method and system.
Background
Existing national health information data standards are usually stored as PDF documents on national platforms and database business platforms, and their presentation is limited to the standard number, name, release date, implementation date and similar metadata, so their content is disclosed only at a coarse granularity. In other words, displaying, querying and sharing current health information standards is limited to the PDF files and their basic information. Users typically learn and apply health information data elements and other detailed information by reading the PDF files, and there is no method or practice for organizing health information standard data at a finer granularity. As a result, users spend a long time searching across different health information standards.
Disclosure of Invention
To solve these problems, the invention provides a method and a system for generating health information data element standard data, which analyze health information standards at a finer granularity and save the time users spend searching across different health information standards.
To achieve this purpose, the invention provides the following technical solution:
A health information data element standard data generation method, comprising:
acquiring a health information data element of a target document;
parsing the health information data element to obtain the basic elements of the health information data element; and
matching the health information data element and the basic elements, respectively, against a health information standard to obtain target data, wherein the target data comprises the correspondence between the health information data element and the basic elements and the health information standard.
Optionally, acquiring the health information data element of the target document comprises:
extracting data from the target document to obtain the health information data element.
Optionally, parsing the health information data element to obtain its basic elements comprises:
analyzing the health information data element according to the data types it contains to obtain an analysis result; and
identifying the analysis result to obtain the basic elements of the health information data element.
Optionally, matching the health information data element and the basic elements, respectively, against the health information standard to obtain the target data comprises:
matching the health information data element and the basic elements against the health information standard through a reference data model to obtain the target data, wherein the reference data model comprises the correspondence between the health information data element and the basic elements and the health information standard.
Optionally, the method further comprises:
encoding the target data and storing the encoded target data.
A health information data element standard data generation system, comprising:
an acquisition unit, configured to acquire a health information data element of a target document;
a parsing unit, configured to parse the health information data element to obtain the basic elements of the health information data element; and
a matching unit, configured to match the health information data element and the basic elements, respectively, against a health information standard to obtain target data, wherein the target data comprises the correspondence between the health information data element and the basic elements and the health information standard.
Optionally, the acquisition unit comprises:
an extraction subunit, configured to extract data from the target document to obtain the health information data element.
Optionally, the parsing unit comprises:
an analysis subunit, configured to analyze the health information data element according to the data types it contains to obtain an analysis result; and
an identification subunit, configured to identify the analysis result to obtain the basic elements of the health information data element.
Optionally, the matching unit is specifically configured to:
match the health information data element and the basic elements against the health information standard through a reference data model to obtain the target data, wherein the reference data model comprises the correspondence between the health information data element and the basic elements and the health information standard.
Optionally, the system further comprises:
an encoding unit, configured to encode the target data and store the encoded target data.
Compared with the prior art, the invention provides a health information data element standard data generation method and system in which a health information data element of a target document is acquired and parsed to obtain its basic elements, and the data element and its basic elements are each matched against the health information standard to obtain target data. The target data therefore comprises the correspondence between the health information data element and the basic elements and the health information standard. Because the data element is the smallest unit of granularity in health information data, parsing at this level makes the analysis of health information standards finer-grained; and because the data elements and basic elements in the target document are each associated with the health information standard, users spend less time searching across different health information standards.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a method for generating health information data element standard data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a health information standard knowledge link model according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a method for organizing and associating health information standards according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a health information data element standard data generation system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and the like in the description, the claims and the drawings of the present invention are used to distinguish between different objects, not to describe a particular order. Furthermore, the terms "comprising" and "having", and any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, article or device that comprises a series of steps or units is not limited to the listed steps or units, but may further include steps or units that are not listed.
An embodiment of the present invention provides a health information data element standard data generation method. Referring to Fig. 1, the method comprises:
S101: acquiring a health information data element of a target document;
S102: parsing the health information data element to obtain the basic elements of the health information data element.
A health information standard is a standard specification, issued by the Ministry of Health, for data acquisition, storage and sharing in the health field. A data element is a unit of data whose definition, identification and permitted values are specified by a set of attributes; it carries a variety of information, including its identifier, definition, name and value range. A health information data element is a data element specific to the health field and is characterized by uniqueness, conciseness and accuracy.
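Because a data element is defined by a set of attributes, it can be pictured as a small record type. The following Python sketch is illustrative only: the class name `DataElement`, its field names and the `allows` helper are hypothetical and are not taken from any health information standard; the fields simply mirror the basic elements listed in this description (identifier, name, definition, data type, representation format, permitted values), and all values in the instance are placeholders.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataElement:
    """Hypothetical record for one health information data element."""

    identifier: str                         # data element identifier
    name: str                               # data element name
    definition: str                         # textual definition
    data_type: str                          # data type of the data element value
    representation_format: str              # representation format code
    permitted_values: Tuple[str, ...] = ()  # enumerated permitted values, if any

    def allows(self, value: str) -> bool:
        # An empty tuple is treated here as "no enumerated constraint".
        return not self.permitted_values or value in self.permitted_values


# Placeholder instance; identifier, type, format and values are invented.
example = DataElement(
    identifier="DE-EXAMPLE-001",
    name="Example coded element",
    definition="A placeholder coded data element",
    data_type="coded string",
    representation_format="N1",
    permitted_values=("1", "2", "9"),
)
```

Freezing the record keeps a parsed data element immutable once it has been matched against a standard; that is a design choice of this sketch, not something the patent requires.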
In this embodiment, acquiring the health information data element of the target document requires extracting data from the target document. The target document is a document that standardizes the content or format of electronic medical records and electronic health records; its standardized scope mainly covers dimensions such as the scope, normative references, terms, document content, document structure specification and examples.
Once acquired, the health information data element is parsed to obtain its basic elements: it is analyzed according to the data types it contains to obtain an analysis result, and the analysis result is then identified to obtain the basic elements of the health information data element.
In this embodiment, the health information data elements in the document are parsed and their basic elements are extracted: the data element identifier, name, definition, data type of the data element value, representation format, and permitted values. Each data element can be parsed according to the extraction length and extraction rule determined for each type; the analysis result indicates which types of elements the data element may contain, and identifying that result yields the basic elements of the health information data element.
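Parsing by type with a per-type extraction rule and extraction length can be sketched as a table of regular expressions, each paired with a maximum length. This is a minimal sketch under stated assumptions: the English "Label: value" layout, the rule table `RULES`, the function `parse_basic_elements` and the length limits are all hypothetical, and the patent does not specify how its rules are expressed.

```python
import re
from typing import Dict, Tuple

# Hypothetical rules: basic element -> (pattern capturing its value, maximum length).
# re.M makes ^ and $ match at line boundaries so each rule reads one labelled line.
RULES: Dict[str, Tuple[re.Pattern, int]] = {
    "identifier": (re.compile(r"^Identifier:\s*(\S+)", re.M), 20),
    "name": (re.compile(r"^Name:\s*(.+)$", re.M), 50),
    "definition": (re.compile(r"^Definition:\s*(.+)$", re.M), 200),
    "data_type": (re.compile(r"^Data type:\s*([A-Z]{1,2}\d?)\s*$", re.M), 3),
    "representation_format": (re.compile(r"^Format:\s*(\S+)", re.M), 20),
    "permitted_values": (re.compile(r"^Permitted values:\s*(.+)$", re.M), 200),
}


def parse_basic_elements(entry: str) -> Dict[str, str]:
    """Apply each type-specific rule; drop values longer than the allowed length."""
    result: Dict[str, str] = {}
    for element, (pattern, max_len) in RULES.items():
        match = pattern.search(entry)
        if match is None:
            continue  # this basic element is absent from the entry
        value = match.group(1).strip()
        if len(value) <= max_len:  # an over-long value is treated as a misparse
            result[element] = value
    return result
```

Rejecting over-long values, rather than truncating them, is one possible reading of "extraction length"; a production extractor might instead flag such values for the manual verification step mentioned later.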
Specifically, the health information data elements and their basic elements can be extracted by a data extraction model that is trained using health information data elements and basic elements as training samples.
S103: matching the health information data element and the basic elements, respectively, against the health information standard to obtain target data.
The target data comprises the correspondence between the health information data element and its basic elements and the health information standard.
The health information data element and the basic elements are matched against the health information standard through a reference data model to obtain the target data, wherein the reference data model comprises the correspondence between the health information data element and the basic elements and the health information standard.
So that the correspondence between health information data elements and the health information standard can be obtained by parsing data elements at the smallest granularity, the reference data model can first be trained and then used to match these relations.
A health information standard is a standard for data acquisition, storage and sharing in the health field. It comprises document specifications for data exchange at each stage of health management, basic data sets, and data element and value domain codes, and it constrains how data are represented during data sharing. A document specification is the specification of an electronic medical record or electronic health record shared document, covering the scope definition, normative references, document content and structure specification, examples and so on. The document content and structure specification refers to data elements, or to other coding specifications, in other health information standards. The health data element catalog contains the large number of data values required by the document specification, together with their constraints, such as data types and data formats. The values in the health data element catalog in turn sometimes refer to value descriptions of specific data in the health data element value domain codes, or to value descriptions in other coding specifications. The health information standard knowledge link model, shown in Fig. 2, defines three relations, "contain", "reference data model" and "reference data value", which reveal the internal associations among these health information standards and form a network of links centered on the data element.
A health record or electronic medical record document specification can include the scope, normative references, terms, document content, document structure specification, examples and so on. The health data element catalog contains, for example, data element A and data element B, each of which has basic elements such as the data element identifier, name, definition and data type, and the health data element value domain code comprises a set of codes.
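The knowledge link model described above is essentially a graph with three edge types. A minimal Python sketch of such a store follows; the class `KnowledgeLinkGraph`, its methods and every node name in the usage lines are hypothetical placeholders, and only the three relation names come from the model itself.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

# The three relation types named in the knowledge link model of Fig. 2.
RELATIONS = frozenset({"contain", "reference data model", "reference data value"})

Edge = Tuple[str, str, str]  # (source, relation, target)


class KnowledgeLinkGraph:
    """Minimal typed-edge store; node names used with it are placeholders."""

    def __init__(self) -> None:
        self._edges_by_node: DefaultDict[str, Set[Edge]] = defaultdict(set)

    def link(self, source: str, relation: str, target: str) -> None:
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation!r}")
        edge = (source, relation, target)
        # Index the edge under both endpoints so either side can be queried.
        self._edges_by_node[source].add(edge)
        self._edges_by_node[target].add(edge)

    def edges_around(self, node: str) -> List[Edge]:
        """All edges touching `node`, sorted for stable display."""
        return sorted(self._edges_by_node.get(node, set()))


# Hypothetical usage: a data element sits at the center of all three relations.
graph = KnowledgeLinkGraph()
graph.link("Data element catalog B", "contain", "Data element X")
graph.link("Document specification A", "reference data model", "Data element X")
graph.link("Data element X", "reference data value", "Code table C")
```

Indexing each edge under both endpoints is what makes the structure "data element centered": every relation touching a data element can be listed from the element itself.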
Based on this model, the existing data are processed to build an interface that supports browsing and querying. Data processing mainly involves: 1) parsing the health information resources and extracting data elements; 2) entity coding; and 3) entity alignment and association.
To handle the specific wording and formats of the standards, the embodiment of the present application extracts data from the health information standards by combining OCR (Optical Character Recognition) with manual verification.
Identification of the basic information of a health information standard covers its standard number, standard name, release date and the like.
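Standard numbers are one piece of basic information that can be picked out of OCR text with a pattern. The sketch below is an assumption-laden illustration: the regular expression `STANDARD_NO`, the `StandardNumber` tuple and `find_standard_numbers` are hypothetical and cover only the "WS" and "GB/T" numbering styles that appear in this document, with or without a space after the prefix.

```python
import re
from typing import List, NamedTuple, Optional


class StandardNumber(NamedTuple):
    prefix: str          # e.g. "WS" or "GB/T"
    number: str          # e.g. "363.12"
    year: Optional[str]  # e.g. "2003" when a year suffix is present


# Assumption: only the WS and GB/T numbering styles seen in this document.
STANDARD_NO = re.compile(r"\b(WS|GB/T)\s*(\d+(?:\.\d+)*)(?:-(\d{4}))?")


def find_standard_numbers(text: str) -> List[StandardNumber]:
    # findall returns "" for an unmatched optional group; normalize it to None.
    return [StandardNumber(p, n, y or None) for p, n, y in STANDARD_NO.findall(text)]
```

Other prefixes (for example, industry or local standards) would need their own patterns; OCR output would still pass through the manual verification step described above.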
Identification of data elements: as shown in the knowledge link model of Fig. 2, the items of a data element are stored together as a whole. In addition, the permitted values of a data element sometimes refer to value domain codes in other health information standards. For this information, the name and number of the standard containing the value domain code, together with the value domain code name and code number, must be extracted; the "reference data value" relation can then be established from this information. The information of the health information standard in which the data element itself resides is also recorded, from which the "contain" relation can be established.
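The two relations derived from a data element record can be sketched as a function that emits (source, relation, target) triples. Everything here is a hypothetical illustration: the function name, the record keys and the choice to key targets by standard number plus code number are assumptions; the extracted standard and code names could be stored alongside for display.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source, relation, target)


def relations_for_element(record: Dict[str, str]) -> List[Triple]:
    """Derive "contain" and "reference data value" triples from one extracted record.

    Hypothetical record keys:
      element           - data element name
      host_standard_no  - number of the standard the element appears in
      value_standard_no - number of the standard holding the referenced value domain code
      value_code_no     - code number of that value domain code
    """
    triples: List[Triple] = [(record["host_standard_no"], "contain", record["element"])]
    # Only elements whose permitted values point to another standard get a reference edge.
    if record.get("value_code_no"):
        target = f"{record['value_standard_no']} {record['value_code_no']}"
        triples.append((record["element"], "reference data value", target))
    return triples
```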
Identification of data element value domain codes: these data are usually presented as tables. If a table header in a health information standard reads "CV … code table", the table is identified as a data element value domain code, and its structure, content, Chinese header name (value domain code name), header code (code number) and the information of the health information standard containing the table are all identified and extracted.
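Header-based detection of value domain code tables can be sketched as follows. This is a hedged illustration only: the English header shape "CV<digits and dots> <name> code table", the two-column row layout, the regex `HEADER` and the function `parse_code_table` are all assumptions, since the patent gives the header pattern only as "CV … code table".

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed header shape: "CV<digits and dots> <name> code table" (English rendering).
HEADER = re.compile(r"^(CV[\d.]+)\s+(.+\bcode table)\s*$", re.IGNORECASE)


def parse_code_table(lines: List[str]) -> Optional[Tuple[str, str, Dict[str, str]]]:
    """Return (code number, code table name, {value: meaning}), or None if not a CV table."""
    if not lines:
        return None
    header = HEADER.match(lines[0].strip())
    if header is None:
        return None  # header does not look like a value domain code table
    rows: Dict[str, str] = {}
    for line in lines[1:]:
        parts = line.strip().split(None, 1)  # first token is the value, the rest its meaning
        if len(parts) == 2:
            rows[parts[0]] = parts[1]
    return header.group(1), header.group(2), rows
```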
Identification of other information: this covers the content of the document specification part of Fig. 2, which refers to some data element information or to information of other coding systems. Therefore, the information of the health information standard concerned, the names of the related data elements, the coding information and the information of other coding systems need to be stored; the "reference data model" relation can then be established from this information.
After data extraction, entity coding mainly assigns a unique identifier to each document (standard), data element and code table, which facilitates storage and management.
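Entity coding can be sketched as a registry that hands out one stable identifier per entity. The class `EntityCoder`, its prefixes and the zero-padded counter format are hypothetical choices for illustration; the patent only requires that each standard, data element and code table receive a unique identifier.

```python
from itertools import count
from typing import Dict, Iterator, Tuple


class EntityCoder:
    """Assign a stable unique ID to each standard, data element and code table.

    The prefixes and the zero-padded counter format are hypothetical choices.
    """

    PREFIX = {"standard": "STD", "data_element": "ELM", "code_table": "TBL"}

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], str] = {}
        self._counters: Dict[str, Iterator[int]] = {k: count(1) for k in self.PREFIX}

    def code(self, kind: str, key: str) -> str:
        # Raises KeyError for an unknown entity kind.
        prefix = self.PREFIX[kind]
        if (kind, key) not in self._ids:
            self._ids[(kind, key)] = f"{prefix}-{next(self._counters[kind]):06d}"
        return self._ids[(kind, key)]
```

Sequential identifiers depend on the order in which entities are first seen; if identifiers must be reproducible across independent runs, a hash of the normalized key is one alternative.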
Because standards reference one another in two ways, "reference data value" and "reference data model", the information extracted above is matched against the names of the referenced standards, data elements and code tables, and when the strings match exactly, the corresponding association is established in the system. Exact matching should ignore spaces and letter case to improve recall. Fig. 3 gives examples of the various types of health information standards, data elements, other coding specifications and code tables, and uses the information related to "follow-up" to illustrate the organization method described above.
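Exact matching that ignores spaces and letter case amounts to comparing normalized keys. In the following sketch, the function names `normalize`, `build_index` and `match_reference` are hypothetical; `str.split()` with no argument also treats the full-width space U+3000, common in Chinese text, as whitespace.

```python
from typing import Dict, Iterable, Optional


def normalize(name: str) -> str:
    """Remove all whitespace (including U+3000) and fold case."""
    return "".join(name.split()).casefold()


def build_index(names: Iterable[str]) -> Dict[str, str]:
    """Map normalized name -> canonical name; a later duplicate overwrites an earlier one."""
    return {normalize(n): n for n in names}


def match_reference(reference: str, index: Dict[str, str]) -> Optional[str]:
    """Exact match after normalization; no fuzzy or partial matching."""
    return index.get(normalize(reference))
```

For example, a reference written "gb/t2261.1-2003" resolves to the stored name "GB/T 2261.1-2003", while "WS 364.1" does not match "WS 364.12" because partial matches are deliberately rejected.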
After the associations are established, the result is a browsing and querying system that supports data elements and their basic elements (name, identifier, definition, type, values and so on) and can also display the reference relations between each standard and its data elements.
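A query over basic elements can be as simple as a case-insensitive substring search across selected fields. The function `query`, its default field list and the records in the test are hypothetical placeholders; a real system would more likely use a database or search index.

```python
from typing import Dict, List, Sequence

Record = Dict[str, str]


def query(records: List[Record], term: str,
          fields: Sequence[str] = ("name", "identifier", "definition")) -> List[Record]:
    """Case-insensitive substring search over selected basic-element fields."""
    needle = term.casefold()
    return [r for r in records
            if any(needle in r.get(f, "").casefold() for f in fields)]
```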
As shown in Fig. 3, the follow-up mode, the operation and operation code, and the gender code each correspond one-to-one to related information in the health data standards; that is, from the code of the follow-up mode one can determine which health data standard that code comes from. The health information standards referred to in Fig. 3 include:
WS 445.5 Basic data set of electronic medical records, Part 5: General treatment records;
WS 363.12 Health information data element directory, Part 12: Plan and intervention;
WS 364.12 Health information data element value domain code, Part 12: Plan and intervention;
GB/T 2261.1-2003 Classification and codes of basic personal information, Part 1: Human sex codes.
In the embodiment of the present application, the fine-grained information elements in health information standards are parsed and disclosed, so that users can conveniently browse and retrieve fine-grained information in the national health information standards, such as specific data elements and data element value domains. Semantic associations are established among different types of health information standards, and the reference relations and paths among elements such as data elements are displayed visually, saving the time users spend jumping between and searching different standards.
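Displaying the reference path between two elements reduces to a shortest-path search over the relation triples. The following sketch uses breadth-first search and ignores edge direction so that a path can run through both "contain" and "reference" links; the function `find_path` and the node names in the test are hypothetical, and the patent does not state how its paths are computed.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (source, relation, target)


def find_path(edges: List[Edge], start: str, goal: str) -> Optional[List[Edge]]:
    """Breadth-first search ignoring direction; returns the edges of a shortest path."""
    adjacency: Dict[str, List[Tuple[str, Edge]]] = {}
    for edge in edges:
        source, _, target = edge
        adjacency.setdefault(source, []).append((target, edge))
        adjacency.setdefault(target, []).append((source, edge))
    # previous[node] = (parent node, edge used to reach node); None for the start.
    previous: Dict[str, Optional[Tuple[str, Edge]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[Edge] = []
            step = previous[node]
            while step is not None:
                parent, edge = step
                path.append(edge)
                step = previous[parent]
            return list(reversed(path))
        for neighbour, edge in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = (node, edge)
                queue.append(neighbour)
    return None  # no path connects the two nodes
```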
An embodiment of the present application provides a health information data element standard data generation system. Referring to Fig. 4, the system comprises:
an acquisition unit 10, configured to acquire a health information data element of a target document;
a parsing unit 20, configured to parse the health information data element to obtain the basic elements of the health information data element; and
a matching unit 30, configured to match the health information data element and the basic elements, respectively, against the health information standard to obtain target data, wherein the target data comprises the correspondence between the health information data element and the basic elements and the health information standard.
On the basis of the above embodiment, the acquisition unit comprises:
an extraction subunit, configured to extract data from the target document to obtain the health information data element.
On the basis of the above embodiment, the parsing unit comprises:
an analysis subunit, configured to analyze the health information data element according to the data types it contains to obtain an analysis result; and
an identification subunit, configured to identify the analysis result to obtain the basic elements of the health information data element.
On the basis of the above embodiment, the matching unit is specifically configured to:
match the health information data element and the basic elements against the health information standard through a reference data model to obtain the target data, wherein the reference data model comprises the correspondence between the health information data element and the basic elements and the health information standard.
On the basis of the above embodiment, the system further comprises:
an encoding unit, configured to encode the target data and store the encoded target data.
The invention thus provides a health information data element standard data generation system in which the acquisition unit acquires a health information data element of a target document; the parsing unit parses the health information data element to obtain its basic elements; and the matching unit matches the health information data element and the basic elements, respectively, against the health information standard to obtain target data, so that the target data comprises the correspondence between the health information data element and the basic elements and the health information standard. Because the data element is the smallest unit of granularity in health information data, parsing at this level makes the analysis of health information standards finer-grained; and because the data elements and basic elements in the target document are each associated with the health information standard, users spend less time searching across different health information standards.
The embodiments in this description are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Because the system disclosed in the embodiments corresponds to the disclosed method, it is described briefly; for relevant details, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A health information data element standard data generation method, comprising:
acquiring a health information data element of a target document;
parsing the health information data element to obtain the basic elements of the health information data element, comprising: extracting the different types of data or elements contained in the health information data element; determining different extraction lengths and extraction rules according to the different types; parsing the health information data element according to the determined extraction lengths and extraction rules to obtain an analysis result; and identifying the analysis result to obtain the basic elements of the health information data element, wherein the basic elements of the data element comprise: the data element identifier, name, definition, data type of the data element value, representation format, and permitted values of the data element;
matching the health information data element and the basic elements, respectively, against a health information standard to obtain target data, wherein the target data comprises the correspondence between the health information data element and the basic elements and the health information standard; and
establishing the correspondence between the health information data element and the basic elements and the health information standard, comprising: establishing three relations, "contain", "reference data model" and "reference data value", to form links centered on the data element, wherein the "reference data value" relation is established according to the extracted standard name and standard number of the data element value domain code, together with the value domain code name and code number; the "contain" relation is established according to the recorded information of the health information standard; the "reference data model" relation is established according to the stored information of the health information standards in the document specification part, the names of the related data elements, the coding information and the information of other coding systems; and the information extracted for the "reference data value" and "reference data model" relations is matched character by character against the names of the referenced standards, data elements and code tables, and the corresponding association is established in the system when the strings match exactly.
2. The method of claim 1, wherein acquiring the health information data element of the target document comprises:
extracting data from the target document to obtain the health information data element.
3. The method of claim 1, further comprising:
encoding the target data and storing the encoded target data.
4. A health information data meta-standard data generating system, comprising:
an acquisition unit for acquiring a hygiene information data element of a target document;
the system is used for extracting different types of data or elements contained in the health information data element;
the system is also used for determining different extraction lengths and extraction rules according to different types;
the analysis subunit is used for analyzing the health information data element according to the determined extraction length and the extraction rule to obtain an analysis result;
the identification subunit is used for identifying the analysis result to obtain the basic elements of the health information data element; the basic elements of the data elements include: data element identifier, name, definition, data type of data origin, representation format and data element allowed value;
a matching unit, configured to match the hygiene information data element and the basic element with a hygiene information standard, respectively, to obtain target data, where the target data includes a correspondence between the hygiene information data element and the basic element with the hygiene information standard;
establishing a correspondence between the health information data elements and the basic elements and the health information standard, including: establishing three relations of an inclusion model, a reference model and a reference data value to form relation connection with a data element as a center; the reference data value relation is established according to the standard name and the standard number of the extracted data element value domain code, and the value domain code name and the code number; the inclusion relation is established according to the recorded information of the sanitary information standard; the reference data model relation is established according to the information of part of health information standards in the stored document specification, the names of related data elements, coding information and the information of other coding systems; and performing character matching on the quoted labels, the data elements and the code table names according to the reference data value relationship and the information extracted from the reference data model relationship, and establishing related association in the system when the character strings are completely matched.
5. The system of claim 4, wherein the obtaining unit comprises:
an extraction subunit, configured to perform data extraction on the target document to obtain the health information data elements.
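A minimal sketch of the extraction, parsing, and identification steps, assuming a plain-text target document in which each data element occupies a fixed block of `Label: value` lines. Here the block size stands in for the claims' "extraction length" and a label-to-field mapping stands in for the "extraction rule"; both interpretations, the labels, and the function name `extract_data_elements` are assumptions for illustration, not the patent's method.

```python
import re
from typing import Dict, List

# Hypothetical extraction rule: maps the label used in the source document
# to the basic-element field it fills. Labels are illustrative assumptions.
EXTRACTION_RULE: Dict[str, str] = {
    "Identifier": "identifier",
    "Name": "name",
    "Definition": "definition",
    "Data type": "data_type",
    "Representation format": "representation_format",
    "Permissible values": "permissible_values",
}

# "Label: value", splitting at the first colon and trimming surrounding spaces.
LINE_PATTERN = re.compile(r"^\s*([^:]+?)\s*:\s*(.*?)\s*$")


def extract_data_elements(document: str, extraction_length: int) -> List[Dict[str, str]]:
    """Split the document into blocks of `extraction_length` non-blank lines and
    map each labelled line to a basic-element field via EXTRACTION_RULE.
    An incomplete trailing block is ignored."""
    lines = [ln for ln in document.splitlines() if ln.strip()]
    records: List[Dict[str, str]] = []
    for start in range(0, len(lines) - extraction_length + 1, extraction_length):
        record: Dict[str, str] = {}
        for line in lines[start:start + extraction_length]:
            match = LINE_PATTERN.match(line)
            if match and match.group(1) in EXTRACTION_RULE:
                record[EXTRACTION_RULE[match.group(1)]] = match.group(2)
        # Keep only blocks that yielded an identifier (the anchor field).
        if "identifier" in record:
            records.append(record)
    return records
```

Splitting only at the first colon lets values such as definitions contain colons of their own; unrecognized labels are skipped rather than raising an error.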
6. The system of claim 4, further comprising:
a coding unit, configured to encode the target data and store the encoded target data.
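The claims do not state how the target data is coded or where it is stored. A minimal sketch, assuming "coding" means serializing each target-data record to UTF-8 JSON and that storage is an SQLite table keyed by data element identifier; the table name `target_data`, its columns, and the helper functions are hypothetical.

```python
import json
import sqlite3
from typing import Dict, List


def encode_target_data(target: Dict[str, object]) -> bytes:
    """Serialize one target-data record to UTF-8 JSON (an assumed encoding)."""
    return json.dumps(target, ensure_ascii=False, sort_keys=True).encode("utf-8")


def store_target_data(conn: sqlite3.Connection, records: List[Dict[str, object]]) -> int:
    """Encode each record and upsert it, keyed by its data element identifier."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS target_data ("
        " element_id TEXT PRIMARY KEY,"
        " payload BLOB NOT NULL)"
    )
    rows = [(str(r["element_id"]), encode_target_data(r)) for r in records]
    conn.executemany("INSERT OR REPLACE INTO target_data VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def load_target_data(conn: sqlite3.Connection, element_id: str) -> Dict[str, object]:
    """Fetch and decode one stored record; returns {} if it is absent."""
    row = conn.execute(
        "SELECT payload FROM target_data WHERE element_id = ?", (element_id,)
    ).fetchone()
    return json.loads(row[0].decode("utf-8")) if row else {}
```

`ensure_ascii=False` keeps Chinese standard and element names readable in the stored payload, and `sort_keys=True` makes the encoding deterministic for identical records.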
CN201910801606.8A 2019-08-28 2019-08-28 Health information data element standard data generation method and system Active CN110502615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910801606.8A CN110502615B (en) 2019-08-28 2019-08-28 Health information data element standard data generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910801606.8A CN110502615B (en) 2019-08-28 2019-08-28 Health information data element standard data generation method and system

Publications (2)

Publication Number Publication Date
CN110502615A CN110502615A (en) 2019-11-26
CN110502615B true CN110502615B (en) 2022-07-05

Family

ID=68590075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910801606.8A Active CN110502615B (en) 2019-08-28 2019-08-28 Health information data element standard data generation method and system

Country Status (1)

Country Link
CN (1) CN110502615B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114388142B (en) * 2022-03-23 2022-06-21 成都瑞华康源科技有限公司 Value domain code mapping rapid processing method

Citations (2)

Publication number Priority date Publication date Assignee Title
CN104461907A (en) * 2014-12-30 2015-03-25 成都金盘电子科大多媒体技术有限公司 Health information data set standard conformance automated testing method and system
CN106462535A (en) * 2014-01-14 2017-02-22 口袋医生公司 System and method for dynamic transactional data streaming

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US7523505B2 (en) * 2002-08-16 2009-04-21 Hx Technologies, Inc. Methods and systems for managing distributed digital medical data
US20050251533A1 (en) * 2004-03-16 2005-11-10 Ascential Software Corporation Migrating data integration processes through use of externalized metadata representations
US20070156737A1 (en) * 2005-12-15 2007-07-05 First Data Corporation Application integration systems and methods
US9087080B2 (en) * 2009-10-14 2015-07-21 Trice Imaging, Inc. Systems and methods for converting and delivering medical images to mobile devices and remote communications systems


Also Published As

Publication number Publication date
CN110502615A (en) 2019-11-26

Similar Documents

Publication Publication Date Title
US8161059B2 (en) Method and apparatus for collecting entity aliases
US10423649B2 (en) Natural question generation from query data using natural language processing system
CN109145110B (en) Label query method and device
CN111144723A (en) Method and system for recommending people's job matching and storage medium
US9025890B2 (en) Information classification device, information classification method, and information classification program
CN106649778B (en) Interaction method and device based on deep question answering
US10417267B2 (en) Information processing terminal and method, and information management apparatus and method
CN111209411B (en) Document analysis method and device
US20030028503A1 (en) Method and apparatus for automatically extracting metadata from electronic documents using spatial rules
CN103150356B (en) A kind of the general demand search method and system of application
Gottron Evaluating content extraction on HTML documents
JP7402965B2 (en) Image database construction method, search method, electronic equipment and storage medium
CN106874397B (en) Automatic semantic annotation method for Internet of things equipment
CN109191158A (en) The processing method and processing equipment of user's portrait label data
CN105653547A (en) Method and device for extracting keywords of text
CN108170708B (en) Vehicle entity identification method, electronic equipment, storage medium and system
CN110502615B (en) Health information data element standard data generation method and system
Schmidt et al. Extraction of address data from unstructured text using free knowledge resources
CN111259645A (en) Referee document structuring method and device
CN112307318A (en) Content publishing method, system and device
JP6409071B2 (en) Sentence sorting method and calculator
CN112015907A (en) Method and device for quickly constructing discipline knowledge graph and storage medium
US20090182759A1 (en) Extracting entities from a web page
KR100893629B1 (en) The system and method for granting the sentence structure of electronic teaching materials contents identification codes, the system and method for searching the data of electronic teaching materials contents, the system and method for managing points about the use and service of electronic teaching materials contents
CN108228609B (en) Information filtering method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant