CN112786132A

CN112786132A - Medical record text data segmentation method and device, readable storage medium and electronic equipment

Info

Publication number: CN112786132A
Application number: CN202011633275.0A
Authority: CN
Inventors: 张蒙
Original assignee: Beijing Yiyiyun Technology Co ltd
Current assignee: Beijing Yiyiyun Technology Co ltd
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2021-05-11
Anticipated expiration: 2040-12-31
Also published as: CN112786132B

Abstract

The disclosure relates to the technical field of data processing, and provides a medical record text data segmentation method, a medical record text data segmentation device, a readable storage medium and electronic equipment, wherein the medical record text data segmentation method comprises the following steps: acquiring medical record text data in an electronic medical record system, wherein the medical record text data is of an unstructured data type or a semi-structured data type; determining a target segmentation strategy model in a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation; and configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points. According to the medical record text data segmentation method and device, the medical record text data can be segmented through the segmentation strategy model, and the utilization rate of the medical record text data is improved.

Description

Medical record text data segmentation method and device, readable storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a medical record text data segmentation method, a medical record text data segmentation apparatus, a computer-readable storage medium, and an electronic device.

Background

With the rapid development of information technology, many aspects of daily life, such as clothing, food and housing, have improved greatly. In the medical field, for example, more and more medical institutions use electronic medical record systems to store and manage patients' medical data, which effectively improves the confidentiality of patient information and enables safe and effective information sharing with other medical institutions.

However, in the existing electronic medical record system, medical data is stored in a mixed manner, so that the source of the data cannot be effectively located, and the subsequent utilization rate of the medical data is low.

In view of this, there is a need in the art to develop a new method and apparatus for segmenting medical record text data.

It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The present disclosure is directed to a medical record text data segmentation method, a medical record text data segmentation apparatus, a computer-readable storage medium, and an electronic device, so as to improve the utilization rate of data at least to a certain extent.

Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.

According to one aspect of the disclosure, a medical record text data segmentation method is provided, and the medical record text data segmentation method includes: acquiring medical record text data in an electronic medical record system, wherein the medical record text data is of an unstructured data type or a semi-structured data type; determining a target segmentation strategy model in a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation; and configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points.

According to one aspect of the disclosure, a medical record text data segmentation method is provided, and the medical record text data segmentation method includes: acquiring medical record text data in an electronic medical record system, and acquiring a plurality of sample text data with different data structure types from the medical record text data, wherein the medical record text data is of an unstructured data type or a semi-structured data type; correcting the initial segmentation strategy model according to the sample text data respectively to obtain a plurality of segmentation strategy models; and determining a target segmentation strategy model in the plurality of segmentation strategy models according to the data structure type of the medical record text data, configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points.

In an exemplary embodiment of the present disclosure, determining a target segmentation policy model among a plurality of segmentation policy models according to a data structure type of the medical record text data includes: determining, through the mapping relation, the target segmentation strategy model corresponding to the data structure type of the medical record text data, wherein the data structure type of the medical record text data is associated with the system type of the electronic medical record system.

In an exemplary embodiment of the present disclosure, the target segmentation policy model includes a plurality of matchers among a keyword matcher, a regular expression matcher, a node attribute matcher, and a date matcher; configuring segmentation points in the medical record text data through the target segmentation strategy model includes: matching each matcher with the medical record text data respectively, determining a plurality of matching points corresponding to each matcher, and configuring the matching points as the segmentation points.

In an exemplary embodiment of the present disclosure, the target segmentation policy model includes one of a keyword matcher, a regular expression matcher, a node attribute matcher, and a date matcher; configuring segmentation points in the medical record text data through the target segmentation strategy model includes: matching the matcher with the medical record text data, determining one or more matching points corresponding to the matcher, and configuring the one or more matching points as the segmentation points.

In one exemplary embodiment of the present disclosure, the matcher is a keyword matcher that includes one or more keywords; matching the matcher with the medical record text data and determining one or more matching points corresponding to the matcher includes: matching each keyword with the medical record text data through the keyword matcher to obtain one or more matching points corresponding to each keyword.

In one exemplary embodiment of the present disclosure, the matcher is a regular expression matcher that includes one or more regular expressions; matching the matcher with the medical record text data and determining one or more matching points corresponding to the matcher includes: matching each regular expression with the medical record text data through the regular expression matcher to obtain one or more matching points corresponding to each regular expression.

In an exemplary embodiment of the present disclosure, the matcher is a node attribute matcher that includes one or more preset node attributes; matching the matcher with the medical record text data and determining one or more matching points corresponding to the matcher includes: matching each preset node attribute with the node attributes corresponding to the medical record text data through the node attribute matcher to obtain one or more matching points corresponding to each preset node attribute.

In an exemplary embodiment of the disclosure, after acquiring medical record text data in an electronic medical record system, the method further comprises: judging whether the medical record text data is of an unstructured data type, wherein the unstructured data type comprises a plain text file data type or an HTML data type; and if the medical record text data is of the unstructured data type, converting the medical record text data into the semi-structured data type.

According to an aspect of the present disclosure, there is provided a medical record text data segmentation apparatus including: a data acquisition module, used for acquiring medical record text data in an electronic medical record system, wherein the medical record text data is of an unstructured data type or a semi-structured data type; a model determining module, used for determining a target segmentation strategy model from a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation; and a data segmentation module, used for configuring segmentation points in the medical record text data through the target segmentation strategy model and performing data segmentation on the medical record text data according to the segmentation points.

According to an aspect of the present disclosure, there is provided a medical record text data segmentation apparatus including: a sample data acquisition module, used for acquiring medical record text data in an electronic medical record system and acquiring a plurality of sample text data with different data structure types from the medical record text data, wherein the medical record text data is of an unstructured data type or a semi-structured data type; a model configuration module, used for acquiring an initial segmentation strategy model and modifying the initial segmentation strategy model according to each sample text data to obtain a plurality of segmentation strategy models; and a text data segmentation module, used for determining a target segmentation strategy model in the segmentation strategy models according to the data structure type of the medical record text data, configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points.

According to an aspect of the present disclosure, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the medical record text data segmentation method as described in the above embodiments.

According to an aspect of the present disclosure, there is provided an electronic device including: one or more processors; a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the medical record text data segmentation method as described in the above embodiments.

According to the technical scheme, the medical record text data segmentation method and device, the computer-readable storage medium and the electronic device in the exemplary embodiment of the disclosure have at least the following advantages and positive effects:

the medical record text data segmentation method of the embodiment of the disclosure comprises the steps of firstly obtaining medical record text data of an electronic medical record system, wherein the medical record text data is unstructured data or semi-structured data; determining a target segmentation strategy model in a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation; and finally, configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points. On one hand, the medical record text data segmentation method disclosed by the disclosure segments the medical record text data through the target segmentation strategy model corresponding to the data structure type of the medical record text data, so that the pertinence of data segmentation is improved, and the data segmentation is more accurate; on the other hand, the medical record text data in the electronic medical record system is effectively segmented, so that the analysis and the utilization of the medical record text data are facilitated, and the utilization rate of the medical record text data is improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.

FIG. 1 schematically illustrates a flow diagram of a medical record text data segmentation method according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates a method flow diagram for data type conversion according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart of a medical record text data segmentation method according to a specific embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow diagram of a medical record text data segmentation method according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a block diagram of a medical record text data segmentation apparatus according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a block diagram of a medical record text data segmentation apparatus according to an embodiment of the present disclosure;

FIG. 7 schematically shows a block schematic of an electronic device according to an embodiment of the disclosure;

FIG. 8 schematically shows a program product schematic according to an embodiment of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.

As the application of medical informatization deepens, the demand for medical information has developed from simple medical data management to in-depth analysis of medical data. At present, mainstream EMR information systems store medical data as unstructured documents, for example by mixing multiple kinds of documents such as Clinical Document Architecture (CDA) documents. As a result, the source of the medical data cannot be effectively located, and document content fields cannot be accurately extracted and analyzed.

Based on the problems in the related art, an embodiment of the present disclosure provides a medical record text data segmentation method. FIG. 1 shows a flow diagram of the medical record text data segmentation method in the embodiment of the present disclosure. As shown in FIG. 1, the medical record text data segmentation method includes at least the following steps:

step S110: acquiring medical record text data in an electronic medical record system, wherein the medical record text data is of an unstructured data type or a semi-structured data type;

step S120: determining a target segmentation strategy model in a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation;

step S130: configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points.
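By way of illustration only, steps S110 to S130 may be sketched as follows. The sketch is not part of the disclosed embodiments: the names `keyword_matcher`, `STRATEGY_MODELS` and `segment`, the data structure type label, and the use of character offsets as segmentation points are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List

# Hypothetical matcher type: takes text, returns character offsets of matching points.
Matcher = Callable[[str], List[int]]


def keyword_matcher(keywords: List[str]) -> Matcher:
    """Build a matcher that reports the offset of every keyword occurrence."""
    def match(text: str) -> List[int]:
        return [m.start() for kw in keywords for m in re.finditer(re.escape(kw), text)]
    return match


# Hypothetical mapping relation: data structure type -> segmentation strategy model,
# where a model is simply a list of matchers.
STRATEGY_MODELS: Dict[str, List[Matcher]] = {
    "vendor_a_plain_text": [keyword_matcher(["Admission record", "Discharge record"])],
}


def segment(text: str, data_structure_type: str) -> List[str]:
    # S120: select the target strategy model through the mapping relation.
    matchers = STRATEGY_MODELS[data_structure_type]
    # S130: configure the matching points as segmentation points, then split.
    points = sorted({p for m in matchers for p in m(text)} | {0})
    bounds = points + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


# S110 is represented here by a literal string standing in for acquired data.
record = "Admission record: fever. Discharge record: recovered."
segments = segment(record, "vendor_a_plain_text")
```

In this sketch each segment starts at a matching point and runs to the next one, so joining the segments reproduces the original text.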

According to the medical record text data segmentation method in the embodiment of the disclosure, on one hand, medical record text data is segmented through the target segmentation strategy model corresponding to the data structure type of the medical record text data, so that the pertinence of data segmentation is improved, and the data segmentation is more accurate; on the other hand, the medical record text data in the electronic medical record system is effectively segmented, so that the analysis and the utilization of the medical record text data are facilitated, and the utilization rate of the medical record text data is improved.

It should be noted that the medical record text data segmentation method according to the exemplary embodiment of the present disclosure may be executed by a server, and a medical record text data segmentation apparatus corresponding to the medical record text data segmentation method may also be configured in the server. In addition, it should be understood that a terminal device (e.g., a mobile phone, a tablet, etc.) may also implement the steps of the medical record text data segmentation method, and corresponding devices may also be configured in the terminal device.

In order to make the technical scheme of the present disclosure clearer, the following will describe each step of the medical record text data segmentation method by taking the medical record text data segmentation method applied to the field of medical data processing as an example.

In step S110, medical record text data in the electronic medical record system is obtained, where the medical record text data is of an unstructured data type or a semi-structured data type.

In an exemplary embodiment of the present disclosure, the electronic medical record system, also referred to as a computerized medical record system or computer-based patient record, holds digitized patient medical records that are stored, managed, transmitted, and reproduced electronically, replacing traditional handwritten paper records. Examples include an EMR electronic medical record system, an XYEMR electronic medical record system, and the like.

The medical record text data in the electronic medical record system comprises all the information of the handwritten paper record. For example, the medical record text data comprises identification information of a medical institution, identification information of a patient, admission records, discharge records, operation records, medication records, attending physician records, nursing staff records, examination and test records, examination and test results, medical orders and course-of-disease records. The identification information of the medical institution comprises a medical institution code, a medical institution grade and the like, and the identification information of the patient comprises the patient's identity card number, sex, age, home address, contact information and the like.

In an exemplary embodiment of the present disclosure, the medical record text data is of an unstructured data type or a semi-structured data type. Whereas structured data types consist of well-defined data whose schema makes the data easy to search, unstructured data types typically consist of data that is not easy to search. Unstructured data may have an internal structure, but it is not organized through a predefined data model or schema. Unstructured data may be textual or non-textual, and may be human-generated or machine-generated.

By way of example, human-generated unstructured data may include text files, spreadsheets, presentations, emails and logs; machine-generated unstructured data may include weather data, atmospheric data, surveillance photos and videos, and the like. Users can run simple content searches over textual unstructured data; however, unstructured data lacks an ordered internal structure, which makes it difficult to parse and analyze.

In addition, semi-structured data types use internal tags and markers to identify individual data elements, thereby enabling information grouping and hierarchy. For example, the Extensible Markup Language (XML) is a semi-structured document language, a set of document encoding rules that defines a format readable by both humans and machines; JSON (JavaScript Object Notation), a lightweight data exchange format, is another semi-structured format, built from collections of name/value pairs (objects, hash tables, etc.) and ordered lists of values (arrays, sequences, lists).
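As a small illustration of how tags and name/value pairs expose grouping in semi-structured data, the same two record sections can be read from an XML fragment and from a JSON fragment using the Python standard library. The element name `section`, the attribute `title` and the JSON keys are hypothetical and are not taken from any real electronic medical record system.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of two record sections.
xml_record = (
    "<record>"
    "<section title='Admission record'>Fever for 3 days.</section>"
    "<section title='Discharge record'>Recovered.</section>"
    "</record>"
)
root = ET.fromstring(xml_record)
# Tags and attributes identify each data element, so sections can be read directly.
xml_sections = {s.get("title"): s.text for s in root.iter("section")}

# Hypothetical JSON rendering of the same content: name/value pairs and an ordered list.
json_record = (
    '{"sections": ['
    '{"title": "Admission record", "text": "Fever for 3 days."},'
    '{"title": "Discharge record", "text": "Recovered."}'
    "]}"
)
json_sections = {s["title"]: s["text"] for s in json.loads(json_record)["sections"]}
```

Both forms yield the same mapping from section title to section text, which is the kind of grouping that unstructured plain text does not offer without further matching.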

In an exemplary embodiment of the present disclosure, a medical institution records patient information using an electronic medical record system, which stores the patient information in the form of medical record text data in a database of the electronic medical record system. Acquiring the medical record text data from the database of the electronic medical record system therefore amounts to acquiring the medical record text data of the medical institution through the electronic medical record system.

In step S120, a target segmentation policy model is determined among the plurality of segmentation policy models according to the data structure type of the medical record text data, wherein the segmentation policy model has a mapping relationship with the data structure type of the medical record text data.

In an exemplary embodiment of the present disclosure, since there are a plurality of suppliers that produce electronic medical record systems, the structure of the electronic medical record system provided to the medical institution by each supplier is different. That is, there are a plurality of system types in the electronic medical record system, and the system types are associated with suppliers, and since the electronic medical record system is continuously updated and upgraded, the system types of the electronic medical record system provided by the same supplier may be different. Therefore, when acquiring the medical record text data of the electronic medical record system, the supplier code, the medical institution code and the system type of the electronic medical record system are acquired at the same time.

In an exemplary embodiment of the present disclosure, the data structure type of the medical record text data includes a data type, a structure type, a storage type, and the like of the medical record text data, and the data structure type of the medical record text data is determined according to the data type, the structure type, or the storage type of the medical record text data. Wherein the data type may comprise an unstructured data type or a semi-structured data type; the structure type may include a linear structure, a tree structure, a graph-like structure, etc.; the storage type may include an index storage manner, a hash storage manner, a sequential storage manner, and the like.

The data structure type of the medical record text data is associated with a data source, and the data source refers to the system type of the electronic medical record system, the supplier code of the electronic medical record system and the medical institution code corresponding to the medical record text data. After the supplier code, the medical institution code and the system type of the electronic medical record system are obtained, the data structure type of the medical record text data can be determined.

In an exemplary embodiment of the disclosure, a target segmentation policy model corresponding to a data structure type of medical record text data is determined through a mapping relationship, wherein the data structure type of the medical record text data is associated with a system type of an electronic medical record system.

Specifically, the mapping relationship between the segmentation policy model and the data structure type of the medical record text data may include: one or more data structure types may correspond to one target segmentation policy model, and one data structure type may correspond to one or more target segmentation policy models.

In addition, since the data structure type of the medical record text data is associated with the data source of the medical record text data, the target segmentation strategy model can also be determined through the data source of the medical record text data. For example, if node attributes are available in the medical record text data generated by an electronic medical record system, the target segmentation policy model corresponding to that electronic medical record system is a segmentation policy model that includes a node attribute matcher.
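One possible way to picture the mapping relation is as two lookup tables: one from the data source (supplier code, medical institution code, system type) to a data structure type, and one from the data structure type to a segmentation policy model. The following is a minimal sketch under that assumption; all codes, type labels and matcher names are invented for illustration.

```python
from typing import Dict, List, NamedTuple


class DataSource(NamedTuple):
    """Hypothetical data source key acquired together with the record text."""
    vendor_code: str
    institution_code: str
    system_type: str


# Hypothetical table: data source -> data structure type.
SOURCE_TO_STRUCTURE_TYPE: Dict[DataSource, str] = {
    DataSource("V001", "H100", "emr-v2"): "semi_structured_xml",
    DataSource("V002", "H200", "emr-v1"): "unstructured_plain_text",
}

# Hypothetical table: data structure type -> matchers of the target policy model.
STRUCTURE_TYPE_TO_MODEL: Dict[str, List[str]] = {
    "semi_structured_xml": ["node_attribute"],
    "unstructured_plain_text": ["keyword", "date"],
}


def select_target_model(source: DataSource) -> List[str]:
    """Resolve the data structure type from the source, then the policy model."""
    structure_type = SOURCE_TO_STRUCTURE_TYPE[source]
    return STRUCTURE_TYPE_TO_MODEL[structure_type]


model = select_target_model(DataSource("V001", "H100", "emr-v2"))
```

Here a source whose records carry node attributes resolves to a model containing a node attribute matcher, mirroring the example in the text.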

In an exemplary embodiment of the present disclosure, the segmentation policy model may include one or more matchers. The number of matchers may be set according to the actual situation and is not specifically limited by the present disclosure.

In exemplary embodiments of the present disclosure, the matcher may include a keyword matcher, a regular expression matcher, a node attribute matcher, a date matcher, and the like.

The keyword matcher may include one or more keywords. A keyword may be preset medical data, which may be set according to the actual situation; for example, the keywords may be one or more of an admission record, a discharge record, a surgical record, a medication record, an attending physician record, a caregiver record, an examination and test result, a medical order, and a course record. The number and content of the keywords are not particularly limited in the present disclosure.

Specifically, the keyword matcher may match a node text in the medical record text data with a keyword to obtain a node text matched with the keyword, and mark a matching point on the node text matched with the keyword.
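A minimal sketch of such a keyword matcher is given below, assuming that the medical record text data has already been broken into a list of node texts and that a matching point can be represented by the index of the matched node. The function name `keyword_match_points` and the substring test are illustrative choices, not the disclosed implementation.

```python
from typing import List, Sequence, Tuple


def keyword_match_points(
    node_texts: Sequence[str], keywords: Sequence[str]
) -> List[Tuple[int, str]]:
    """Return (node index, keyword) for every node whose text contains a keyword."""
    points = []
    for index, text in enumerate(node_texts):
        for keyword in keywords:
            if keyword in text:
                points.append((index, keyword))
                break  # one matching point per node is enough in this sketch
    return points


nodes = [
    "Admission record",
    "Patient admitted with fever.",
    "Surgical record",
    "Appendectomy performed.",
]
points = keyword_match_points(nodes, ["Admission record", "Surgical record"])
```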

It should be noted that all contents in the page where the medical record text data is located, including the label, the attribute, and the text, may be referred to as nodes, and each node corresponds to a node attribute.

Also, the regular expression matcher may include one or more regular expressions. A regular expression may be composed of special characters and non-special characters: special characters include metacharacters such as "&", "(", ")" and "!", and non-special characters include letters, numbers, Chinese characters and so on. The present disclosure does not specifically limit the categories of special characters and non-special characters. An example of a regular expression is "admission records & surgical records & discharge records"; the present disclosure does not specifically limit the number and content of regular expressions.

Specifically, the regular expression matcher may match the node text in the medical record text data with the regular expression to obtain a node text matched with the regular expression, and mark matching points on the node text matched with the regular expression.
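For illustration, a regular expression matcher may be sketched with Python's standard `re` module. Note that this is an assumption: the expression syntax described above (for example, the "&" operator) is not the same as Python's pattern syntax, so the pattern below is a stand-in chosen for the example, and `regex_match_points` is a hypothetical name.

```python
import re
from typing import List, Sequence


def regex_match_points(node_texts: Sequence[str], patterns: Sequence[str]) -> List[int]:
    """Return the indices of nodes whose text matches at least one pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [
        index
        for index, text in enumerate(node_texts)
        if any(c.search(text) for c in compiled)
    ]


nodes = ["Admission record", "Fever.", "Discharge record:", "See the admission record"]
# Hypothetical pattern: a section heading at the start of the node text.
points = regex_match_points(nodes, [r"^(Admission|Surgical|Discharge) record\b"])
```

Anchoring the pattern at the start of the node text keeps a passing mention such as the last node from being marked as a matching point.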

In addition, the node attribute matcher may include one or more preset node attributes, for example, the preset node attributes may be title nodes, label nodes, paragraph nodes, and the like, and the number and content of the preset node attributes are not specifically limited in this disclosure.

Specifically, the node attribute matcher may match a node attribute in the medical record text data with a preset node attribute to obtain a node attribute matched with the preset node attribute, and mark a matching point for the node attribute matched with the preset node attribute.
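As a sketch only, node attribute matching can be pictured on HTML-like input, treating the tag name of each node as its node attribute. The class `NodeCollector`, the helpers `collect_nodes` and `node_attribute_match_points`, and the choice of `h2` as a title node are assumptions for the example; the parser used is `html.parser.HTMLParser` from the Python standard library.

```python
from html.parser import HTMLParser
from typing import List, Optional, Set, Tuple


class NodeCollector(HTMLParser):
    """Collect (tag, text) pairs for every non-empty text node."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[Tuple[str, str]] = []
        self._tag: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        self._tag = tag

    def handle_endtag(self, tag):
        self._tag = None

    def handle_data(self, data):
        if self._tag and data.strip():
            self.nodes.append((self._tag, data.strip()))


def collect_nodes(html: str) -> List[Tuple[str, str]]:
    parser = NodeCollector()
    parser.feed(html)
    parser.close()
    return parser.nodes


def node_attribute_match_points(
    nodes: List[Tuple[str, str]], preset_attributes: Set[str]
) -> List[int]:
    """Return indices of nodes whose attribute (here: tag) is a preset attribute."""
    return [i for i, (tag, _) in enumerate(nodes) if tag in preset_attributes]


page = "<h2>Admission record</h2><p>Fever.</p><h2>Discharge record</h2><p>Recovered.</p>"
nodes = collect_nodes(page)
points = node_attribute_match_points(nodes, {"h2"})  # treat h2 as a title node
```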

Furthermore, the date matcher may include a date feature; for example, the date feature may be "year & month" or "year & month & day". The present disclosure does not specifically limit the content of the date feature.

Specifically, the date matcher may match a node text in the medical record text data with a date feature to obtain a node text matched with the date feature, and mark a matching point on the node text matched with the date feature.
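A date matcher of this kind might be sketched as below. The concrete date formats (separators "-", "/", "." and the Chinese markers 年, 月, 日) and the names `DATE_FEATURES` and `date_match_points` are assumptions, since the disclosure only names the features "year & month" and "year & month & day".

```python
import re
from typing import List, Sequence

# Hypothetical renderings of the two date features named in the text.
DATE_FEATURES = {
    "year_month_day": re.compile(r"\d{4}[-/.年]\d{1,2}[-/.月]\d{1,2}日?"),
    "year_month": re.compile(r"\d{4}[-/.年]\d{1,2}月?"),
}


def date_match_points(
    node_texts: Sequence[str], feature: str = "year_month_day"
) -> List[int]:
    """Return the indices of nodes whose text contains the given date feature."""
    pattern = DATE_FEATURES[feature]
    return [i for i, text in enumerate(node_texts) if pattern.search(text)]


nodes = [
    "2020-12-30 Course record",
    "Temperature normal.",
    "2020年12月31日 Course record",
    "Discharged.",
]
points = date_match_points(nodes)
```

Dated entries such as daily course records are a natural fit for this matcher, because each new date marks the start of a new entry.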

The segmentation strategy model in the embodiment of the disclosure can include a plurality of matchers of different types. In the actual data segmentation process, the matchers to be used can be freely selected and combined according to the specific data conditions of different projects; the different matchers are completely decoupled and can be arranged and combined arbitrarily.
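The decoupling described above can be illustrated by giving every matcher the same call signature, so that a strategy model is just a list of matchers whose matching points are merged. This is a sketch under that assumption; `make_keyword_matcher`, `make_regex_matcher` and `combine` are hypothetical helper names.

```python
import re
from typing import Callable, List, Sequence

# Shared interface (an assumption): node texts in, matching node indices out.
Matcher = Callable[[Sequence[str]], List[int]]


def make_keyword_matcher(keywords: Sequence[str]) -> Matcher:
    return lambda nodes: [
        i for i, text in enumerate(nodes) if any(k in text for k in keywords)
    ]


def make_regex_matcher(patterns: Sequence[str]) -> Matcher:
    compiled = [re.compile(p) for p in patterns]
    return lambda nodes: [
        i for i, text in enumerate(nodes) if any(c.search(text) for c in compiled)
    ]


def combine(matchers: Sequence[Matcher]) -> Matcher:
    """Merge the matching points of independent matchers into one sorted list."""
    def combined(nodes: Sequence[str]) -> List[int]:
        return sorted({i for m in matchers for i in m(nodes)})
    return combined


nodes = ["Admission record", "Fever.", "2020-12-31 ward round", "Stable."]
model = combine([
    make_keyword_matcher(["Admission record"]),
    make_regex_matcher([r"\d{4}-\d{2}-\d{2}"]),
])
points = model(nodes)
```

Because no matcher depends on another, adding or removing one only changes the list passed to `combine`.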

In an exemplary embodiment of the present disclosure, a dictionary table database is constructed. The dictionary table database stores the mapping relationship between the data structure type of the medical record text data and the segmentation policy model, one or more keywords corresponding to the keyword matcher, one or more regular expressions corresponding to the regular expression matcher, one or more preset node attributes corresponding to the node attribute matcher, the date features corresponding to the date matcher, and the like. The dictionary table database may be updated according to the requirements of the National Health Commission or the specifications of the medical technology field, or according to accumulated historical experience.

In addition, the data segmentation of the medical record text data can be completed by acquiring the mapping relationship between the data structure type of the medical record text data and the segmentation policy model, one or more keywords corresponding to the keyword matcher, one or more regular expressions corresponding to the regular expression matcher, one or more preset node attributes corresponding to the node attribute matcher, date characteristics corresponding to the date matcher and the like from the dictionary table database.
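
A minimal in-memory sketch of such a dictionary table is shown below. The structure type names, model names, and the `StrategyConfig` class are hypothetical placeholders; an actual system would presumably keep these entries in a database table rather than in Python dictionaries.

```python
from dataclasses import dataclass, field


@dataclass
class StrategyConfig:
    """Matcher settings stored for one segmentation policy model."""
    keywords: list = field(default_factory=list)
    regexes: list = field(default_factory=list)
    node_attributes: list = field(default_factory=list)
    date_features: list = field(default_factory=list)


# Hypothetical mapping: data structure type -> segmentation policy model.
STRUCTURE_TYPE_TO_MODEL = {
    "vendor_a_xml": "model_a",
    "vendor_b_html": "model_b",
}

# Hypothetical matcher settings for each model.
MODEL_CONFIGS = {
    "model_a": StrategyConfig(node_attributes=["title"],
                              keywords=["Admission record"]),
    "model_b": StrategyConfig(regexes=[r"^\d+\."],
                              date_features=["year & month & day"]),
}


def lookup_strategy(structure_type):
    """Resolve the target model and its matcher settings for a structure type."""
    model_name = STRUCTURE_TYPE_TO_MODEL[structure_type]
    return model_name, MODEL_CONFIGS[model_name]


model_name, config = lookup_strategy("vendor_a_xml")
```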

In step S130, segmentation points are configured in the medical record text data through the target segmentation policy model, and data segmentation is performed on the medical record text data according to the segmentation points.

In an exemplary embodiment of the present disclosure, one or more matchers may be included in the target segmentation policy model. When the target segmentation policy model corresponds to one matcher, namely when the target segmentation policy model comprises one of a keyword matcher, a regular expression matcher, a node attribute matcher and a date matcher, matching the matcher with the medical record text data, determining one or more matching points corresponding to the matcher, and configuring the one or more matching points as segmentation points.

Specifically, matching each keyword with medical record text data through a keyword matcher to obtain one or more matching points corresponding to each keyword; or respectively matching each regular expression with the medical record text data through a regular expression matcher to obtain one or more matching points corresponding to each regular expression; or respectively matching each preset node attribute with the node attribute corresponding to the medical record text data through the node attribute matcher to obtain one or more matching points corresponding to each preset node attribute.

In addition, the date characteristics and the medical record text data can be respectively matched through a date matcher, and one or more matching points corresponding to the date characteristics can be obtained.
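
The single-matcher case can be illustrated with a keyword matcher and a regular expression matcher that each return character offsets as matching points. Using character offsets as matching points, and the function names below, are assumptions made for this sketch.

```python
import re


def keyword_match_points(text, keywords):
    """Return the sorted character offsets at which any keyword starts."""
    points = set()
    for keyword in keywords:
        start = text.find(keyword)
        while start != -1:
            points.add(start)
            start = text.find(keyword, start + 1)
    return sorted(points)


def regex_match_points(text, patterns):
    """Return the sorted character offsets at which any pattern matches."""
    points = set()
    for pattern in patterns:
        points.update(m.start() for m in re.finditer(pattern, text))
    return sorted(points)


text = "Chief complaint: cough. History: none. Chief complaint: fever."
keyword_points = keyword_match_points(text, ["Chief complaint", "History"])

numbered = "1. Admission 2. Discharge"
regex_points = regex_match_points(numbered, [r"\d+\."])
```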

In an exemplary embodiment of the present disclosure, when the target segmentation policy model includes a plurality of matchers among a keyword matcher, a regular expression matcher, a node attribute matcher, and a date matcher, each matcher is respectively matched with the medical record text data, a plurality of matching points corresponding to each matcher are determined, and the plurality of matching points are configured as segmentation points.

If the target segmentation policy model includes a plurality of matchers, priorities may be set for the plurality of matchers, for example, the priorities of the plurality of matchers may be set from high to low as: a keyword matcher, a regular expression matcher, a node attribute matcher, a date matcher and the like. If the target segmentation strategy model comprises a keyword matcher and a node attribute matcher, determining one or more keyword matching points in the medical record text data by using the keyword matcher, determining one or more node attribute matching points in the medical record text data according to the node attribute matcher, and configuring each keyword matching point and each node attribute matching point as segmentation points. Of course, the priorities of the plurality of matchers may be set according to actual conditions, and the number and types of matchers in each target segmentation strategy model may also be selected according to actual conditions, which is not specifically limited by this disclosure.
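
One possible reading of matcher priority is sketched below: matchers run from high to low priority, all matching points are merged, and when two matchers mark the same position the higher-priority matcher is recorded as its source. The disclosure does not define how priority resolves conflicts, so this tie-breaking rule and all names in the example are assumptions.

```python
import re


def keyword_points(text):
    """Keyword matcher with a single illustrative keyword."""
    return [m.start() for m in re.finditer(re.escape("Diagnosis"), text)]


def regex_points(text):
    """Regular expression matcher with a single illustrative pattern."""
    return [m.start() for m in re.finditer(r"Diagnosis:|Plan:", text)]


def combine_match_points(matchers, text):
    """Merge matching points from matchers ordered from high to low priority.

    Returns sorted (position, matcher_name) pairs; a position found by
    several matchers is attributed to the highest-priority one.
    """
    owner = {}
    for name, find_points in matchers:
        for point in find_points(text):
            owner.setdefault(point, name)  # first (higher-priority) matcher wins
    return sorted(owner.items())


text = "Diagnosis: flu. Plan: rest."
split_points = combine_match_points(
    [("keyword", keyword_points), ("regex", regex_points)], text
)
```

Because the matchers are plain functions passed in a list, they remain decoupled and can be reordered or swapped without changing `combine_match_points`.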

In addition, in each segmentation strategy model, after the matching points are obtained through one or more matchers, the key nodes included in each matching point in the medical record text data are filtered one by one. The filtering is based on rules such as whether different key nodes may be adjacent, whether the context content of a key node should appear multiple times, and whether the value of a key node contains abnormal data. The matching points that remain after filtering are configured as the segmentation points.
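
The filtering of matching points might look like the following sketch. The three concrete rules used here (an empty key node counts as abnormal data, two key nodes may not be directly adjacent, and certain titles may appear only once) are hypothetical interpretations of the rule categories named above, not rules stated in the disclosure.

```python
def filter_match_points(nodes, points, unique_titles=frozenset()):
    """Filter candidate matching points (indices into `nodes`).

    nodes: list of node texts in document order.
    points: sorted candidate indices produced by the matchers.
    unique_titles: texts assumed to be valid as a key node only once.
    """
    kept = []
    seen = set()
    for point in points:
        text = nodes[point].strip()
        if not text:
            continue  # assumed abnormal data: empty key node value
        if kept and point - kept[-1] == 1:
            continue  # assumed rule: key nodes may not be adjacent
        if text in unique_titles and text in seen:
            continue  # assumed rule: this key node should appear only once
        seen.add(text)
        kept.append(point)
    return kept


nodes = [
    "Admission record", "Admission record", "cough", "",
    "Course record", "day 1", "Admission record", "text",
]
candidates = [0, 1, 3, 4, 6]
segmentation_points = filter_match_points(
    nodes, candidates, unique_titles={"Admission record"}
)
```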

In the embodiment of the present disclosure, multiple matchers are configured for different medical record text data. For medical record text data of a semi-structured data type or an unstructured data type, one or more of the matchers may be selected to perform data segmentation processing on the medical record text data; when a plurality of matchers are used at the same time, corresponding priorities may be set for them. For the case that the medical record text data has a plurality of data structure types, a plurality of matchers can be selected to perform data segmentation processing on the medical record text data.

In an exemplary embodiment of the present disclosure, the data types of the medical record text data acquired in the electronic medical record system may include an html data type, a plain text file data type, an irregular xml data type, and the like; therefore, data format conversion and cleaning need to be uniformly performed on the acquired medical record text data. For example, the html data type and the plain text file data type are converted into the xml data type, the irregular xml data type is formatted, dirty data is cleaned, and the like, so that medical record text data of the xml data type is finally obtained.

Specifically, after medical record text data in an electronic medical record system is acquired, whether the medical record text data is of an unstructured data type is judged, wherein the unstructured data comprises a plain text file data type or an html data type; and if the medical record text data is of an unstructured data type, converting the medical record text data into a semi-structured data type.

Fig. 2 is a schematic flow chart of the data type conversion method of the present disclosure. As shown in fig. 2, in step S210, the medical record text data is segmented into a plurality of medical record texts according to paragraph marks and medical keywords in the medical record text data; in step S220, xml nodalization is performed on the plurality of medical record texts, wherein the xml nodalization is to add node suffixes and nodes to the plurality of medical record texts and package the medical record texts into xml nodes; in step S230, the plurality of medical record texts after the xml nodalization processing are combined according to the order of the plurality of medical record texts in the medical record text data, so as to obtain medical record text data of the xml data type.
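
Steps S210 to S230 can be sketched for plain text input as follows. The keyword list, the `<record>`/`<node>` element names, and the use of newlines as paragraph marks are assumptions for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical medical keywords used as additional split positions.
MEDICAL_KEYWORDS = ("Chief complaint", "Present illness")


def text_to_xml(plain_text, keywords=MEDICAL_KEYWORDS):
    """Convert plain medical record text into a flat xml document."""
    # S210: split on paragraph marks, then before each medical keyword.
    pieces = []
    for paragraph in plain_text.splitlines():
        parts = [paragraph]
        if keywords:
            alternation = "|".join(re.escape(k) for k in keywords)
            # Zero-width lookahead split; requires Python 3.7 or later.
            parts = re.split(rf"(?=(?:{alternation}))", paragraph)
        pieces.extend(p.strip() for p in parts if p.strip())

    # S220 and S230: wrap each piece in an xml node, keeping the original order.
    root = ET.Element("record")
    for piece in pieces:
        ET.SubElement(root, "node").text = piece
    return ET.tostring(root, encoding="unicode")


plain = (
    "Chief complaint: cough for 3 days. Present illness: fever.\n"
    "Physical exam: normal."
)
xml_record = text_to_xml(plain)
```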

In addition, the open source code may also be called to convert the html data type into the xml data type, for example, the html data type is converted into the xml data type by using the html agility pack API through the java language, which is not limited in this disclosure.

In an exemplary embodiment of the present disclosure, fig. 3 shows a flowchart of a medical record text data segmentation method in an embodiment of the present disclosure, and as shown in fig. 3, the medical record text data segmentation method at least includes the following steps:

in step S310, medical record text data in the electronic medical record system is obtained, and a plurality of sample text data with different data structure types are obtained from the medical record text data, where the medical record text data is an unstructured data type or a semi-structured data type.

In the exemplary embodiment of the present disclosure, the sample text data may be obtained by sampling: a preset number of medical record text data are taken from each of multiple types of electronic medical record systems as the sample text data. That is, the sample text data includes medical record text data from multiple types of electronic medical record systems and therefore covers all data structure types. The preset number can be set according to actual conditions, and the number is not particularly limited in the present disclosure.

In step S320, the initial segmentation policy model is modified according to the sample text data to obtain a plurality of segmentation policy models.

In an exemplary embodiment of the present disclosure, the initial segmentation policy model is modified according to various sample text data to obtain a plurality of segmentation policy models. Specifically, the initial segmentation policy model is used to perform data segmentation on each sample text data, and the number of matchers, the types of matchers, the priorities between the matchers, and the like in the segmentation policy model are adjusted according to the data segmentation result, so as to obtain the segmentation policy model corresponding to each sample text data. In addition, the initial segmentation strategy model can be visually displayed, so that a configurator can visually configure the segmentation strategy model and test the segmentation effect of the segmentation strategy model on line.
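
The disclosure leaves the adjustment of the model to a configurator. Purely as an illustration of how such a correction could be mechanised, the sketch below tries every combination of candidate matchers on a sample and keeps the combination whose matching points differ least from manually labelled segmentation points. The scoring rule, the candidate matchers, and the name `tune_model` are all assumptions and are not part of the disclosed method.

```python
import re
from itertools import combinations

# Hypothetical candidate matchers, each mapping text to character offsets.
CANDIDATE_MATCHERS = {
    "keyword": lambda t: [m.start() for m in re.finditer("Diagnosis", t)],
    "date": lambda t: [m.start() for m in re.finditer(r"\d{4}-\d{2}-\d{2}", t)],
}


def tune_model(sample_text, expected_points):
    """Pick the matcher combination that best reproduces the expected points."""
    best_names, best_errors = (), None
    for size in range(1, len(CANDIDATE_MATCHERS) + 1):
        for names in combinations(sorted(CANDIDATE_MATCHERS), size):
            found = set()
            for name in names:
                found.update(CANDIDATE_MATCHERS[name](sample_text))
            # Count missing plus spurious points (symmetric difference).
            errors = len(found ^ set(expected_points))
            if best_errors is None or errors < best_errors:
                best_names, best_errors = names, errors
    return best_names


sample = "2021-01-05 note. Diagnosis: flu. 2021-01-06 note."
labelled_points = [0, sample.index("2021-01-06")]
chosen_matchers = tune_model(sample, labelled_points)
```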

In step S330, a target segmentation policy model is determined among the plurality of segmentation policy models according to the data structure type of the medical record text data, segmentation points are configured in the medical record text data through the target segmentation policy model, and data segmentation is performed on the medical record text data according to the segmentation points.

Fig. 4 is a flowchart illustrating a method for data segmentation according to an embodiment of the present disclosure. As shown in fig. 4, in step S410, medical record text data in an electronic medical record system is obtained, where the medical record text data is of an xml data type; in step S420, the medical record text data is matched with the mapping relationship in the dictionary table database to obtain a target segmentation policy model corresponding to the data structure type of the medical record text data, where the target segmentation policy model includes a node attribute matcher and a keyword matcher, and the priority of the node attribute matcher is higher than that of the keyword matcher; in step S430, matching points are determined in the medical record text data according to one or more preset node attributes in the node attribute matcher, so as to obtain one or more node attribute matching points matched with each preset node attribute; in step S440, matching points are determined in the medical record text data according to one or more keywords in the keyword matcher, so as to obtain one or more keyword matching points matched with the keywords; in step S450, the one or more node attribute matching points and the one or more keyword matching points are configured as segmentation points; in step S460, the medical record text data is subjected to data segmentation according to the segmentation points, wherein the node texts between two adjacent segmentation points can be extracted and combined into new medical record text data, so as to complete the segmentation of the medical record text data.
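
Steps S430 to S460 can be sketched end to end as follows. The `type` attribute, the flat `<record>`/`<node>` layout, and the choice to let the last segment run to the end of the record are assumptions made for this example; text before the first segmentation point is not emitted in this sketch.

```python
import xml.etree.ElementTree as ET


def segment_record(xml_text, preset_attributes, keywords):
    """Segment an xml medical record into a list of text segments."""
    nodes = list(ET.fromstring(xml_text))

    # S430: node attribute matching points (higher priority, run first).
    attribute_points = [
        i for i, node in enumerate(nodes)
        if node.get("type") in preset_attributes
    ]
    # S440: keyword matching points.
    keyword_points = [
        i for i, node in enumerate(nodes)
        if any(keyword in (node.text or "") for keyword in keywords)
    ]
    # S450: configure all matching points as segmentation points.
    segmentation_points = sorted(set(attribute_points) | set(keyword_points))

    # S460: combine the node texts between adjacent segmentation points.
    bounds = segmentation_points + [len(nodes)]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        texts = [(node.text or "").strip() for node in nodes[start:end]]
        segments.append(" ".join(texts))
    return segments


record = (
    "<record>"
    "<node type='title'>Admission record</node>"
    "<node type='paragraph'>Cough for three days.</node>"
    "<node type='paragraph'>Discharge summary</node>"
    "<node type='paragraph'>Recovered.</node>"
    "</record>"
)
segments = segment_record(record, {"title"}, ["Discharge summary"])
```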

Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments are implemented as computer programs executed by a CPU. The computer program, when executed by the CPU, performs the functions defined by the method provided by the present invention. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic or optical disk, or the like.

Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

The following describes embodiments of an apparatus of the present disclosure, which can be used to perform the above medical record text data segmentation method of the present disclosure. For details that are not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the medical record text data segmentation method of the present disclosure.

Fig. 5 schematically shows a block diagram of a medical record text data segmentation apparatus according to an embodiment of the present disclosure.

Referring to fig. 5, a medical record text data segmentation apparatus 500 according to an embodiment of the present disclosure includes: a data acquisition module 501, a model determination module 502, and a data segmentation module 503. Specifically:

the data acquisition module 501 is configured to acquire medical record text data in an electronic medical record system, where the medical record text data is an unstructured data type or a semi-structured data type;

a model determining module 502, configured to determine a target segmentation policy model from a plurality of segmentation policy models according to a data structure type of medical record text data, where the segmentation policy model and the data structure type of the medical record text data have a mapping relationship;

and the data segmentation module 503 is configured to configure segmentation points in the medical record text data through the target segmentation policy model, and perform data segmentation on the medical record text data according to the segmentation points.

In an exemplary embodiment of the present disclosure, the model determining module 502 may be further configured to determine, through a mapping relationship, a target segmentation policy model corresponding to a data structure type of medical record text data, where the data structure type of the medical record text data is associated with a system type of an electronic medical record system.

In an exemplary embodiment of the present disclosure, the data segmentation module 503 may be further configured to match each matcher with medical record text data, determine a plurality of matching points corresponding to each matcher, and configure the plurality of matching points as segmentation points, where the target segmentation policy model includes a plurality of matchers among a keyword matcher, a regular expression matcher, a node attribute matcher, and a date matcher.

In an exemplary embodiment of the present disclosure, the data segmentation module 503 may be further configured to match a matcher with the medical record text data, determine one or more matching points corresponding to the matcher, and configure the one or more matching points as segmentation points, where the target segmentation policy model includes one of a keyword matcher, a regular expression matcher, a node attribute matcher, and a date matcher.

In an exemplary embodiment of the present disclosure, the data segmentation module 503 may be further configured to match each keyword with the medical record text data through a keyword matcher, and obtain one or more matching points corresponding to each keyword, where the matcher includes a keyword matcher, and the keyword matcher includes one or more keywords.

In an exemplary embodiment of the present disclosure, the data segmentation module 503 may further be configured to match each regular expression with medical record text data through a regular expression matcher, and obtain one or more matching points corresponding to each regular expression, where the matcher includes the regular expression matcher, and the regular expression matcher includes one or more regular expressions.

In an exemplary embodiment of the present disclosure, the data segmentation module 503 may be further configured to match each preset node attribute with a node attribute corresponding to the medical record text data through a node attribute matcher, and obtain one or more matching points corresponding to each preset node attribute, where the matcher is a node attribute matcher and the node attribute matcher includes one or more preset node attributes.

In an exemplary embodiment of the disclosure, the medical record text data segmentation apparatus 500 further includes a type conversion module (not shown in the figure) for determining whether the medical record text data is of an unstructured data type, where the unstructured data includes a plain text file data type or an html data type; and if the medical record text data is of an unstructured data type, converting the medical record text data into a semi-structured data type.

The specific details of each module of the medical record text data segmentation apparatus have already been described in detail in the corresponding medical record text data segmentation method, and therefore are not described herein again.

Fig. 6 schematically shows a block diagram of a medical record text data segmentation apparatus according to an embodiment of the present disclosure.

Referring to fig. 6, a medical record text data segmentation apparatus 600 according to an embodiment of the present disclosure includes: a sample data acquisition module 601, a model configuration module 602, and a text data segmentation module 603. Specifically:

the sample data acquisition module 601 is configured to acquire medical record text data from an electronic medical record system, and acquire a plurality of sample text data with different data structure types from the medical record text data, where the medical record text data is an unstructured data type or a semi-structured data type;

the model configuration module 602 is configured to obtain an initial segmentation policy model, and modify the initial segmentation policy model according to sample text data, respectively, to obtain a plurality of segmentation policy models;

the text data segmentation module 603 is configured to determine a target segmentation policy model among the multiple segmentation policy models according to the data structure type of the medical record text data, configure segmentation points in the medical record text data through the target segmentation policy model, and perform data segmentation on the medical record text data according to the segmentation points.

It should be noted that although several modules or units of the apparatus for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.

In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.

As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or program product. Thus, various aspects of the invention may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects that may all generally be referred to herein as a "circuit," "module," or "system."

An electronic device 700 according to this embodiment of the invention is described below with reference to fig. 7. The electronic device 700 shown in fig. 7 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, electronic device 700 is embodied in the form of a general purpose computing device. The components of the electronic device 700 may include, but are not limited to: the at least one processing unit 710, the at least one memory unit 720, a bus 730 connecting different system components (including the memory unit 720 and the processing unit 710), and a display unit 740.

Wherein the storage unit stores program code that is executable by the processing unit 710 such that the processing unit 710 performs the steps according to various exemplary embodiments of the present invention as described in the above section "exemplary method" of the present specification. For example, the processing unit 710 can execute step S110 shown in fig. 1 to obtain medical record text data in an electronic medical record system, where the medical record text data is of an unstructured data type or a semi-structured data type; step S120, determining a target segmentation strategy model in a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation; and step S130, configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points.

The storage unit 720 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 7201 and/or a cache memory unit 7202, and may further include a read only memory unit (ROM) 7203.

The storage unit 720 may also include a program/utility 7204 having a set (at least one) of program modules 7205, such program modules 7205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 730 may be any representation of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 700 may also communicate with one or more external devices 900 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a viewer to interact with the electronic device 700, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 700 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 750. Also, the electronic device 700 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 760. As shown, the network adapter 760 communicates with the other modules of the electronic device 700 via the bus 730. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 700, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the invention may also be implemented in the form of a program product comprising program code means for causing a terminal device to carry out the steps according to various exemplary embodiments of the invention described in the above section "exemplary methods" of the present description, when said program product is run on the terminal device.

Referring to fig. 8, a program product 800 for implementing the above method according to an embodiment of the present invention is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this regard and, in the present document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

Furthermore, the above-described figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims

1. A medical record text data segmentation method is characterized by comprising the following steps:

acquiring medical record text data in an electronic medical record system, wherein the medical record text data is of an unstructured data type or a semi-structured data type;

determining a target segmentation strategy model in a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation;

and configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points.

2. A medical record text data segmentation method is characterized by comprising the following steps:

acquiring medical record text data in an electronic medical record system, and acquiring a plurality of sample text data with different data structure types from the medical record text data, wherein the medical record text data is of an unstructured data type or a semi-structured data type;

correcting the initial segmentation strategy model according to the sample text data respectively to obtain a plurality of segmentation strategy models;

and determining a target segmentation strategy model in the plurality of segmentation strategy models according to the data structure type of the medical record text data, configuring segmentation points in the medical record text data through the target segmentation strategy model, and performing data segmentation on the medical record text data according to the segmentation points.

3. The medical record text data segmentation method according to any one of claims 1 or 2, wherein determining a target segmentation policy model among a plurality of segmentation policy models according to a data structure type of the medical record text data comprises:

and determining a target segmentation strategy model corresponding to the data structure type of the medical record text data through the mapping relation, wherein the data structure type of the medical record text data is associated with the system type of the electronic medical record system.

4. The medical record text data segmentation method according to claim 3, wherein the target segmentation strategy model comprises a plurality of matchers selected from a keyword matcher, a regular expression matcher, a node attribute matcher, and a date matcher;

and wherein configuring segmentation points in the medical record text data through the target segmentation strategy model comprises:

matching each matcher with the medical record text data, respectively, determining a plurality of matching points corresponding to each matcher, and configuring the matching points as the segmentation points.
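As a rough illustration of combining several matchers, the sketch below runs each matcher over the text and takes the union of their matching points as the segmentation points. The `Matcher` type, the helper names, and the `YYYY-MM-DD` date format are all assumptions for illustration; the claims do not fix a date format or a representation for matching points.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical matcher type: returns the character offsets where it matches.
Matcher = Callable[[str], List[int]]


def regex_matcher(pattern: str) -> Matcher:
    """Build a matcher that reports the start offset of every regex match."""
    compiled = re.compile(pattern)
    return lambda text: [m.start() for m in compiled.finditer(text)]


# Assumed date format (YYYY-MM-DD); real records may use other formats.
date_matcher: Matcher = regex_matcher(r"\d{4}-\d{2}-\d{2}")


def configure_segmentation_points(text: str,
                                  matchers: Iterable[Matcher]) -> List[int]:
    """Match each matcher against the text and merge all matching points."""
    points = set()
    for matcher in matchers:
        points.update(matcher(text))
    return sorted(points)
```

Using a set before sorting means that a position reported by two matchers yields a single segmentation point.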

5. The medical record text data segmentation method according to claim 3, wherein the target segmentation strategy model comprises one matcher selected from a keyword matcher, a regular expression matcher, a node attribute matcher, and a date matcher;

and wherein configuring segmentation points in the medical record text data through the target segmentation strategy model comprises: matching the matcher with the medical record text data, determining one or more matching points corresponding to the matcher, and configuring the one or more matching points as the segmentation points.

6. The medical record text data segmentation method according to claim 5, wherein the matcher comprises a keyword matcher, the keyword matcher comprising one or more keywords;

wherein matching the matcher with the medical record text data and determining one or more matching points corresponding to the matcher comprises:

matching each keyword with the medical record text data through the keyword matcher to obtain one or more matching points corresponding to each keyword.
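A keyword matcher of this kind might look like the following sketch, which records the matching points for each keyword separately. The function name `match_keywords`, the sample keywords, and the use of character offsets as matching points are illustrative assumptions, not details taken from the claims.

```python
from typing import Dict, List


def match_keywords(text: str, keywords: List[str]) -> Dict[str, List[int]]:
    """Return, for each keyword, the offsets at which it occurs in the text."""
    matches: Dict[str, List[int]] = {}
    for keyword in keywords:
        if not keyword:
            # Skip empty keywords: they would match at every position.
            continue
        offsets = []
        start = text.find(keyword)
        while start != -1:
            offsets.append(start)
            # Continue after this occurrence (non-overlapping matches).
            start = text.find(keyword, start + len(keyword))
        matches[keyword] = offsets
    return matches
```

For example, calling `match_keywords(record, ["Chief complaint", "History"])` on a record yields one list of offsets per keyword; a keyword that never occurs maps to an empty list.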

7. The medical record text data segmentation method according to claim 1 or 2, wherein after acquiring medical record text data in an electronic medical record system, the method further comprises:

determining whether the medical record text data is of an unstructured data type, wherein the unstructured data type comprises a plain text file data type or an HTML data type;

and if the medical record text data is of the unstructured data type, converting the medical record text data into the semi-structured data type.
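The conversion step can be sketched as below, under a strong assumption: the semi-structured form is taken to be a flat list of nodes, each carrying a tag, its attributes, and its text. This passage does not specify the target format, so the node layout, the type labels `plain_text` and `html`, and all function names are hypothetical. The sketch uses Python's standard `html.parser` module and does not treat void elements such as `<br>` specially.

```python
from html.parser import HTMLParser
from typing import Dict, List


class _NodeCollector(HTMLParser):
    """Collect each non-empty text run together with its enclosing tag."""

    def __init__(self) -> None:
        super().__init__()
        self.nodes: List[Dict] = []
        self._stack: List[Dict] = []

    def handle_starttag(self, tag, attrs):
        self._stack.append({"tag": tag, "attrs": dict(attrs)})

    def handle_endtag(self, tag):
        # Pop only when the closing tag matches the innermost open tag.
        if self._stack and self._stack[-1]["tag"] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        current = self._stack[-1] if self._stack else {"tag": "text", "attrs": {}}
        self.nodes.append(
            {"tag": current["tag"], "attrs": current["attrs"], "text": text}
        )


def is_unstructured(kind: str) -> bool:
    """Assumed type labels for the two unstructured data types."""
    return kind in ("plain_text", "html")


def to_semi_structured(data: str, kind: str) -> List[Dict]:
    """Convert unstructured record data into a list of simple nodes."""
    if kind == "html":
        parser = _NodeCollector()
        parser.feed(data)
        parser.close()
        return parser.nodes
    if kind == "plain_text":
        return [
            {"tag": "line", "attrs": {}, "text": line.strip()}
            for line in data.splitlines()
            if line.strip()
        ]
    raise ValueError(f"not an unstructured data type: {kind}")
```

Keeping tag attributes on each node is a design choice made here so that something like a node attribute matcher would have attributes to match against.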

8. A medical record text data segmentation apparatus, comprising:

a data acquisition module configured to acquire medical record text data in an electronic medical record system, wherein the medical record text data is of an unstructured data type or a semi-structured data type;

a model determining module configured to determine a target segmentation strategy model from a plurality of segmentation strategy models according to the data structure type of the medical record text data, wherein the segmentation strategy model and the data structure type of the medical record text data have a mapping relation;

and a data segmentation module configured to configure segmentation points in the medical record text data through the target segmentation strategy model and to perform data segmentation on the medical record text data according to the segmentation points.

9. A medical record text data segmentation apparatus, comprising:

a sample data acquisition module configured to acquire medical record text data in an electronic medical record system and to acquire a plurality of sample text data with different data structure types from the medical record text data, wherein the medical record text data is of an unstructured data type or a semi-structured data type;

a model configuration module configured to acquire an initial segmentation strategy model and to correct the initial segmentation strategy model according to each of the sample text data to obtain a plurality of segmentation strategy models;

and a text data segmentation module configured to determine a target segmentation strategy model among the plurality of segmentation strategy models according to the data structure type of the medical record text data, to configure segmentation points in the medical record text data through the target segmentation strategy model, and to perform data segmentation on the medical record text data according to the segmentation points.

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the medical record text data segmentation method according to any one of claims 1 to 7.

11. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the medical record text data segmentation method according to any one of claims 1 to 7.