CN112800187B

CN112800187B - Data mapping method, medical text data mapping method and device and electronic equipment

Info

Publication number: CN112800187B
Application number: CN202110398287.8A
Authority: CN
Inventors: 王东风; 方杰; 汪知滴; 周月; 纪萍
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2021-04-14
Filing date: 2021-04-14
Publication date: 2021-06-29
Anticipated expiration: 2041-04-14
Also published as: CN112800187A

Abstract

The embodiment of the invention provides a data mapping method, a medical text data mapping method and device, and electronic equipment, and relates to the technical field of artificial intelligence and the medical field. The data mapping method comprises the following steps: acquiring a plurality of participles (word segments) of target text data, where the target text data comprises data contents under a plurality of specified fields; for each participle, determining, based on the standardized databases of the plurality of specified fields, a standard word matched with the participle and the specified field to which the participle belongs, where the specified field to which the participle belongs is the specified field to which the standard word matched with the participle belongs; and for each participle, establishing a data mapping relation related to the participle according to the standard word matched with the participle and the specified field to which the participle belongs. Compared with the prior art, the method provided by the embodiment of the invention can realize standardized data mapping on the word segmentation results of the text data of different information systems, and provides a realization basis for the standardization of the text data.

Description

Data mapping method, medical text data mapping method and device and electronic equipment

Technical Field

The invention relates to the technical field of artificial intelligence and the medical field, and in particular to a data mapping method, a medical text data mapping method and device, and electronic equipment.

Background

With the continuous development of information technology, the demand for information exchange within each industry keeps growing. However, within the same industry, or even within the same organization, the information systems often remain isolated from one another as "information islands".

For text data formed by information systems, because data operation specifications adopted by the information systems are not uniform, standardized data related to the standardization of the text data cannot be obtained, which undoubtedly affects the interoperability among the information systems and the promotion of industry standardization and overall efficiency.

Taking the medical field as an example, there are many medical institution IT systems, and although standardization and unification are performed on the data field level of medical image text data, the data processing specifications adopted by the medical institution IT systems are not unified, so that fine supervision and analysis of medical quality cannot be performed.

When text data of different information systems is normalized to obtain standardized data for that text data, standardized data mapping must first be performed on the word segmentation result of the text data to determine the standardized data of each participle, so that the standardized data of the text data can then be obtained from the standardized data of its participles. That is, standardized data mapping of the word segmentation result is the basis for standardizing the text data.

Based on this, a data mapping method is needed to perform standardized data mapping on the word segmentation results of the text data of different information systems, so as to provide a basis for implementing the standardization of the text data.

Disclosure of Invention

The embodiment of the invention aims to provide a data mapping method, a data mapping device and electronic equipment, which aim to realize standardized data mapping on word segmentation results of text data of different information systems and provide a basis for realizing standardization of the text data. In addition, the embodiment of the invention also provides a medical text data mapping method, a medical text data mapping device and electronic equipment, so that standardized data mapping is carried out on word segmentation results of medical text data of different information systems, and a realization basis is provided for standardization of medical text data. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present invention provides a data mapping method, where the method includes:

acquiring a plurality of participles of target text data, where the target text data comprises data contents under a plurality of specified fields, and the specified fields are fields for which a standardized database is preset;

for each participle, determining, based on the standardized databases of the plurality of specified fields, a standard word matched with the participle and the specified field to which the participle belongs, where the specified field to which the participle belongs is the specified field to which the standard word matched with the participle belongs;

and for each participle, establishing a data mapping relation related to the participle according to the standard word matched with the participle and the specified field to which the participle belongs.

In a second aspect, an embodiment of the present invention provides a data mapping apparatus, where the apparatus includes:

the word segmentation acquisition module is used for acquiring a plurality of participles of the target text data, where the target text data comprises data contents under a plurality of specified fields, and the specified fields are fields for which a standardized database is preset;

the word segmentation determining module is used for determining, for each participle and based on the standardized databases of the plurality of specified fields, a standard word matched with the participle and the specified field to which the participle belongs, where the specified field to which the participle belongs is the specified field to which the standard word matched with the participle belongs;

and the relation establishing module is used for establishing, for each participle, a data mapping relation related to the participle according to the standard word matched with the participle and the specified field to which the participle belongs.

In a third aspect, an embodiment of the present invention provides a medical text data mapping method, where the method includes:

acquiring a plurality of participles of medical text data, where the medical text data comprises data contents under a plurality of specified fields, and the specified fields are fields for which a standardized database is preset;

and processing the plurality of participles of the medical text data according to any one of the data mapping methods provided in the first aspect, so as to establish a data mapping relation for the plurality of participles.

In a fourth aspect, an embodiment of the present invention provides a medical text data mapping apparatus, where the apparatus includes:

the medical text data acquisition module is used for acquiring a plurality of participles of the medical text data, where the medical text data comprises data contents under a plurality of specified fields, and the specified fields are fields for which a standardized database is preset;

a medical text mapping establishing module, configured to process the multiple participles of the medical text data according to any one of the data mapping methods provided in the first aspect, so as to establish a data mapping relationship regarding the multiple participles.

In a fifth aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;

a memory for storing a computer program;

a processor, configured to implement the steps of any one of the data mapping methods provided in the first aspect above and/or the steps of the medical text data mapping method provided in the third aspect above when executing a program stored in a memory.

In a sixth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the data mapping methods provided in the first aspect above, and/or the steps of the medical text data mapping method provided in the third aspect above.

In a seventh aspect, an embodiment of the present invention provides a computer program product containing instructions, which when run on a computer, causes the computer to perform the steps of any one of the data mapping methods provided in the first aspect above, and/or the steps of the medical text data mapping method provided in the third aspect above.

The embodiment of the invention has the following beneficial effects:

as can be seen from the above, with the data mapping scheme provided in the embodiment of the present invention, for target text data including data contents in a plurality of designated fields, a plurality of participles of the target text data may be obtained. Furthermore, for each participle, based on the standard words in the standardized database of the plurality of specified fields, the standard word matched with the participle can be determined, and the specified field to which the participle belongs can be determined. Thus, for each participle, a data mapping relation related to the participle can be established according to the standard word matched with the participle and the specified field to which the participle belongs.

Based on this, with the data mapping scheme provided by the embodiment of the present invention, since the standard words in the standardized database of the multiple specified fields can normalize the data content of each specified field, when the data mapping relationship of each participle of the target text data is established by using the standard words in the standardized database of the multiple specified fields, the obtained data mapping relationship can be normalized. Therefore, standardized data mapping can be carried out on the word segmentation results of the text data of different information systems, and accordingly, a realization basis is provided for the standardization of the text data.

In addition, by applying the medical text data mapping scheme provided by the embodiment of the invention, standardized data mapping can be carried out on word segmentation results of the medical text data. Therefore, an implementation basis can be provided for the standardized processing of the medical text data of different information systems.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other embodiments from these drawings without creative effort.

Fig. 1 is a schematic flowchart of a data mapping method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of another data mapping method according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of another data mapping method according to an embodiment of the present invention;

Fig. 4 is a schematic flowchart of a medical text data mapping method according to an embodiment of the present invention;

Fig. 5 is a process diagram of an application example of a medical text data mapping method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a data mapping apparatus according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a medical text data mapping apparatus according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.

In order to solve the above technical problem, an embodiment of the present invention provides a data mapping method.

The data mapping method may be applied to various electronic devices, such as a server, a notebook computer, a desktop computer, or a tablet computer; the embodiments of the present invention do not specifically limit this, and such a device is hereinafter referred to as the electronic device. The electronic device may be a device in a distributed system or an independent device.

In addition, the method can be applied to any application scenario in which a data mapping relationship needs to be established for each participle of text data. For example, a data mapping relationship may be established for each participle of medical image text data in the medical field, where the medical image text data may include the specified fields examination part name and examination method name; as another example, a data mapping relationship may be established for each participle of various types of text data in the construction field.

Medical image text data refers to text data related to medical images, where the medical images may include images produced by a plurality of services, such as a general radiography service, a CT (Computed Tomography) service, an MR (Magnetic Resonance) service, a color ultrasound service, and an endoscope service; the general radiography service is the service in which X-ray imaging equipment performs medical photography. The examination part name refers to the main human body part or organ examined in the medical image examination, such as the cranium or the lumbar vertebrae. The examination method name refers to the technical method adopted in the medical image examination, such as a positive position or an oblique position.

In order to establish data mapping relationships for the participles of text data from different information systems, and to make the resulting data mapping relationships normalized, the scheme provided by the embodiment of the present invention pre-constructs a standardized database for each designated field, where the designated fields are the fields involved in the text data for which participle data mapping relationships are to be established.

The standardized database of each field can be constructed in various ways, and the specific construction mode is not limited in the present application.

Illustratively, Table 1 and Table 2 show part of the contents of the preset standardized databases of the examination part name and the examination method name.

TABLE 1

TABLE 2

When the target text data is medical image text data in the medical field, the plurality of designated fields may include, in addition to the examination part name and the examination method name, other fields such as diagnosis result and business type, and standardized databases may likewise be constructed for those other fields.

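The data mapping method provided by the embodiment of the invention can comprise the following steps:

acquiring a plurality of participles of target text data, where the target text data comprises data contents under a plurality of specified fields, and the specified fields are fields for which a standardized database is preset;

for each participle, determining, based on the standardized databases of the plurality of specified fields, a standard word matched with the participle and the specified field to which the participle belongs, where the specified field to which the participle belongs is the specified field to which the standard word matched with the participle belongs;

and for each participle, establishing a data mapping relation related to the participle according to the standard word matched with the participle and the specified field to which the participle belongs.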

As can be seen from the above, by applying the scheme provided by the embodiment of the present invention, for target text data including data contents in a plurality of designated fields, a plurality of participles of the target text data can be obtained. Furthermore, for each participle, based on the standard words in the standardized database of the plurality of specified fields, the standard word matched with the participle can be determined, and the specified field to which the participle belongs can be determined. Thus, for each participle, a data mapping relation related to the participle can be established according to the standard word matched with the participle and the specified field to which the participle belongs.

Based on this, by applying the scheme provided by the embodiment of the present invention, since the standard words in the standardized database of the plurality of specified fields can normalize the data content of each specified field, when the data mapping relationship of each participle of the target text data is established by using the standard words in the standardized database of the plurality of specified fields, the obtained data mapping relationship can be normalized. Therefore, standardized data mapping can be carried out on the word segmentation results of the text data of different information systems, and accordingly, a realization basis is provided for the standardization of the text data.

Hereinafter, a data mapping method provided by an embodiment of the present invention is specifically described with reference to the accompanying drawings.

Fig. 1 is a schematic flowchart of a data mapping method according to an embodiment of the present invention. As shown in Fig. 1, the method may include the following steps S101 to S103:

S101: acquiring a plurality of participles of target text data;

where the target text data comprises data contents under a plurality of specified fields, and the specified fields are fields for which a standardized database is preset.

For target text data to be normalized, a plurality of participles of the target text data may be obtained.

The target text data comprises data contents under a plurality of specified fields, and a standardized database is preset for each specified field. Each standard word under a specified field can be recorded in the standardized database of that specified field.

For example, the target text data may be medical image text data that includes data contents under the designated fields examination part name and examination method name. Illustratively, the plurality of participles determined for such target text data may be: lumbar vertebrae and the right lateral position.

The multiple participles of the target text data may be obtained directly after being determined by another electronic device; alternatively, the target text data may be obtained first and then processed to obtain its multiple participles. Both are reasonable.

Optionally, the target text data may be segmented based on a predetermined segmentation symbol to obtain each initial sub-text, then text correction is performed on each initial sub-text according to a predetermined text correction rule to obtain each target sub-text, and finally word segmentation processing is performed on each target sub-text to obtain a plurality of word segments of the target text data.
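By way of illustration only, the three stages described above (splitting on a predetermined segmentation symbol, text correction, and word segmentation) can be sketched in Python as follows. The delimiter pattern, the correction rule, the lexicon, and all function names are assumptions introduced for this sketch and do not come from the embodiment; the greedy forward maximum matching routine merely stands in for whatever domain-specific word segmentation tool is actually used.

```python
import re

# Hypothetical configuration: none of these values come from the embodiment.
DELIMITERS = r"[;,/+]"                  # predetermined segmentation symbols
CORRECTIONS = {"lumber": "lumbar"}      # predetermined text correction rules
LEXICON = {"lumbar vertebrae", "right lateral position", "hip joint"}


def split_text(text):
    """Split the target text data into initial sub-texts on the delimiters."""
    return [part.strip() for part in re.split(DELIMITERS, text) if part.strip()]


def correct_text(sub_text):
    """Apply every correction rule to obtain a target sub-text."""
    for wrong, right in CORRECTIONS.items():
        sub_text = sub_text.replace(wrong, right)
    return sub_text


def segment(sub_text):
    """Greedy forward maximum matching over whitespace-separated tokens."""
    tokens = sub_text.split()
    words, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first; a single token is always accepted.
        for j in range(len(tokens), i, -1):
            candidate = " ".join(tokens[i:j])
            if candidate in LEXICON or j == i + 1:
                words.append(candidate)
                i = j
                break
    return words


def get_participles(text):
    """Run the split, correct, and segment stages in order."""
    participles = []
    for sub_text in split_text(text):
        participles.extend(segment(correct_text(sub_text)))
    return participles
```

In this sketch, unknown tokens are kept as single-token participles rather than dropped, so that a later matching step can still decide whether they correspond to a standard word.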

Optionally, word segmentation processing may be performed directly on the target text data, and the plurality of words included in the word segmentation result are used as the plurality of participles of the target text data.

The technical field to which the word segmentation tool used for segmenting the target text data belongs is the same as the technical field to which the plurality of designated fields belong. Since the technical field of the plurality of designated fields is in turn the same as that of the target text data, the word segmentation tool belongs to the same technical field as the target text data.

For example, if the target text data is medical image text data in the medical field, a word segmentation tool dedicated for medical use may be used to perform word segmentation on the target text data to obtain a plurality of words of the target text data.

Optionally, semantic analysis may be performed on the target text data to obtain an analysis result, and a plurality of participles of the target text data may be determined according to the analysis result.

S102: for each participle, determining a standard word matched with the participle and determining a designated field to which the participle belongs based on a standardized database of a plurality of designated fields;

where the specified field to which the participle belongs is the specified field to which the standard word matched with the participle belongs.

After the multiple participles of the target text data are obtained, a participle is not necessarily a standard word in the standardized databases of the plurality of specified fields. Therefore, for each participle, a standard word matched with the participle can be determined based on the standard words in the standardized databases of the plurality of specified fields; once that standard word is determined, the specified field to which it belongs can be determined as the specified field to which the participle belongs.

S103: for each participle, establishing a data mapping relation related to the participle according to the standard word matched with the participle and the specified field to which the participle belongs.

For each participle of the target text data, after the standard word matched with the participle and the specified field to which the participle belongs are obtained, a data mapping relation related to the participle can be established according to that standard word and that specified field.

Optionally, the process of establishing the data mapping relationship of each participle may include: establishing a mapping relation between the standard word matched with the participle and the designated field to which the participle belongs, as the data mapping relation related to the participle.

For example, the target text data is medical image text data, and the plurality of designated fields may include the examination part name and the examination method name. If the target text data is "hip joint true position" and its participles are "hip joint" and "true position", the data mapping relation for each participle of the target text data may be as shown in Table 3:

TABLE 3
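By way of illustration only, the data mapping relations of the hip joint example above can be sketched in Python as follows. The record type `DataMapping`, the dictionary `STANDARDIZED_DATABASES`, and its contents are assumptions made for this sketch; in particular, it is assumed that both participles are themselves standard words, so that exact matching suffices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMapping:
    """One data mapping relation: participle, standard word, designated field."""
    participle: str
    standard_word: str
    designated_field: str


# Assumed toy standardized databases: designated field -> set of standard words.
STANDARDIZED_DATABASES = {
    "examination part name": {"hip joint", "lumbar vertebrae"},
    "examination method name": {"true position", "oblique position"},
}


def build_mappings(participles):
    """Steps S102 and S103 with exact matching only (an assumed simplification)."""
    mappings = []
    for participle in participles:
        for field_name, standard_words in STANDARDIZED_DATABASES.items():
            if participle in standard_words:
                # The field of the matched standard word becomes the field
                # to which the participle belongs.
                mappings.append(DataMapping(participle, participle, field_name))
                break
    return mappings


mappings = build_mappings(["hip joint", "true position"])
for m in mappings:
    print(f"{m.participle!r} -> {m.standard_word!r} ({m.designated_field})")
```

Keeping the participle, the standard word, and the designated field in one record is only one possible layout; the embodiment requires only that the standard word and the designated field be associated with the participle.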

Optionally, in a specific implementation manner, in step S102, determining the standard word matched with the participle based on the standardized databases of the plurality of specified fields may include the following steps A1-A2.

Step A1: traversing the standardized databases of the plurality of specified fields, and, when traversing to each standardized database, matching the participle with each reference word corresponding to that standardized database; if there is a reference word matching the participle, performing step A2; if there is no reference word matching the participle, traversing the next standardized database.

Each reference word corresponding to a standardized database comprises: each standard word in the standardized database and/or each near-synonym associated with each standard word in the standardized database.

Step A2: determining the standard word corresponding to the matched reference word in the standardized database as the standard word matched with the participle.

In this specific implementation manner, for each participle of the target text data, when determining the standard word matched with the participle, the standardized databases of the plurality of designated fields may be sequentially traversed according to a preset traversal order. Further, when traversing to each standardized database, the participle may be matched with each reference word corresponding to the standardized database.

Each reference word corresponding to each standardized database may include: each standard word in the standardized database and/or each near-synonym associated with each standard word in the standardized database.

That is, when traversing to a standardized database, the participle may be matched with each standard word in that database; or with each near-synonym associated with each standard word in that database; or with both the standard words and their associated near-synonyms.

Furthermore, when traversing to a certain standardized database, if a reference word matched with the participle exists among the reference words corresponding to the standardized database, the standard word corresponding to that reference word can be determined, and the determined standard word is the standard word matched with the participle.

If the reference word matched with the participle is itself a standard word in the standardized database, that reference word is the standard word matched with the participle; if the reference word matched with the participle is a near-synonym associated with a standard word in the standardized database, the standard word associated with the near-synonym is determined first and is then taken as the standard word matched with the participle.

Correspondingly, when traversing to a certain standardized database, if no reference word matched with the participle exists among the reference words corresponding to the standardized database, this indicates that the standardized database cannot be used to obtain the standard word matched with the participle, and the next standardized database can then be traversed.

The traversal order of the standardized database of each designated field may be determined according to the requirements of practical applications, and the like, which is not specifically limited in the embodiments of the present invention.
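By way of illustration only, steps A1 and A2 can be sketched in Python as follows. The list `DATABASES`, its order (which stands in for the preset traversal order), and the near-synonyms it contains are all assumptions made for this sketch; matching is done by simple string equality here.

```python
# Assumed layout: an ordered list of (designated field, database), where each
# database maps a standard word to its associated near-synonyms.
DATABASES = [
    ("examination part name",
     {"hip joint": ["hip"], "lumbar vertebrae": ["lumbar spine"]}),
    ("examination method name",
     {"true position": ["frontal position"], "oblique position": []}),
]


def reference_words(database):
    """Yield (reference word, standard word) pairs for one standardized database."""
    for standard_word, near_synonyms in database.items():
        yield standard_word, standard_word      # a standard word refers to itself
        for synonym in near_synonyms:
            yield synonym, standard_word        # a near-synonym refers to its standard word


def match_standard_word(participle):
    """Return (standard word, designated field), or None if nothing matches."""
    for field_name, database in DATABASES:                  # step A1: traverse in order
        for reference, standard_word in reference_words(database):
            if participle == reference:
                return standard_word, field_name            # step A2
        # No reference word matched here, so the next database is traversed.
    return None
```

Returning the designated field together with the standard word reflects that the field to which the participle belongs is the field of the matched standard word.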

Optionally, the participle may be matched against each reference word corresponding to the standardized database by calculating the matching degree between the participle and each reference word, and then determining whether any of the obtained matching degrees is greater than a preset threshold. If none is, it may be determined that no reference word matching the participle exists among the reference words corresponding to the standardized database. If one is, the reference word whose matching degree is greater than the preset threshold is determined to be the reference word matched with the participle.

Optionally, if there are multiple reference words with a matching degree greater than a preset threshold, the reference word with the highest matching degree among the multiple reference words may be determined as the reference word matched with the participle.

Optionally, if there are a plurality of reference words with a matching degree greater than the preset threshold, all of them may be determined as reference words matched with the participle, in which case there may be a plurality of standard words matched with the participle.

In addition, the preset threshold may be set according to a requirement of practical application, and the embodiment of the present invention is not particularly limited.
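By way of illustration only, the threshold test and the two options for handling several candidates can be sketched in Python as follows. The threshold value and the function names are assumptions, and the character-overlap score from the standard-library `difflib.SequenceMatcher` is merely a stand-in for whatever matching degree is actually computed.

```python
from difflib import SequenceMatcher

PRESET_THRESHOLD = 0.8  # assumed value; set according to practical requirements


def matching_degree(participle, reference):
    """Stand-in matching degree in [0, 1] based on character overlap."""
    return SequenceMatcher(None, participle, reference).ratio()


def matched_references(participle, references, keep_all=False):
    """Return the reference words whose matching degree exceeds the threshold.

    With keep_all=False only the best-scoring reference word is returned;
    with keep_all=True every reference word above the threshold is returned.
    An empty list means that no reference word matches the participle.
    """
    scored = [(matching_degree(participle, ref), ref) for ref in references]
    above = [(score, ref) for score, ref in scored if score > PRESET_THRESHOLD]
    if not above:
        return []
    if keep_all:
        return [ref for _, ref in above]
    return [max(above)[1]]
```

Returning a list in both modes lets the caller treat the single-match and multi-match options uniformly.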

Specifically, the matching degree between the participle and each reference word corresponding to the standardized database may be calculated by using a preset near-synonym model.

The near-synonym model is a model obtained by training with specified corpus data, where the specified corpus data is text data related to the plurality of specified fields.

In order to improve the accuracy of each matching degree calculated by the near-synonym model, in this specific implementation, the specified corpus data used for training the near-synonym model may be text data related to the plurality of specified fields.

Specifically, text data related to the plurality of designated fields may be acquired as the specified corpus data, and model training may be performed using the specified corpus data to obtain the near-synonym model. After the near-synonym model is obtained, it is used to calculate the matching degree between the participle and each reference word corresponding to the standardized database, so as to determine whether a reference word matched with the participle exists.

Specifically, for each participle, the participle and each reference word corresponding to the standardized database may be input into the near-synonym model, so that the near-synonym model calculates the distance between the participle and each reference word, and the distance may be used as the matching degree. The closer the distance, the better the two words match; the farther the distance, the worse they match.
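By way of illustration only, a distance-based matching degree can be sketched in Python as follows. The embodiment does not name a distance measure; cosine distance is an assumption of this sketch, and the three-dimensional vectors in `VECTORS` are invented toy values rather than the output of any trained model.

```python
import math

# Toy embedding vectors, invented for illustration only.
VECTORS = {
    "lumbar spine": [0.9, 0.1, 0.0],
    "lumbar vertebrae": [0.8, 0.2, 0.1],
    "hip joint": [0.1, 0.9, 0.2],
}


def cosine_distance(u, v):
    """Cosine distance: near 0 for similar directions, larger for dissimilar ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_reference(participle, references):
    """Return the reference word at the smallest distance, with that distance."""
    distances = {
        ref: cosine_distance(VECTORS[participle], VECTORS[ref])
        for ref in references
    }
    best = min(distances, key=distances.get)
    return best, distances[best]
```

Because a smaller distance indicates a better match, a threshold test applied directly to a distance would take the form "distance below a limit"; converting the distance to a similarity (for example, one minus the cosine distance) is one way to keep a "greater than the preset threshold" comparison.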

In a specific application, the technical field to which the specified corpus data belongs may be the same as the technical field to which the target text data belongs.

For example, if the target text data is medical image text data in the medical field, the specified corpus data may be medical corpus, and the medical corpus may include text data related to medical images, such as a plurality of image examination reports and a plurality of image diagnosis reports.

Also, for example, the training process for the near-synonym model may include: after the specified corpus data is obtained, performing word segmentation processing on it to obtain a plurality of participles of the specified corpus data, and then using the participle data set formed by these participles as a training set to train a preset initial language model, thereby obtaining the near-synonym model.

For example, the initial language model may be: a word2vec (word to vector) model, a GloVe (Global Vectors for Word Representation) model, an ELMo (Embeddings from Language Models) model, a GPT (Improving Language Understanding by Generative Pre-Training) model, a BERT (Pre-training of Deep Bidirectional Transformers for Language Understanding) model, etc.

The technical field of the segmentation tool used for performing the segmentation processing on the specified corpus data may be the same as the technical field of the specified corpus data.
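By way of illustration only, the idea of deriving word vectors from a segmented corpus can be sketched in Python as follows. This sketch deliberately does not use any of the language models listed above; it substitutes plain co-occurrence counting, which is a much simpler technique, purely to show how participles that appear in similar contexts end up with similar vectors. The corpus contents and all names are assumptions.

```python
from collections import Counter, defaultdict

# Assumed toy corpus: each report has already been segmented into participles.
SEGMENTED_CORPUS = [
    ["lumbar vertebrae", "true position", "no abnormality"],
    ["lumbar spine", "true position", "no abnormality"],
    ["hip joint", "oblique position", "fracture"],
]


def train_cooccurrence_vectors(corpus):
    """Count, for every word, the other words appearing in the same report."""
    vectors = defaultdict(Counter)
    for report in corpus:
        for word in report:
            for context in report:
                if context != word:
                    vectors[word][context] += 1
    return vectors


def similarity(vectors, a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(vectors[a][k] * vectors[b][k] for k in vectors[a])
    norm_a = sum(v * v for v in vectors[a].values()) ** 0.5
    norm_b = sum(v * v for v in vectors[b].values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


model = train_cooccurrence_vectors(SEGMENTED_CORPUS)
print(similarity(model, "lumbar vertebrae", "lumbar spine"))
print(similarity(model, "lumbar vertebrae", "hip joint"))
```

In this toy corpus, "lumbar vertebrae" and "lumbar spine" share all of their contexts and therefore score close to 1, while "lumbar vertebrae" and "hip joint" share none and score 0.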

Optionally, in a specific implementation manner, the plurality of designated fields may include a first-class field, where the standard words belonging to the first-class field can be graded according to their subordination relationships.

For example, for medical image text data in the medical field, the plurality of designated fields may include the examination part name and the examination method name. The standard words belonging to the examination part name can be graded according to the subordination relationships among them, and therefore the examination part name is a first-class field.

Exemplarily, as shown in Table 1, craniofacial and facial features, craniocerebral, internal auditory meatus, saddle area, orbit, maxilla, temporomandibular joint, mastoid, styloid, nasopharynx, and nasal bone are all standard words belonging to the examination part name, wherein craniocerebral, internal auditory meatus, saddle area, orbit, maxilla, temporomandibular joint, mastoid, styloid, nasopharynx, and nasal bone are all subordinate to craniofacial and facial features. Craniofacial and facial features may therefore be classified as a first-level examination part name, while craniocerebral, internal auditory meatus, saddle area, orbit, maxillofacial bone, temporomandibular joint, mastoid, styloid process, nasopharynx, and nasal bone are all second-level examination part names under craniofacial and facial features.

Based on this, in this specific implementation manner, as shown in fig. 2, the data mapping method provided in the embodiment of the present invention may further include the following step S104:

S104: for each participle whose designated field is a first-class field, determining the level of the standard word matched with the participle as the first mapping content of the participle, and, when the level of the standard word matched with the participle is not the highest level, determining the standard word at the target level corresponding to the standard word matched with the participle as the second mapping content of the participle;

wherein, the target grade is higher than the grade of the standard word matched with the participle;

accordingly, in this specific implementation manner, the step S103, for each participle, establishing a data mapping relationship regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs, may include the following step S103A:

S103A: aiming at each participle, if the specified field to which the participle belongs is a first-class field and has a first mapping content and a second mapping content, establishing a data mapping relation related to the participle according to a standard word matched with the participle, the specified field to which the participle belongs, the first mapping content of the participle and the second mapping content of the participle; if the designated field to which the participle belongs is a first-class field and only has first mapping content, establishing a data mapping relation related to the participle according to the standard word matched with the participle, the designated field to which the participle belongs and the first mapping content of the participle; otherwise, establishing a data mapping relation related to the participle according to the standard word matched with the participle and the specified field to which the participle belongs.

In this specific implementation manner, for each participle whose designated field is a first-class field, the level of the standard word matched with the participle within the first-class field may be determined, and the determined level is used as the first mapping content of the participle.

Furthermore, if the level of the standard word matched with the participle is not the highest level, then, because the standardized database of the designated field is a standardized database with a hierarchical relationship, a target level higher than the level of the standard word matched with the participle can be determined from that hierarchical relationship. The standard word at the target level that corresponds to the standard word matched with the participle can then be determined in the standardized database of the designated field and used as the second mapping content of the participle.

It can be understood that, when the level of the standard word matched with the participle is not the highest level, the standard word matched with the participle may be subordinate to a certain standard word with a target level, and therefore, the certain standard word with the target level to which the standard word matched with the participle belongs is the standard word under the target level corresponding to the standard word matched with the participle, that is, the second mapping content of the participle.

For example, for medical image text data in the medical field, the plurality of designated fields may include: the examination site name and the examination method name, wherein the examination site name is a first-class field, and Table 1 is part of the content of the standardized database of examination site names.

Illustratively, when the standard word matched with a certain participle of the target text data is the standard word "craniocerebral" in the standardized database of examination site names, the level of "craniocerebral" can first be determined to be the second level, which gives the first mapping content of the participle; next, it can be determined that this level is not the highest level; finally, the first-level standard word corresponding to "craniocerebral" can be determined to be "craniofacial and facial features", which gives the second mapping content of the participle.

In addition, for each participle whose designated field is a first-class field, the content at every level lower than the level of the standard word matched with the participle can be considered empty.
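The determination of the first and second mapping content in step S104 can be sketched as a lookup in a small hierarchy. This is only a sketch under stated assumptions: the dictionary layout, the names `EXAM_SITE_DB` and `mapping_contents`, and the choice to collect every ancestor level as a target level are all hypothetical; only the three site names come from Table 1 as quoted above.

```python
# Hypothetical slice of a hierarchical standardized database:
# standard word -> (level, parent standard word or None). Level 1 is the highest.
EXAM_SITE_DB = {
    "craniofacial and facial features": (1, None),
    "craniocerebral": (2, "craniofacial and facial features"),
    "orbit": (2, "craniofacial and facial features"),
}


def mapping_contents(standard_word, db):
    """Return (first_mapping, second_mapping) for a matched standard word.

    first_mapping is the level of the standard word itself.
    second_mapping maps each higher (target) level to the ancestor standard
    word at that level, or is None when the word is already at the top level.
    """
    level, parent = db[standard_word]
    second_mapping = {}
    while parent is not None:
        parent_level, grandparent = db[parent]
        second_mapping[parent_level] = parent
        parent = grandparent
    return level, second_mapping or None


print(mapping_contents("craniocerebral", EXAM_SITE_DB))
print(mapping_contents("craniofacial and facial features", EXAM_SITE_DB))
```

A word at the highest level yields only first mapping content, which corresponds to the "only has first mapping content" branch of step S103A.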

Thus, for each participle, if the specified field to which the participle belongs is the first-class field and has the first mapping content and the second mapping content, the data mapping relationship related to the participle can be established according to the standard word matched with the participle, the specified field to which the participle belongs, the first mapping content of the participle and the second mapping content of the participle.

For each participle, if the specified field to which the participle belongs is a first-class field and has only first mapping content, a data mapping relation related to the participle can be established according to the standard word matched with the participle, the specified field to which the participle belongs and the first mapping content of the participle.

Correspondingly, for each participle, if the specified field to which the participle belongs is a non-first-class field, the data mapping relation related to the participle can be established directly according to the standard word matched with the participle and the specified field to which the participle belongs.

Optionally, in a specific implementation manner, in the step S103A, establishing a data mapping relationship regarding the participle according to the standard word matched with the participle, the specified field to which the participle belongs, the first mapping content of the participle, and the second mapping content of the participle may include the following step B:

Step B: establishing a first mapping relation among the standard word matched with the participle, the designated field to which the participle belongs and the first mapping content of the participle; establishing a second mapping relation among each standard word in the second mapping content of the participle, the designated field to which that standard word belongs and the target level to which that standard word belongs; and determining the established first mapping relation and second mapping relation as the data mapping relation regarding the participle.

That is, for each participle, if the designated field to which the participle belongs is a first-class field and the participle has both first mapping content and second mapping content, a first mapping relationship among the standard word matched with the participle, the designated field to which the participle belongs and the first mapping content may be established, and a second mapping relationship among each standard word in the second mapping content, the designated field to which that standard word belongs and the target level to which that standard word belongs may be established, so that the established first and second mapping relationships may be determined as the data mapping relationship regarding the participle.

Optionally, in a specific implementation manner, in the step S103A, establishing a data mapping relationship regarding the participle according to the standard word matched with the participle, the specified field to which the participle belongs, and the first mapping content of the participle, may include the following step C:

Step C: establishing a mapping relation among the standard word matched with the participle, the designated field to which the participle belongs and the first mapping content of the participle, and determining the established mapping relation as the data mapping relation regarding the participle.

That is to say, for each participle, if the specified field to which the participle belongs is the first type field and has only the first mapping content, then a mapping relationship between the standard word matched with the participle, the specified field to which the participle belongs, and the first mapping content may be established, so as to obtain the data mapping relationship regarding the participle.
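The three branches of step S103A, including steps B and C, can be sketched as one function that returns the mapping relations for a participle. The record layout (a list of dictionaries keyed by `participle`, `standard_word`, `field` and `level`) and the function name `build_mapping` are assumptions made for illustration; the description does not prescribe a storage format.

```python
def build_mapping(participle, standard_word, field, first=None, second=None):
    """Return the data mapping relations regarding one participle.

    first  -- first mapping content (level of the matched standard word),
              or None when the field is not a first-class field
    second -- second mapping content as {target_level: standard_word},
              or None when the matched word is already at the highest level
    """
    base = {"participle": participle, "standard_word": standard_word, "field": field}
    if first is None:
        # Non-first-class field: standard word and designated field only.
        return [base]
    # First mapping relation (step B, or the whole of step C).
    relations = [dict(base, level=first)]
    # Second mapping relations (step B only): one per higher-level standard word.
    for target_level, ancestor in (second or {}).items():
        relations.append({"participle": participle, "standard_word": ancestor,
                          "field": field, "level": target_level})
    return relations


print(build_mapping("craniocerebral", "craniocerebral", "examination site name",
                    first=2, second={1: "craniofacial and facial features"}))
print(build_mapping("chest", "chest", "examination site name", first=1))
```

Keeping the participle in every record is a design choice of this sketch so that all relations "regarding the participle" can be retrieved together.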

For example, for medical image text data in the medical field, the plurality of designated fields may include: the examination site name and the examination method name, wherein the examination site name is a first-class field.

Illustratively, the target text data is "hip joint positive position", and the participles of the target text data are "hip joint" and "positive position"; the standard words matched with these participles are "hip joint" and "positive position", respectively. The designated field to which the participle "hip joint" belongs is the examination site name, the first mapping content of the participle "hip joint" is the second level, and its second mapping content is the first-level examination site name to which "hip joint" is subordinate; the designated field to which the participle "positive position" belongs is the examination method name. The data mapping relationship regarding the participle "hip joint" and the data mapping relationship regarding the participle "positive position" shown in Table 4 can thus be obtained.

TABLE 4

For another example, the target text data is "chest positive side position", and the participles of the target text data are "chest" and "positive side position"; the standard words matched with these participles are "chest" and "positive side position", respectively. The designated field to which the participle "chest" belongs is the examination site name, the first mapping content of the participle "chest" is the first level, and the participle "chest" has no second mapping content; the designated field to which the participle "positive side position" belongs is the examination method name. The data mapping relationship regarding the participle "chest" and the data mapping relationship regarding the participle "positive side position" shown in Table 5 can thus be obtained.

TABLE 5

Optionally, in a specific implementation manner, since the standardized database of each designated field may be traversed to determine the standard word matching each participle, in this specific implementation manner, in the step a1, matching the participle with each reference word corresponding to the standardized database when traversing each standardized database may include the following steps a11-a12:

a11: when traversing to a standardized database of the first-class field, determining each reference word with the lowest belonging grade from each reference word which is not matched with the participle and corresponds to the standardized database, and matching the participle with each determined reference word;

a12: when a standardized database of specified fields except the first-class fields is traversed, matching the participle with each reference word corresponding to the standardized database respectively;

correspondingly, in this specific implementation manner, the data mapping method provided in the embodiment of the present invention may further include the following step D:

Step D: when traversing to the standardized database of the first-class field, if no reference word matching the participle is found, returning to step a11 before traversing the next standardized database.

When the reference words corresponding to the standardized database include the near-synonyms associated with each standard word in the standardized database, the level of each near-synonym is the level of the standard word with which it is associated. That is, a standard word and the near-synonyms associated with it have the same level.

In this way, in this specific implementation manner, when traversing to the standardized database of the first-class field, each reference word with the lowest belonging rank may be determined from each reference word that is not matched with the participle and corresponds to the standardized database.

For example, for a standardized database of a first-class field, when the step a11 is executed for the first time, the reference words at the lowest level of the standardized database may be determined from the reference words corresponding to the standardized database; when the step a11 is executed for the second time, since the reference words at the lowest level have already been matched against the participle, the reference words at the level immediately above the lowest level can be determined from the reference words corresponding to the standardized database. The process continues in this way until a reference word matching the participle is determined, or until it is determined that no reference word matching the participle exists, in which case the next standardized database is traversed.

Here, the reference words at the level immediately above the lowest level of the standardized database are exactly the lowest-level reference words among those reference words of the standardized database that have not yet been matched against the participle.

Based on this, each time step a11 is executed, the reference word with the lowest rank can be determined from the reference words corresponding to the standardized database and not matched with the participle.

Illustratively, the standardized database of the first type of field has a first level, a second level, and a third level, respectively, wherein the first level is the highest level and the third level is the lowest level.

For the standardized database of the first-class field, when traversing to the standardized database, determining each reference word with a third-class belonging level from each reference word corresponding to the standardized database; if the reference word matched with the participle does not exist in the reference words with the third grade, each reference word with the second grade can be further determined from each reference word corresponding to the standardized database; if the reference word matched with the participle still does not exist in each reference word with the second-level belonging grade, each reference word with the first-level belonging grade can be further determined from each reference word corresponding to the standardized database.

That is, when the standardized database of the first-class field is traversed, the reference words with the lowest level may be determined from the reference words of the standardized database that have not yet been matched against the participle, and the participle may be matched with the determined reference words.

In this way, when none of the determined lowest-level reference words matches the participle, the step of determining the lowest-level reference words from the reference words not yet matched against the participle may be performed again, so that the participle is matched with the newly determined lowest-level reference words.

The process continues in this way until a reference word matching the participle is determined, after which the first mapping content and the second mapping content of the participle, or only the first mapping content of the participle, can be determined according to the level of the reference word matching the participle.

Alternatively, the process continues until it is determined that even the reference words at the highest level of the standardized database contain no reference word matching the participle. This indicates that none of the reference words corresponding to the standardized database matches the participle, so that the next standardized database may be traversed.

In contrast to the standardized database of the first-class field, when the standardized database of a designated field other than the first-class field is traversed, the participle can be directly matched with each reference word corresponding to the standardized database.
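The level-by-level traversal of steps a11, a12 and D can be sketched as below. The sketch makes several assumptions that the description leaves open: reference words are stored as `(word, level)` pairs with a larger number meaning a lower level, matching is delegated to a caller-supplied `is_match` predicate, and traversal stops at the first standardized database that yields a match. All names are hypothetical.

```python
def match_in_hierarchical_db(participle, references, is_match):
    """Steps a11 and D: try the lowest unmatched level first, then move up."""
    remaining = list(references)
    while remaining:
        lowest = max(level for _, level in remaining)
        batch = [word for word, level in remaining if level == lowest]
        hits = [word for word in batch if is_match(participle, word)]
        if hits:
            return hits, lowest
        # No match at this level: discard it and retry with the next level up.
        remaining = [(word, level) for word, level in remaining if level != lowest]
    return [], None


def match_participle(participle, databases, is_match):
    """Traverse the standardized databases; return (field, matches, level)."""
    for field, (is_first_class, references) in databases.items():
        if is_first_class:
            hits, level = match_in_hierarchical_db(participle, references, is_match)
        else:
            # Step a12: match against every reference word directly.
            hits = [word for word, _ in references if is_match(participle, word)]
            level = None
        if hits:
            return field, hits, level
    return None, [], None


DATABASES = {
    "examination site name": (True, [
        ("craniofacial and facial features", 1),
        ("craniocerebral", 2),
        ("orbit", 2),
    ]),
    "examination method name": (False, [
        ("positive position", None),
        ("side position", None),
    ]),
}


def exact(participle, reference):
    """Stand-in matcher; a real system would use the matching degree."""
    return participle == reference


print(match_participle("craniocerebral", DATABASES, exact))
```

Because the level of the matching reference word is returned together with the match, the first and second mapping content can be derived without a second lookup.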

Optionally, in a specific implementation manner, the plurality of designated fields may include a second-class field, and among the standard words belonging to the second-class field there is a standard word having at least one specific relation word; the specific relation words include: inclusion relation words and replacement relation words.

That is, among the standard words belonging to the second-class field, there is a standard word having an inclusion relation word and/or a replacement relation word.

For example, for medical image text data in the medical field, the plurality of designated fields may include: the examination site name and the examination method name, wherein, among the standard words belonging to the examination method name, there is a standard word having an inclusion relation word and/or a replacement relation word, and thus the examination method name is a second-class field.

Illustratively, Table 2 above is part of the content of the standardized database of examination method names. As shown in Table 2, the double oblique position has the left oblique position and the right oblique position as both its inclusion relation words and its replacement relation words; the left oblique position and the right oblique position both have the inclusion relation word oblique position; the positive side position has the inclusion relation word side position, and the replacement relation words positive position and side position.

Based on this, in this specific implementation manner, as shown in fig. 3, the data mapping method provided in the embodiment of the present invention may further include the following step S105:

S105: for each participle whose designated field is a second-class field, obtaining the modified standard word matched with the participle based on the number of standard words matched with the participle and on the result of determining whether the standard word matched with the participle has a replacement relation word;

wherein the modified standard word matched with the participle is: the standard word matched with the participle, or the replacement relation word of the standard word matched with the participle.

Accordingly, in this specific implementation manner, the step S103, for each participle, establishing a data mapping relationship regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs, may include the following step S103B:

S103B: for each participle, if the designated field to which the participle belongs is a second-class field, establishing a data mapping relation regarding the participle according to the modified standard word matched with the participle and the designated field to which the participle belongs; otherwise, establishing a data mapping relation regarding the participle according to the standard word matched with the participle and the designated field to which the participle belongs.

In this specific implementation manner, for each participle whose designated field is a second-class field, the modified standard word matched with the participle may be obtained according to the determined number of standard words matched with the participle and the result of determining whether the standard word matched with the participle has a replacement relation word.

If a standard word matched with the participle has a replacement relation word, that replacement relation word is the modified standard word matched with the participle; otherwise, the standard word matched with the participle can be directly used as the modified standard word matched with the participle.

Thus, for each participle, if the designated field to which the participle belongs is a second-class field, a data mapping relation regarding the participle is established according to the modified standard word matched with the participle and the designated field to which the participle belongs; otherwise, a data mapping relation regarding the participle is established according to the standard word matched with the participle and the designated field to which the participle belongs.

Optionally, if the designated field to which the participle belongs is a second-class field, a mapping relationship between the modified standard word matched with the participle and the designated field to which the participle belongs may be established as the data mapping relationship regarding the participle; otherwise, a mapping relationship between the standard word matched with the participle and the designated field to which the participle belongs may be established as the data mapping relationship regarding the participle.

Optionally, in a specific implementation manner, in the step S105, obtaining the modified standard word matched with the participle based on the number of standard words matched with the participle and on the result of determining whether the standard word matched with the participle has a replacement relation word may include the following steps E1-E6:

step E1: if the number of the standard words matched with the participle is one and the standard words matched with the participle do not have the replacement relation words, determining the standard words matched with the participle as the modified standard words matched with the participle;

step E2: if the number of the standard words matched with the participle is one and the standard words matched with the participle have replacement relation words, determining the replacement relation words of the standard words matched with the participle as modified standard words matched with the participle;

step E3: if the number of the standard words matched with the participle is multiple and a target standard word exists, deleting the target standard word and judging whether each standard word which is matched with the participle currently exists has a replacement relation word; if so, perform step E4; if not, go to step E5;

wherein the target standard word is: an inclusion relation word of at least one standard word among the plurality of standard words matched with the participle;

step E4: determining the replacement relation words of the standard words matched with the participles as the modified standard words matched with the participles;

step E5: determining the standard word matched with the participle as the modified standard word matched with the participle;

step E6: if the number of standard words matched with the participle is more than one and no target standard word exists, executing the part of step E3 that judges whether each currently existing standard word matched with the participle has a replacement relation word.

In this specific implementation manner, for each participle whose designated field is a second-class field, the number of standard words matched with the participle may be determined first.

If the number of the standard words matched with the participle is 1, whether the standard words matched with the participle have the replacement relation words or not can be further judged.

If the standard word matched with the participle has the replacement relation word, the replacement relation word of the standard word matched with the participle can be determined as the modified standard word matched with the participle.

If the standard word matched with the participle does not have the replacement relation word, the standard word matched with the participle can be directly determined as the modified standard word matched with the participle.

If the number of the standard words matched with the participle is multiple, whether a target standard word exists in the multiple standard words matched with the participle can be further judged.

Here, the target standard word is an inclusion relation word of at least one standard word among the plurality of standard words matched with the participle. That is, it is determined whether, among the plurality of standard words matched with the participle, there is a word that is an inclusion relation word of at least one other of those standard words.

When a target standard word exists among the plurality of standard words matched with the participle, the target standard word can be deleted, and it is then further judged whether the standard words matched with the participle that remain after the deletion have replacement relation words.

If each currently existing standard word matched with the participle has a replacement relation word, the replacement relation words of those standard words can be determined as the modified standard words matched with the participle. If the currently existing standard words matched with the participle have no replacement relation words, those standard words can be directly determined as the modified standard words matched with the participle.

When the target standard word does not exist in the plurality of standard words matched with the participle, whether the replacement relation word exists in each standard word matched with the participle can be further judged.

If each standard word matched with the participle has a replacement relation word, the replacement relation word of each standard word matched with the participle can be determined as the modified standard word matched with the participle. If each standard word matched with the participle has no replacement relation word, each standard word matched with the participle can be directly determined as the modified standard word matched with the participle.
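Steps E1-E6 can be sketched as a single function. This is a sketch under assumptions: inclusion and replacement relation words are stored in two hypothetical dictionaries keyed by standard word, and when several standard words remain the decision is made per word (a word with replacement relation words is replaced, a word without them is kept), which is one possible reading of the case the description does not spell out.

```python
def corrected_standard_words(matched, inclusion, replacement):
    """Return the modified standard words for one participle (steps E1-E6).

    matched     -- standard words matched with the participle
    inclusion   -- {standard word: [its inclusion relation words]}
    replacement -- {standard word: [its replacement relation words]}
    """
    current = list(matched)
    if len(current) > 1:
        # E3: a target standard word is an inclusion relation word of
        # another matched standard word; delete every such word.
        targets = {word for word in current for other in current
                   if other != word and word in inclusion.get(other, [])}
        current = [word for word in current if word not in targets]
    corrected = []
    for word in current:
        # E2/E4: use the replacement relation words when they exist;
        # E1/E5: otherwise keep the standard word itself.
        for item in replacement.get(word) or [word]:
            if item not in corrected:
                corrected.append(item)
    return corrected


# Relations quoted from the description of Table 2 above.
INCLUSION = {"positive side position": ["side position"]}
REPLACEMENT = {"positive side position": ["positive position", "side position"]}

print(corrected_standard_words(["positive side position", "side position"],
                               INCLUSION, REPLACEMENT))
print(corrected_standard_words(["positive position"], INCLUSION, REPLACEMENT))
```

Deleting the target standard word first prevents "side position" from being counted twice, once as a direct match and once as a replacement relation word of "positive side position".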

For example, for medical image text data in the medical field, the plurality of designated fields may include: the examination site name and the examination method name, wherein the examination method name is a second-class field, and Table 2 is part of the content of the standardized database of examination method names.

Illustratively, the target text data is "lumbar vertebra positive side position", and the participles of the target text data are "lumbar vertebra" and "positive side position". The standard words matched with the participle "positive side position" are "positive side position" and "side position", wherein the standard word "side position" is an inclusion relation word of the standard word "positive side position", and the standard word "positive side position" has the replacement relation words "positive position" and "side position". The target standard word "side position" is therefore deleted, the modified standard words matched with the participle are "positive position" and "side position", and the data mapping relationship regarding the participle "positive side position" shown in Table 6 can be obtained.

TABLE 6

Combining the above embodiments, for example, for medical image text data in the medical field, the plurality of designated fields may include: the examination site name and the examination method name, wherein the examination site name is a first-class field and the examination method name is a second-class field. Taking the above target text data "lumbar vertebra positive side position" as an example, the data mapping relationship regarding each participle shown in Table 7 can be obtained.

TABLE 7

Corresponding to the data mapping method provided by the embodiment of the invention, the embodiment of the invention also provides a medical text data mapping method.

The method may be applied to various electronic devices such as a server, a notebook computer, a desktop computer or a tablet computer, which the embodiment of the present invention does not specifically limit and which are hereinafter referred to simply as the electronic device. The electronic device may be a device in a distributed system or an independent device.

In addition, the method can be applied to any application scenario in which a data mapping relationship needs to be established for the participles of medical text data in the medical field, for example, establishing a data mapping relationship for the participles of medical image text data that includes an examination site name and an examination method name.

Fig. 4 is a flowchart illustrating a medical text data mapping method according to an embodiment of the present invention, and as shown in fig. 4, the method may include the following steps:

S401: acquiring a plurality of participles of medical text data;

the medical text data comprises data content under a plurality of designated fields, and the designated fields are fields for which a standardized database is preset; the medical text data may include, but is not limited to, medical image text data.

S402: processing the plurality of participles of the medical text data by using any data mapping method provided by the above embodiments of the present invention, so as to establish a data mapping relationship regarding the plurality of participles.

In normalizing the medical text data, the electronic device may first perform a normalized data mapping on the segmentation results of the medical text data. Based on this, the electronic device may first obtain a plurality of participles of the medical text data containing data content under a plurality of specified fields, and the plurality of specified fields are fields preset with a standardized database.

For example, a plurality of participles of medical image text data including an examination site name and an examination method name are acquired, and a standardized database is preset for each of the examination site name and the examination method name.

In this way, the electronic device may perform data mapping on the multiple participles of the acquired medical text data by using any one of the data mapping methods provided in the above embodiments of the present invention, so as to establish a data mapping relationship regarding the multiple participles of the acquired medical text data.

The manner in which the electronic device processes the plurality of participles of the acquired medical text data and establishes the data mapping relationship regarding those participles is the same as in the implementations of the data mapping method provided in the above embodiments of the present invention, and is not described here again.

Thus, in the medical field, by applying the medical text data mapping scheme provided by the embodiment of the invention, standardized data mapping can be performed on the word segmentation result of the medical text data. Therefore, an implementation basis can be provided for the standardized processing of the medical text data of different information systems.

To facilitate understanding of the medical text data mapping method provided by the embodiment of the present invention, the method is described below with reference to fig. 5, taking medical image text data in the medical field as an example. The plurality of specified fields may include an examination part name and an examination method name; a standardized database is preset for each of them, the examination part name is a first type field, and the examination method name is a second type field.

Step 1: obtain the word segmentation result of the examination name, that is, obtain a plurality of participles of the medical text data. Then, for each participle, perform secondary examination part name mapping by using each second-level standard word in the standardized database of the examination part name and each near-synonym associated with each such standard word, that is, determine whether a second-level standard word matching the participle exists.

If the mapping succeeds, that is, a second-level standard word matching the participle exists, determine the standard word matched with the participle, determine the first mapping content of the participle, and proceed to step 2.

If the mapping fails, that is, no second-level standard word matching the participle exists, proceed to step 3.

The secondary examination part name and near-synonym dictionary in fig. 5 consists of: each second-level standard word in the standardized database of the examination part name, and each near-synonym associated with each such standard word.

Step 2: perform primary examination part name mapping by using the mapping relationship between each second-level standard word and each first-level standard word in the standardized database of the examination part name, that is, determine the second mapping content of the participle.

Step 3: perform primary examination part name mapping by using each first-level standard word in the standardized database of the examination part name and each near-synonym associated with each such standard word, that is, determine whether a first-level standard word matching the participle exists.

The primary examination part name and near-synonym dictionary in fig. 5 consists of: each first-level standard word in the standardized database of the examination part name, and each near-synonym associated with each such standard word.

Step 4: when no standard word in the standardized database of the examination part name matches the participle, perform examination method name mapping by using each standard word in the standardized database of the examination method name and each near-synonym associated with each such standard word, to obtain a list of examination method name standard words matching the participle.

Further, a corrected list of examination method name standard words matching the participle is obtained by using the inclusion relation words and the replacement relation words of each standard word in the standardized database of the examination method name.

In fig. 5, the examination method name and near-synonym dictionary consists of: each standard word in the standardized database of the examination method name and each near-synonym associated with each such standard word; the examination method name inclusion relation dictionary consists of: the inclusion relation words of each standard word in the standardized database of the examination method name; and the examination method name replacement relation dictionary consists of: the replacement relation words of each standard word in the standardized database of the examination method name.

It should be noted that, in the embodiment of the present invention, the execution sequence of step 1 and step 4 is not limited.
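Steps 1 to 3 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not part of the embodiment: the dictionary names (`SECONDARY_PART_DICT`, `PRIMARY_PART_DICT`, `SECONDARY_TO_PRIMARY`), the function name `map_part_name`, the near-synonym entries, and the use of exact string lookup as the matching rule are all hypothetical. Only the relationship between "thoracic vertebrae" and "chest" follows the example given in this description.

```python
from typing import Dict, Optional

# Hypothetical toy dictionaries mapping each reference word (a standard word
# or an assumed near-synonym) to its standard word.
SECONDARY_PART_DICT: Dict[str, str] = {
    "thoracic vertebrae": "thoracic vertebrae",
    "thoracic spine": "thoracic vertebrae",  # assumed near-synonym
}
PRIMARY_PART_DICT: Dict[str, str] = {
    "chest": "chest",
    "thorax": "chest",  # assumed near-synonym
}
# Second-level standard word -> first-level standard word it belongs to.
SECONDARY_TO_PRIMARY: Dict[str, str] = {"thoracic vertebrae": "chest"}


def map_part_name(participle: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the examination part name mapping, or None if steps 1 and 3 both fail."""
    # Step 1: try to match a second-level standard word (exact lookup assumed).
    secondary = SECONDARY_PART_DICT.get(participle)
    if secondary is not None:
        # Step 2: follow the second-level -> first-level mapping.
        return {"secondary": secondary, "primary": SECONDARY_TO_PRIMARY.get(secondary)}
    # Step 3: fall back to matching a first-level standard word.
    primary = PRIMARY_PART_DICT.get(participle)
    if primary is not None:
        return {"secondary": None, "primary": primary}
    # No examination part name matched; the caller would continue with step 4.
    return None
```

In this sketch a return value of `None` signals that the participle is not an examination part name, so the examination method name mapping of step 4 would be attempted next.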

Illustratively, with the process diagram shown in fig. 5, a plurality of participles of each target text data in table 8 are processed, and a data mapping relationship regarding each participle of each target text data is obtained as shown in table 9.

TABLE 8

For the participles "chest" and "positive side position" of the first piece of target text data in table 8, step 1 fails, so step 2 is not executed; step 3 is executed and yields "chest"; step 4 is then executed and yields "positive position" and "side position". The data mapping relationship shown as the first piece of mapping result data in table 9 is finally obtained.

During step 4, the standard words "positive side position" and "side position" matching the participle "positive side position" are obtained first; "side position" is then deleted, and "positive side position" is replaced by its replacement relation words "positive position" and "side position".

For the participles "thoracic vertebrae" and "double-inclined position" of the second piece of target text data in table 8, step 1 is executed and yields "thoracic vertebrae"; step 2 is then executed and yields "chest", and step 3 is not executed; step 4 is then executed and yields "left inclined position" and "right inclined position". The data mapping relationship shown as the second piece of mapping result data in table 9 is finally obtained.

During step 4, the standard words "double-inclined position" and "inclined position" matching the participle "double-inclined position" are obtained first; "inclined position" is then deleted, and "double-inclined position" is replaced by its replacement relation words "left inclined position" and "right inclined position".

TABLE 9

Corresponding to the data mapping method provided by the embodiment of the invention, the embodiment of the invention also provides a data mapping device.

Fig. 6 is a schematic structural diagram of a data mapping apparatus according to an embodiment of the present invention, and as shown in fig. 6, the apparatus may include the following modules:

a word segmentation obtaining module 610, configured to obtain a plurality of participles of target text data; wherein the target text data comprises data content under a plurality of specified fields, and the plurality of specified fields are fields for which a standardized database is preset;

a participle determining module 620, configured to, for each participle, determine a standard word matching the participle based on the standardized databases of the plurality of specified fields, and determine the specified field to which the participle belongs; wherein the specified field to which the participle belongs is the specified field to which the standard word matching the participle belongs;

a relationship establishing module 630, configured to, for each participle, establish a data mapping relationship regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs.

Optionally, in a specific implementation manner, the participle determining module 620 includes:

the database traversal submodule is used for traversing the standardized databases of the designated fields and matching the participle with each reference word corresponding to each standardized database when traversing each standardized database; wherein, each reference word corresponding to the standardized database comprises: each standard word in the standardized database and/or each similar meaning word associated with each standard word in the standardized database; if the reference word matched with the participle exists, triggering a participle determining submodule; if the reference word matched with the participle does not exist, traversing the next standardized database;

and the participle determining submodule is used for determining the standard word corresponding to the reference word in the standardized database as the standard word matched with the participle.
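The cooperation of the two submodules can be sketched as follows. This is only an illustration under assumptions: the class `StandardizedDatabase`, its attributes, the function `find_standard_word`, exact string equality as the matching rule, and returning on the first match are all hypothetical choices that the embodiment does not prescribe.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class StandardizedDatabase:
    """Hypothetical in-memory stand-in for the standardized database of one specified field."""
    field_name: str
    # standard word -> near-synonyms associated with it
    synonyms: Dict[str, List[str]] = field(default_factory=dict)

    def reference_words(self) -> Iterator[Tuple[str, str]]:
        """Yield (reference word, standard word) pairs for this database."""
        for standard, near_synonyms in self.synonyms.items():
            yield standard, standard
            for synonym in near_synonyms:
                yield synonym, standard


def find_standard_word(
    participle: str, databases: List[StandardizedDatabase]
) -> Optional[Tuple[str, str]]:
    """Traverse the databases in order; return (standard word, specified field) or None."""
    for db in databases:
        for reference, standard in db.reference_words():
            if reference == participle:  # exact match assumed for illustration
                return standard, db.field_name
        # No reference word matched in this database: traverse the next one.
    return None
```

For example, with a hypothetical examination part name database containing the standard word "chest" and the assumed near-synonym "thorax", `find_standard_word("thorax", databases)` returns `("chest", "examination part name")`, so the specified field of the participle follows from the matched standard word.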

Optionally, in a specific implementation manner, the multiple designated fields include: a first type field; the standard words in the standardized database of the first type field can be graded according to the subordination relationship; the device further comprises:

a content determining module, configured to, before the data mapping relationship regarding each participle is established according to the standard word matched with the participle and the specified field to which the participle belongs, and for each participle whose specified field is the first type field: determine the level of the standard word matched with the participle as the first mapping content of the participle; and, when that level is not the highest level, determine the standard word at a target level corresponding to the standard word matched with the participle as the second mapping content of the participle; wherein the target level is higher than the level of the standard word matched with the participle;

the relationship establishing module 630 is specifically configured to: for each participle, if the specified field to which the participle belongs is the first type field and the participle has a first mapping content and a second mapping content, establish a data mapping relationship regarding the participle according to the standard word matched with the participle, the specified field to which the participle belongs, the first mapping content of the participle and the second mapping content of the participle; if the specified field to which the participle belongs is the first type field and the participle has only a first mapping content, establish a data mapping relationship regarding the participle according to the standard word matched with the participle, the specified field to which the participle belongs and the first mapping content of the participle; otherwise, establish a data mapping relationship regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs.

Optionally, in a specific implementation manner, the database traversal sub-module is specifically configured to:

when traversing to the standardized database of the first type field, determining, from among the reference words of the standardized database that have not yet been matched against the participle, the reference words of the lowest level, and matching the participle with each determined reference word; and, if no reference word matching the participle exists, returning, before the step of traversing the next standardized database, to the step of determining the reference words of the lowest level from among the reference words that have not yet been matched against the participle;

and when traversing to the standardized database of a specified field other than the first type field, matching the participle with each reference word corresponding to the standardized database.

Optionally, in a specific implementation manner, the relationship establishing module 630 is specifically configured to:

for each participle, if the specified field to which the participle belongs is the first type field and the participle has a first mapping content and a second mapping content, establishing a first mapping relationship among the standard word matched with the participle, the specified field to which the participle belongs and the first mapping content of the participle; establishing a second mapping relationship among each standard word in the second mapping content of the participle, the specified field to which that standard word belongs and the target level to which that standard word belongs; and determining the established first mapping relationship and second mapping relationship as the data mapping relationship regarding the participle;

and for each participle, if the specified field to which the participle belongs is the first type field and the participle has only a first mapping content, establishing a mapping relationship among the standard word matched with the participle, the specified field to which the participle belongs and the first mapping content of the participle, and determining the established mapping relationship as the data mapping relationship regarding the participle.
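One possible representation of these mapping relationships is sketched below. The tuple layout, the function name `build_first_type_relations`, and the encoding of levels as integers with 1 as the highest level are assumptions made for illustration. The sketch also assumes that the standard words in the second mapping content belong to the same specified field as the matched standard word.

```python
from typing import List, Optional, Tuple

# (standard word, specified field, level); level 1 is assumed to be the highest level.
MappingRelation = Tuple[str, str, int]


def build_first_type_relations(
    standard_word: str,
    field_name: str,
    level: int,
    second_content: Optional[List[str]] = None,
    target_level: Optional[int] = None,
) -> List[MappingRelation]:
    """Build the data mapping relationship for a participle of the first type field."""
    # First mapping relationship: matched standard word, its field, and its level.
    relations: List[MappingRelation] = [(standard_word, field_name, level)]
    if second_content:
        if target_level is None:
            raise ValueError("target_level is required when second_content is given")
        # Second mapping relationship: one entry per higher-level standard word.
        relations.extend((word, field_name, target_level) for word in second_content)
    return relations
```

With the example from this description, a participle matched to the second-level standard word "thoracic vertebrae" would yield one entry for "thoracic vertebrae" at level 2 and one for "chest" at level 1, whereas a participle matched directly to "chest" would yield a single entry.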

Optionally, in a specific implementation manner, the plurality of specified fields include: a second type field, wherein at least one of the standard words in the standardized database of the second type field has at least one specific relation word; the specific relation words comprise: inclusion relation words and replacement relation words; the device further comprises:

a standard word correction module, configured to, before the data mapping relationship regarding each participle is established according to the standard word matched with the participle and the specified field to which the participle belongs, obtain, for each participle whose specified field is the second type field, a corrected standard word matching the participle based on the number of standard words matched with the participle and a determination result of whether the standard words matched with the participle have replacement relation words;

Optionally, in a specific implementation manner, the standard word correction module is specifically configured to:

if the number of standard words matched with the participle is one and that standard word has no replacement relation word, determining the standard word matched with the participle as the corrected standard word matching the participle;

if the number of standard words matched with the participle is one and that standard word has replacement relation words, determining the replacement relation words of that standard word as the corrected standard words matching the participle;

if the number of standard words matched with the participle is more than one and a target standard word exists, deleting the target standard word and judging whether each standard word currently matched with the participle has a replacement relation word; wherein the target standard word is a matched standard word that is an inclusion relation word of at least one of the plurality of standard words matched with the participle; if so, determining the replacement relation words of the standard word matched with the participle as the corrected standard words matching the participle; if not, determining the standard word matched with the participle as the corrected standard word matching the participle;

and if the number of standard words matched with the participle is more than one and no target standard word exists, executing the step of judging whether each standard word currently matched with the participle has a replacement relation word.
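The four cases above can be condensed into a short Python sketch. The function name `correct_standard_words` and the two dictionary parameters are hypothetical, and the sketch assumes that the relation-containing (inclusion) dictionary maps a standard word to the standard words it contains, which is one reading of the description rather than a confirmed data layout. The example data are the matches described for fig. 5.

```python
from typing import Dict, List


def correct_standard_words(
    matched: List[str],
    inclusion: Dict[str, List[str]],
    replacement: Dict[str, List[str]],
) -> List[str]:
    """Return the corrected standard words for a participle of the second type field."""
    remaining = list(matched)
    if len(matched) > 1:
        # Target standard words: matched words that are inclusion relation
        # words of another matched standard word; these are deleted.
        targets = {
            word
            for std in matched
            for word in inclusion.get(std, [])
            if word != std
        }
        remaining = [std for std in matched if std not in targets]
    corrected: List[str] = []
    for std in remaining:
        # Use the replacement relation words if any exist, else keep the word.
        for word in replacement.get(std) or [std]:
            if word not in corrected:
                corrected.append(word)
    return corrected


# Example data taken from the fig. 5 walkthrough in this description.
INCLUSION = {
    "positive side position": ["side position"],
    "double-inclined position": ["inclined position"],
}
REPLACEMENT = {
    "positive side position": ["positive position", "side position"],
    "double-inclined position": ["left inclined position", "right inclined position"],
}
```

A single loop handles all four cases because a one-element match list skips the deletion step, and a standard word without replacement relation words is simply kept as it is.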

Corresponding to the medical text data mapping method provided by the embodiment of the invention, the embodiment of the invention also provides a medical text data mapping device.

Fig. 7 is a schematic structural diagram of a medical text data mapping apparatus according to an embodiment of the present invention, and as shown in fig. 7, the apparatus may include the following modules:

a medical text data obtaining module 710, configured to obtain a plurality of participles of medical text data; wherein the medical text data comprises data content under a plurality of specified fields, and the plurality of specified fields are fields for which a standardized database is preset;

a medical text mapping establishing module 720, configured to process the multiple participles of the medical text data according to any one of the data mapping methods provided in the first aspect described above, so as to establish a data mapping relationship regarding the multiple participles.

Based on the above, in the medical field, by applying the medical text data mapping scheme provided by the embodiment of the invention, standardized data mapping can be performed on the word segmentation result of the medical text data. Therefore, an implementation basis can be provided for the standardized processing of the medical text data of different information systems.

Corresponding to the above method embodiments, an embodiment of the present invention further provides an electronic device. As shown in fig. 8, the electronic device includes a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with one another via the communication bus 804;

the memory 803 is configured to store a computer program;

the processor 801 is configured to implement, when executing the program stored in the memory 803, the steps of any of the data mapping methods provided in the embodiments of the present invention described above and/or the steps of any of the medical text data mapping methods.

The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic equipment and other equipment.

The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In a further embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, which when executed by a processor implements the steps of any of the data mapping methods provided in the above-mentioned embodiments of the present invention, and/or the steps of any of the medical text data mapping methods.

In a further embodiment provided by the present invention, there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the steps of any of the data mapping methods provided by the embodiments of the present invention described above, and/or the steps of any of the medical text data mapping methods.

In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

It is noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

The embodiments in the present specification are described in a related manner; for the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments, electronic device embodiments, computer-readable storage medium embodiments, and computer program product embodiments are described briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.

The above description is merely of the preferred embodiments of the present invention, and is not intended to limit the scope of protection of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A method of data mapping, the method comprising:

for each participle, establishing a data mapping relation regarding the participle according to a standard word matched with the participle and a specified field to which the participle belongs;

wherein the plurality of specified fields include: a first type field, the standard words in the standardized database of the first type field being gradable by level according to the subordination relationship among them; and before the step of establishing, for each participle, the data mapping relation regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs, the method further includes:

for each participle whose specified field is the first type field, determining the level of the standard word matched with the participle as the first mapping content of the participle, and, when the level of the standard word matched with the participle is not the highest level, determining the standard word at a target level corresponding to the standard word matched with the participle as the second mapping content of the participle; wherein the target level is higher than the level of the standard word matched with the participle;

correspondingly, the step of establishing, for each participle, a data mapping relation regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs comprises:

for each participle, if the specified field to which the participle belongs is the first type field and the participle has a first mapping content and a second mapping content, establishing a data mapping relation regarding the participle according to the standard word matched with the participle, the specified field to which the participle belongs, the first mapping content of the participle and the second mapping content of the participle; if the specified field to which the participle belongs is the first type field and the participle has only a first mapping content, establishing a data mapping relation regarding the participle according to the standard word matched with the participle, the specified field to which the participle belongs and the first mapping content of the participle; otherwise, establishing a data mapping relation regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs.

2. The method of claim 1, wherein the step of determining a standard word matching the participle based on the standardized databases of the plurality of specified fields comprises:

traversing the standardized databases of the designated fields, and matching the participle with each reference word corresponding to each standardized database when each standardized database is traversed; wherein, each reference word corresponding to the standardized database comprises: each standard word in the standardized database and/or each similar meaning word associated with each standard word in the standardized database;

if the reference word matched with the participle exists, determining the standard word corresponding to the reference word in the standardized database as the standard word matched with the participle;

if there is no reference word matching the participle, traversing the next standardized database.

3. The method of claim 2, wherein the step of matching the participle with each reference word corresponding to each standardized database when traversing to each standardized database comprises:

when traversing to the standardized database of the first type field, determining, from among the reference words of the standardized database that have not yet been matched against the participle, the reference words of the lowest level, and matching the participle with each determined reference word;

when traversing to the standardized database of a specified field other than the first type field, matching the participle with each reference word corresponding to the standardized database;

the method further comprises:

when traversing the standardized database of the first type field, if no reference word matching the participle exists, returning, before the step of traversing the next standardized database, to the step of determining the reference words of the lowest level from among the reference words of the standardized database that have not yet been matched against the participle.

4. The method according to claim 1 or 3, wherein the step of establishing the data mapping relation for the participle according to the standard word matched with the participle, the specified field to which the participle belongs, the first mapping content of the participle and the second mapping content of the participle comprises:

establishing a first mapping relation between a standard word matched with the participle, a designated field to which the participle belongs and first mapping content of the participle; establishing a second mapping relation between each standard word in the second mapping content of the participle and the designated field to which the standard word belongs and the target level to which the standard word belongs; determining the established first mapping relation and the second mapping relation as a data mapping relation related to the participle;

the step of establishing a data mapping relation related to the participle according to the standard word matched with the participle, the designated field to which the participle belongs and the first mapping content of the participle comprises the following steps:

establishing a mapping relation among the standard word matched with the participle, the specified field to which the participle belongs and the first mapping content of the participle, and determining the established mapping relation as the data mapping relation regarding the participle.

5. The method of claim 1 or 2, wherein the plurality of specified fields comprises: a second type field, wherein at least one of the standard words in the standardized database of the second type field has at least one specific relation word; the specific relation words comprise: inclusion relation words and replacement relation words;

before the step of establishing, for each participle, a data mapping relation regarding the participle according to the standard word matched with the participle and the specified field to which the participle belongs, the method further includes:

for each participle whose specified field is the second type field, obtaining a corrected standard word matching the participle based on the number of standard words matched with the participle and the determination result of whether the standard words matched with the participle have replacement relation words;

6. The method according to claim 5, wherein the step of obtaining the corrected standard word matching the participle based on the number of standard words matched with the participle and the determination result of whether the standard words matched with the participle have replacement relation words comprises:

7. An apparatus for data mapping, the apparatus comprising:

the word segmentation acquisition module is used for acquiring a plurality of words of the target text data; the target text data comprises data contents under a plurality of specified fields, wherein the specified fields are as follows: presetting fields of a standardized database; the word segmentation determining module is used for determining a standard word matched with the word segmentation and determining the designated field to which the word segmentation belongs according to each word segmentation and the standardized database of the designated fields; wherein, the appointed fields of the participle are as follows: the designated field to which the standard word matched with the participle belongs; the relation establishing module is used for establishing a data mapping relation related to the participle according to the standard word matched with the participle and the designated field to which the participle belongs;

wherein the plurality of specified fields include a first type field, the standard words in the standardized database of the first type field being gradable according to the subordination relations among them; the apparatus further comprises:

correspondingly, the relationship establishing module is specifically configured to, for each participle:

if the specified field to which the participle belongs is the first type field and the participle has both first mapping content and second mapping content, establish a data mapping relation with respect to the participle according to the standard word matched with the participle, the specified field to which the participle belongs, the first mapping content of the participle and the second mapping content of the participle;

if the specified field to which the participle belongs is the first type field and the participle has only first mapping content, establish a data mapping relation with respect to the participle according to the standard word matched with the participle, the specified field to which the participle belongs and the first mapping content of the participle;

otherwise, establish a data mapping relation with respect to the participle according to the standard word matched with the participle and the specified field to which the participle belongs.
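The three branches handled by the relationship establishing module can be sketched as follows. This is only an illustration under stated assumptions, not the claimed implementation: the `Participle` record, the `FIRST_TYPE_FIELDS` set, the field name `"disease"` and the dictionary layout of the resulting mapping are all hypothetical, and the claim does not say what the first and second mapping contents contain.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical: names of the specified fields treated as first type fields.
FIRST_TYPE_FIELDS = {"disease"}


@dataclass
class Participle:
    text: str                             # the participle as it appears in the text
    standard_word: str                    # standard word matched with the participle
    field: str                            # specified field the participle belongs to
    first_mapping: Optional[str] = None   # first mapping content, if any
    second_mapping: Optional[str] = None  # second mapping content, if any


def establish_mapping(p: Participle) -> dict:
    """Build the data mapping relation for one participle."""
    # Base relation, used on its own in the "otherwise" branch.
    mapping = {
        "participle": p.text,
        "standard_word": p.standard_word,
        "field": p.field,
    }
    # Mapping contents are only attached for first type fields.
    if p.field in FIRST_TYPE_FIELDS and p.first_mapping is not None:
        mapping["first_mapping"] = p.first_mapping
        # The second mapping content is added only alongside the first.
        if p.second_mapping is not None:
            mapping["second_mapping"] = p.second_mapping
    return mapping
```

A participle in a field outside `FIRST_TYPE_FIELDS` falls through to the base relation even if mapping contents happen to be set, which mirrors the "otherwise" branch of the claim.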

8. A medical text data mapping method, characterized in that the method comprises:

processing a plurality of participles of the medical text data according to the method steps of any one of claims 1-6, to establish a data mapping relation with respect to the plurality of participles.

9. A medical text data mapping apparatus, characterized in that the apparatus comprises:

a medical text data acquisition module, configured to acquire a plurality of participles of medical text data; wherein the medical text data comprises data content under a plurality of specified fields, the plurality of specified fields being fields for which a standardized database is preset;

a medical text mapping establishing module, configured to process the plurality of participles of the medical text data according to the method steps of any one of claims 1-6, to establish a data mapping relation with respect to the plurality of participles.
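The lookup that the medical text mapping establishing module relies on, determining a standard word and a specified field for each participle, might look like the sketch below. It assumes that each specified field's standardized database is a plain dictionary from surface forms to standard words and that matching is a case-insensitive exact lookup; the claims fix neither a storage format nor a matching rule. The names `STANDARDIZED_DBS`, `match_participle` and `map_participles` and all database entries are made up for illustration.

```python
# Hypothetical standardized databases, one per specified field.
STANDARDIZED_DBS = {
    "symptom": {"fever": "fever", "pyrexia": "fever"},
    "drug": {"aspirin": "acetylsalicylic acid"},
}


def match_participle(participle):
    """Return (standard_word, field) for the first field whose database
    contains the participle, or None if no database contains it."""
    key = participle.lower()
    for field, db in STANDARDIZED_DBS.items():
        if key in db:
            # The participle's field is the field of its matched standard word.
            return db[key], field
    return None


def map_participles(participles):
    """Establish a data mapping relation for every participle that matches."""
    mappings = {}
    for p in participles:
        hit = match_participle(p)
        if hit is not None:
            standard_word, field = hit
            mappings[p] = {"standard_word": standard_word, "field": field}
    return mappings
```

Participles with no match in any database are simply left out of the result here; how unmatched participles should be handled is not stated in this section.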

10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to perform the method steps of any one of claims 1 to 6 and/or the method steps of claim 8 when executing the program stored in the memory.

11. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, wherein the computer program, when executed by a processor, carries out the method steps of any one of claims 1-6 and/or the method steps of claim 8.