CN115438379A - Electronic medical record data desensitization method and system based on FLAT - Google Patents

Electronic medical record data desensitization method and system based on FLAT

Info

Publication number
CN115438379A
CN115438379A (application number CN202211116144.4A)
Authority
CN
China
Prior art keywords
medical record
entity
electronic medical
data
flat
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211116144.4A
Other languages
Chinese (zh)
Inventor
桑波
王文谦
靳恩朝
张述睿
王建坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Msunhealth Technology Group Co Ltd
Original Assignee
Shandong Msunhealth Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Msunhealth Technology Group Co Ltd filed Critical Shandong Msunhealth Technology Group Co Ltd
Priority to CN202211116144.4A priority Critical patent/CN115438379A/en
Publication of CN115438379A publication Critical patent/CN115438379A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/60 - Protecting data
    • G06F 21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F 21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F 21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G06F 21/6254 - Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/237 - Lexical tools
    • G06F 40/242 - Dictionaries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 - Named entity recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06N 5/04 - Inference or reasoning models
    • G06N 5/041 - Abduction
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Abstract

The invention provides an electronic medical record data desensitization method and system based on FLAT, relating to the technical field of data desensitization. Electronic medical record text data are collected and subjected to data generalization and knowledge embedding to obtain a character fragment sequence sample set; an entity recognition model built on FLAT and CRF is trained with the sample set; the electronic medical record text to be desensitized is fed into the trained entity recognition model to obtain the sensitive entities of the record and their entity types; and each sensitive entity is then given a type-specific desensitization treatment. The scheme centers on a FLAT-CRF entity recognition model: the labeled entities are augmented through a generalization strategy of randomly substituting entities of the same type, knowledge representations of the entities are embedded into both the character vectors and the word vectors, and the recognized entities are desensitized by category, which improves both the accuracy and the inference speed of data desensitization.

Description

Electronic medical record data desensitization method and system based on FLAT
Technical Field
The invention belongs to the technical field of data desensitization, and particularly relates to an electronic medical record data desensitization method and system based on FLAT.
Background
With the popularization of medical informatization, electronic medical records have become the standard way for hospitals to record medical information. Analysis of the data in electronic medical records is of great significance for making medical services more intelligent, improving service quality and reducing response time. Because of patient privacy requirements, patient-related information such as names, dates, addresses, institution names, contact details and various important numbers must be desensitized before such data can be used.
Because of characteristics of electronic medical records such as their data structure and the locality of their sources, the data desensitization task is very challenging.
(1) In an electronic medical record, structured text accounts for only a small part of the content and its structure varies unpredictably, so it is often handled with manual rules; beyond that, extracting sensitive information from the large amount of unstructured text is the main difficulty.
Extraction of sensitive information from unstructured text falls within the field of named entity recognition, where accuracy and inference speed are the main practical concerns. With the rapid development of neural networks, LSTM-CRF, GRU-CRF and IDCNN-CRF models based on static word vectors gradually became the mainstream frameworks for named entity recognition. With the development of dynamic word vectors, pre-trained models based on the Transformer architecture have become dominant; with simple fine-tuning they achieve higher accuracy than the earlier network models, but their huge parameter counts make inference slower.
(2) The data sources are often concentrated in specific regions; for example, most of the data may come from a single province, so the place names and organization names in the collected corpus have strong regional characteristics.
(3) The data were collected over the last two to three years, so only dates from that period appear in the corpus.
(4) Because of differences in the population, surnames appear in the data with very different frequencies.
In addition, addresses in electronic medical records are often written in shorthand, abbreviated or colloquial forms; for example, "Ningjin County, Xingtai City, Hebei Province" is often written in a shortened form such as "Hebei Xingtai Ningjin". Dates, addresses and organization names are abbreviated in similar ways.
Therefore, existing electronic medical record data desensitization schemes suffer from limited data samples, difficulty in recognizing colloquial entities, low accuracy and slow model inference, and further research is needed.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides an electronic medical record data desensitization method and system based on FLAT. An entity recognition scheme centered on a FLAT-CRF model is adopted; the labeled entities are augmented by a generalization strategy of randomly substituting entities of the same type; knowledge representations of the entities are embedded into both the character vectors and the word vectors; and the recognized entities are desensitized by replacing them with special characters, which improves both the accuracy and the inference speed of data desensitization.
In order to realize the purpose, the invention provides the following technical scheme:
collecting electronic medical record text data, and carrying out data generalization and knowledge embedding processing on the text data to obtain a character fragment sequence sample set;
training an entity recognition model constructed based on FLAT and CRF by using a character fragment sequence sample set;
inputting the electronic medical record text to be desensitized into the trained entity recognition model for inference to obtain the sensitive entities and entity types of the electronic medical record;
specific desensitization treatments are performed on sensitive entities according to entity type.
Further, a text segment is a collective term for a character or a word.
Further, the specific steps of obtaining the sample set are as follows:
according to the special characters, punctuations and the set maximum length of the sentence, the text is divided into sentences;
manually marking entities and entity types in sentences, and carrying out data generalization processing on the marked entities according to the entity types;
constructing character vectors and word vectors to which a knowledge-embedding representation of surnames and addresses is added;
and segmenting the truncated sentences to obtain the character fragment sequence of each sentence, wherein the character fragments together with their position information form the Flat-lattice data structure units required by the model;
performing character vectorization and word vectorization on the characters and words in the character fragment sequence to obtain a character fragment sequence matrix for each sentence;
and constructing a relative position coding matrix of the character fragment sequence matrix.
Further, the data generalization includes surname generalization of person names, address and organization name generalization, and date generalization.
Further, constructing the character vectors and word vectors to which the knowledge-embedding representation of surnames and addresses is added specifically comprises:
constructing character vectors and word vectors from a social-science character vector dictionary and word vector dictionary;
and adding the knowledge-embedding representation of surnames and addresses to the constructed character vectors and word vectors.
Further, the relative position coding matrix is composed of the relative position codes between every two character segments in the character segment sequence matrix, and the relative position codes are calculated as follows:
simulating the relative positional relation between two different character segments with dense vectors to obtain four distances: head-to-head, head-to-tail, tail-to-head and tail-to-tail;
and concatenating the four distances and applying a non-linear transformation to obtain the relative position code of the character segment sequence.
Further, the entity recognition model comprises a multi-head self-attention layer, a feed-forward network layer and a CRF layer, and the specific steps are as follows:
in the multi-head self-attention layer, performing relative position encoding on the character segment sequence matrix and its corresponding relative position coding matrix, and computing multi-head self-attention over the character segment matrix on the basis of this position encoding;
in the feed-forward network layer, performing residual connection and layer normalization to obtain the encoded representation of the character segments;
and in the CRF layer, computing the highest-scoring label sequence of the character segments to obtain the entity labels.
The invention provides an electronic medical record data desensitization system based on FLAT in a second aspect.
An electronic medical record data desensitization system based on FLAT comprises a sample set construction module, a model training module, an entity identification module and a desensitization processing module;
a sample set construction module configured to: collecting electronic medical record text data, and performing data generalization and knowledge embedding processing on the text data to obtain a character fragment sequence sample set;
a model training module configured to: training an entity recognition model constructed based on FLAT and CRF by using a character fragment sequence sample set;
an entity identification module configured to: inputting an electronic medical record text to be desensitized into a trained entity recognition model to obtain a sensitive entity and an entity type of the electronic medical record;
a desensitization processing module configured to: specific desensitization treatments are performed on sensitive entities according to entity type.
A third aspect of the present invention provides a computer readable storage medium, on which a program is stored, which program, when being executed by a processor, carries out the steps of a method for desensitizing electronic medical record data based on FLAT according to the first aspect of the present invention.
A fourth aspect of the present invention provides an electronic device, which includes a memory, a processor, and a program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for desensitizing electronic medical record data based on FLAT according to the first aspect of the present invention.
The above one or more technical solutions have the following beneficial effects:
the invention adopts an entity identification scheme taking a FLAT-CRF model as a main part, and utilizes the FLAT technology to fuse static word vectors and word vectors, thereby obtaining higher accuracy than that of the traditional main flow model, and simultaneously, a pre-training model is not applied for feature extraction, thereby ensuring better reasoning speed.
To help the model recognize the features of colloquial entities, the invention adds a knowledge-embedding representation of surnames and addresses to the character vectors and word vectors, so that the model automatically learns the colloquial structure of entities in the corpus, improving its accuracy.
To address the regional and temporal limitations of the corpus, the data are augmented with surname, address and date information through a generalization strategy of randomly replacing entities of the same type, which reduces the limitations of the data samples, greatly mitigates overfitting of the model and widens its usable range.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the invention and, together with the description, serve to explain the invention without limiting it.
FIG. 1 is a flow chart of the method of the first embodiment.
Fig. 2 is a data structure diagram of the Flat-Lattice in the first embodiment.
Fig. 3 is a structural diagram of an encoder in the first embodiment.
Fig. 4 is a system configuration diagram of the second embodiment.
Detailed Description
The invention is further described with reference to the following figures and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
The embodiment discloses an electronic medical record data desensitization method based on FLAT;
as shown in fig. 1, a method for desensitizing electronic medical record data based on FLAT includes:
s1, collecting electronic medical record text data, and performing data generalization and knowledge embedding processing on the text data to obtain a character fragment sequence sample set;
the character fragment sequence sample set consists of a character fragment sequence matrix and a relative position coding matrix, and the specific steps of obtaining the character fragment sequence sample set are as follows:
s1-1, dividing a text into sentences according to special characters, punctuations and set maximum length of the sentence;
sentence division is carried out on the collected electronic medical record text data according to special characters (n, r, and the like); aiming at the structured data with complexity, variability and small quantity in the electronic medical record, the structured data is uniformly treated as unstructured data, and the structural symbols (\\ n, \ r, spaces and the like) are treated as specific characters.
Carrying out sentence cutting processing on the sentences according to the set maximum sentence length; if the marked entity appears in the sentence length larger than the maximum set sentence length and the entity mark exists outside the maximum set sentence length, the sentence is cut at the punctuation mark position within the maximum sentence length, and the specific rule is as follows:
rule 1: when the number of words in a sentence is more than 90, a punctuation mark (",", ",") is encountered, the truncation is carried out at the punctuation mark, a new sentence is formed before the truncation, and the rest sentences are continued to be regularly truncated.
Rule 2: when the number of words in the sentence is more than 120, the punctuation symbol (",") is encountered, the truncation is carried out at the punctuation symbol, so that a new sentence is formed before the truncation, and the rest sentences continue to be subjected to the regular truncation.
Rule 3: when the number of words in the sentence is more than 150 and 150 is not the target entity, the truncation is directly carried out at the position; if 150 is part of the target entity, a clause is made at the closest punctuation preceding the target entity.
For example, in the example sentence "(2), the patient finds out liver space occupation by ultrasonic examination in 2019-07-19, takes the liver cancer into consideration by performing intensive CT, performs liver puncture in tumor hospital in shandong province in 2019-07-24, pathologically shows intrahepatic bile duct cancer (the pathology number is 2019-511880), and gives 'hepatic artery chemoembolization' in the hospital interventional department in 2019-07-29 and 2019-09-11 respectively, and the specific medicines are as follows: 150mg of oxaliplatin, 1.4g of gemcitabine, 12ml of iodized oil and 15ml of iodized oil are adopted, ascites is formed before 2 months, abdominal catheterization ascites drainage treatment is carried out in Shandong province tumor hospitals before 1 month, and ascites is drained automatically at home. If the comma before "medication is specified" meets rule 1, a sentence is divided at the comma.
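The following Python sketch illustrates one possible reading of the splitting and truncation rules above; the regular expression, the punctuation set and the handling of entity spans are simplifying assumptions for illustration, not the exact procedure of the invention.

```python
import re

# Illustrative sketch only: the thresholds follow rules 1-3 above, but the
# exact punctuation sets and entity-span handling are assumptions.

def split_on_special_chars(text):
    """Split raw EMR text at structural symbols (\n, \r, spaces) and sentence-ending punctuation."""
    parts = re.split(r"[\n\r ]+|(?<=[。；！？])", text)
    return [p.strip() for p in parts if p.strip()]

def shift(spans, offset):
    """Re-index labelled entity spans after a cut."""
    return [(b - offset, e - offset) for b, e in spans if e > offset]

def truncate(sentence, entity_spans=()):
    """Cut an over-long sentence at a comma (rules 1 and 2) or at a hard limit (rule 3)."""
    for i, ch in enumerate(sentence):
        if i >= 90 and ch in "，,、":          # rules 1 and 2: cut at the next comma
            return [sentence[:i + 1]] + truncate(sentence[i + 1:], shift(entity_spans, i + 1))
        if i >= 150 and not any(b <= i < e for b, e in entity_spans):  # rule 3: hard cut outside entities
            return [sentence[:i]] + truncate(sentence[i:], shift(entity_spans, i))
    return [sentence]
```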
S1-2, manually marking entities and entity types in sentences, and performing data generalization on the marked entities according to their entity types;
After the entity labeling is finished, the labeled entities are generalized according to their entity type, namely surname, address, organization name and date, in order to augment the data, as follows:
surname generalization of names
1) Prepare a relatively complete surname word dictionary and extract its single characters to form a surname character dictionary; entries in the surname word dictionary may be longer than one character (for example compound surnames), whereas entries in the surname character dictionary have a length of 1.
2) For each person name in a labeled sentence, the surname is extracted by code and replaced with a surname randomly selected from the surname dictionary. For example, for the name "Li Qiang" in a sentence, the surname "Li" is extracted, surnames such as "Wang", "Zhang" or "Ouyang" are randomly selected from the surname dictionary, and the generalized names "Wang Qiang", "Zhang Qiang" and "Ouyang Qiang" are produced.
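As an illustration of this surname generalization step, the sketch below uses a tiny stand-in surname dictionary; the dictionary contents and helper names are assumptions.

```python
import random

SURNAME_WORDS = ["王", "张", "李", "刘", "欧阳", "司马"]   # stand-in surname word dictionary

def generalize_name(name, n_variants=3):
    """Replace the surname of a labelled person name with randomly chosen surnames."""
    # Longest-match the surname so compound surnames such as 欧阳 are handled.
    surname = next((s for s in sorted(SURNAME_WORDS, key=len, reverse=True)
                    if name.startswith(s)), name[0])
    given = name[len(surname):]
    candidates = [s for s in SURNAME_WORDS if s != surname]
    return [random.choice(candidates) + given for _ in range(n_variants)]

# e.g. generalize_name("李强") may return ["王强", "张强", "欧阳强"]
```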
Generalization of address and organization name
1) Prepare a relatively complete address dictionary. Following the administrative divisions of the People's Republic of China, an address dictionary generally contains five levels of units: province level (province, municipality directly under the central government), prefecture level (prefecture-level city), county level (county), township level (township, town) and village level (village committee, residents' committee). Non-address expressions are cleaned out of it, for example organizational suffixes such as "committee" are removed from village- and community-level entries; here only the expressions at the province, prefecture and county levels are kept.
2) For the province-level, prefecture-level and county-level word dictionaries, extract their single characters to form corresponding province-level, prefecture-level and county-level character dictionaries; entries in these character dictionaries have a length of 1.
3) For the addresses and organization names in the labeled sentences, the unit expression at each level is extracted by code and a replacement is randomly selected from the address dictionary of the corresponding level. For example, "Zhao County, Shijiazhuang City, Hebei Province" can be generalized to "Cao County, Jinan City, Shandong Province" or "Furong District, Changsha City, Hunan Province"; "Heze Women and Children" can be generalized to "Zhangye Women and Children" or "Cao County Women and Children".
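The address and organization-name generalization can be sketched as level-wise random replacement, as below; the level names, dictionary contents and the assumption that the entity has already been segmented into administrative units are illustrative.

```python
import random

ADDRESS_DICT = {                                   # stand-in level dictionaries
    "province":   ["河北省", "山东省", "湖南省"],
    "prefecture": ["石家庄市", "济南市", "长沙市"],
    "county":     ["赵县", "曹县", "芙蓉区"],
}

def generalize_address(units):
    """units: (level, text) pairs extracted from a labelled address entity.
    Each unit is replaced by a random entry of the same level, so the result
    may be a synthetic combination that does not exist on any real map."""
    return "".join(random.choice(ADDRESS_DICT[level]) for level, _ in units)

# e.g. generalize_address([("province", "河北省"), ("prefecture", "石家庄市"), ("county", "赵县")])
# may return "山东省济南市曹县" or "湖南省长沙市芙蓉区"
```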
Generalization of date
The year of each date in the labeled sentences is extracted by code and randomly replaced with a year from the last five years, keeping the date format (Chinese characters or digits) unchanged. For example, "June 2021" in a sentence can be generalized to "June 2022", "June 2025" or "June 2026"; "2021.07" can be generalized to "2023.07", "2025.07", and so on.
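A minimal sketch of this date generalization, assuming the four-digit year can be located with a regular expression; the year window and the regex are illustrative assumptions.

```python
import random
import re

def generalize_date(date_str, base_year=2022, window=5):
    """Swap the 4-digit year in a date string for a random nearby year, keeping the format."""
    return re.sub(r"(19|20)\d{2}",
                  lambda m: str(base_year + random.randint(-window, window)),
                  date_str, count=1)

# e.g. generalize_date("2021年6月") -> "2025年6月";  generalize_date("2021.07") -> "2023.07"
```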
S1-3, constructing character vectors and word vectors to which the knowledge-embedding representation of surnames and addresses is added, comprising the following steps:
1) Prepare a social-science character vector dictionary and word vector dictionary; because the entities to be recognized belong to the social-science domain, these dictionaries give more accurate representations. The character vectors and word vectors used in this embodiment are both 50-dimensional; a character vector is denoted by c and a word vector by w:
the character vector is c_k ∈ R^50, where k denotes the position at which the character occurs in the sentence;
the word vector is w_k ∈ R^50, where k denotes the position at which the word occurs in the sentence.
2) After the knowledge-embedding representation of surnames and addresses (province, prefecture and county level) is added, the character vector becomes
c_k' = [c_k, surname_k, province_k, prefecture_k, county_k]
where surname_k is 0/1 and indicates whether character k appears in the surname dictionary; province_k is 0/1 and indicates whether character k appears in the province-level address dictionary; prefecture_k is 0/1 and indicates whether character k appears in the prefecture-level address dictionary; and county_k is 0/1 and indicates whether character k appears in the county-level address dictionary.
3) After the knowledge-embedding representation of surnames and addresses (province, prefecture and county level) is added, the word vector becomes
w_k' = [w_k, surname_k, province_k, prefecture_k, county_k]
where surname_k is 0/1 and indicates whether word k appears in the surname dictionary; province_k is 0/1 and indicates whether word k appears in the province-level address dictionary; prefecture_k is 0/1 and indicates whether word k appears in the prefecture-level address dictionary; and county_k is 0/1 and indicates whether word k appears in the county-level address dictionary.
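The knowledge-embedding step can thus be pictured as appending four 0/1 dictionary-membership flags to each 50-dimensional pretrained vector, giving the 54-dimensional representation used in the following steps; the dictionary contents in the sketch below are illustrative stubs.

```python
import numpy as np

SURNAME    = {"李", "王", "张"}            # stand-in dictionaries
PROVINCE   = {"河北省", "山东省"}
PREFECTURE = {"石家庄市", "济南市"}
COUNTY     = {"赵县", "曹县"}

def with_knowledge_flags(token, base_vector):
    """Append the surname / province / prefecture / county membership flags."""
    flags = np.array([float(token in SURNAME),
                      float(token in PROVINCE),
                      float(token in PREFECTURE),
                      float(token in COUNTY)])
    return np.concatenate([base_vector, flags])     # 50 + 4 = 54 dimensions

vec = with_knowledge_flags("李", np.random.randn(50))
assert vec.shape == (54,)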
S1-4, performing word segmentation on the truncated sentences and representing the characters and words of each sentence as vectors to obtain the character fragment sequence matrix of the sentence;
1) The truncated sentence is segmented, and its characters and words are concatenated to obtain the character fragment representation of the sentence:
X_u = {c_1, c_2, …, c_i, …, c_n, w_1, w_2, …, w_j, …, w_m}
where c_i is the vector representation of the i-th character in the sentence, w_j is the vector representation of the j-th word, and X_u denotes the u-th sentence in the corpus.
Note that characters and words are described collectively here as character fragments, so a sentence can be written directly in terms of character fragments:
X_u = {x_1, x_2, …, x_k, …, x_(n+m)}
where x_k is the k-th character fragment of the u-th sentence in the corpus.
2) The character fragment sequence X_u of a sentence can be expanded into a flat-lattice data structure, as shown in Fig. 2. The flat-lattice is a collection of sequence spans, each span consisting of a token, a head and a tail; the token is a character or word of the text, and the head and tail are the positions in the original sequence of the first and last characters of the token.
Finally, each sentence can be vectorized into a character fragment sequence matrix E_X, whose rows are the d_model-dimensional vectors of the character fragments; d_model is the dimension of the character vectors and word vectors, here 54.
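The flat-lattice construction can be sketched as follows: every character becomes a span of length one, and every lexicon word matched in the sentence becomes a span recording the positions of its first and last character (token, head, tail). The tiny lexicon and example sentence below are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Span:
    token: str   # the character or word
    head: int    # index of its first character in the sentence
    tail: int    # index of its last character in the sentence

LEXICON = {"重庆", "人和药店", "药店"}        # stand-in word list

def build_flat_lattice(sentence, lexicon=LEXICON, max_word_len=6):
    spans = [Span(ch, i, i) for i, ch in enumerate(sentence)]           # character spans
    for i in range(len(sentence)):                                      # matched word spans
        for j in range(i + 2, min(i + max_word_len, len(sentence)) + 1):
            if sentence[i:j] in lexicon:
                spans.append(Span(sentence[i:j], i, j - 1))
    return spans

# build_flat_lattice("重庆人和药店") yields the six single-character spans plus
# Span("重庆", 0, 1), Span("人和药店", 2, 5) and Span("药店", 4, 5).
```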
S1-5, constructing a relative position coding matrix of a character segment sequence matrix;
the relative position coding matrix is composed of relative position codes of every two character segments in the character segment sequence matrix, and the calculation method of the relative position codes comprises the following steps:
1) In the lattice structure, two different character segments (characters/words) x_i and x_j can be related in three ways: they intersect, one contains the other, or they are disjoint. These relations are modeled with dense vectors built from four relative distances:
d_ij^(hh) = head[i] − head[j]
d_ij^(ht) = head[i] − tail[j]
d_ij^(th) = tail[i] − head[j]
d_ij^(tt) = tail[i] − tail[j]
where head[i] and tail[i] denote the head and tail positions of x_i, d_ij^(hh) denotes the distance between the head of x_i and the head of x_j, and the other distances are defined analogously.
2) The four distances are concatenated and passed through a non-linear transformation to obtain the relative position code of the character segment sequence:
R_ij = ReLU( W_r (p_(d_ij^(hh)) ⊕ p_(d_ij^(th)) ⊕ p_(d_ij^(ht)) ⊕ p_(d_ij^(tt))) )
where W_r is a learnable parameter, ⊕ denotes the concatenation operation, and p_d is the sinusoidal encoding
p_d^(2k) = sin( d / 10000^(2k / d_model) ),  p_d^(2k+1) = cos( d / 10000^(2k / d_model) )
where d is one of the four distances above and k indexes the dimensions of the character segment vector, ranging over [0, d_model / 2].
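A minimal numeric sketch of this relative position code: the four head/tail distances are sinusoidally encoded, concatenated and passed through ReLU(W_r · ). Here W_r is a random stand-in for the learnable parameter and the helper names are assumptions.

```python
import numpy as np

D_MODEL = 54

def sinusoid(d, d_model=D_MODEL):
    """p_d: sinusoidal encoding of a (possibly negative) distance d."""
    k = np.arange(d_model // 2)
    angles = d / np.power(10000.0, 2 * k / d_model)
    enc = np.zeros(d_model)
    enc[0::2], enc[1::2] = np.sin(angles), np.cos(angles)
    return enc

def relative_position_code(head_i, tail_i, head_j, tail_j, W_r):
    """R_ij = ReLU(W_r [p_hh ; p_th ; p_ht ; p_tt])."""
    hh, ht = head_i - head_j, head_i - tail_j
    th, tt = tail_i - head_j, tail_i - tail_j
    concat = np.concatenate([sinusoid(d) for d in (hh, th, ht, tt)])
    return np.maximum(0.0, W_r @ concat)

W_r = 0.02 * np.random.randn(D_MODEL, 4 * D_MODEL)    # learnable in the real model
R_01 = relative_position_code(0, 0, 1, 2, W_r)        # e.g. a character span vs a word span
assert R_01.shape == (D_MODEL,)
```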
S2, training the entity recognition model constructed based on FLAT and CRF by using the sample set;
The entity recognition model is formed by stacking encoders. The structure of each encoder is shown in Fig. 3 and mainly consists of a multi-head self-attention layer and a feed-forward network layer, with residual connections and layer normalization applied throughout. To keep inference efficient, only a single-layer Transformer structure is used as the encoder.
The specific steps of each layer of encoder are as follows:
1) In the multi-head self-attention layer, relative position encoding is performed using the head and tail information carried by the spans of the flat-lattice data structure, and multi-head self-attention is then computed over the sequence vectors on the basis of this position encoding.
The training sentences are fed into the encoder in fixed-size batches. The attention score between character segments i and j is computed as
A*_ij = W_q^T E_(x_i)^T E_(x_j) W_(k,E) + W_q^T E_(x_i)^T R_ij W_(k,R) + u^T E_(x_j) W_(k,E) + v^T R_ij W_(k,R)
where W_q, W_(k,E), W_(k,R), u and v are learnable parameters, E_(x_i) and E_(x_j) are the vector representations of the i-th and j-th character segments of a sentence in the batch, and R_ij is the relative position code from step S1-5. A*_ij is a variant of the self-attention module and is approximately equivalent to the vanilla score
A_ij = Q_i K_j^T
Substituting A* for A in the formula
Att(A, V) = softmax(A) * V
gives the attention computation for the batch, where Q_i, K_i and V_i are the query, key and value vectors obtained from the inputs, i.e. [Q_i, K_i, V_i] = E_(x_i) [W_q, W_k, W_v], and the self-attention is performed by this module.
The attention results of the different heads of the multi-head attention mechanism are concatenated to obtain the final output sequence vector (a code sketch of this attention computation is given after step 3 below).
2) In the feed-forward network layer, residual connection and layer normalization are applied to the output of the multi-head self-attention layer to obtain the encoded representation of the character segments.
3) The CRF layer computes the highest-scoring label sequence over the character segments to obtain the entity labels.
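To make the attention step above concrete, the sketch below computes the span-pair score A*_ij for a single head with random stand-ins for the learnable parameters. It follows the relative-attention form used by FLAT and is illustrative, not the exact implementation of the invention.

```python
import numpy as np

d, n = 54, 8                                   # d_model and number of character segments
rng = np.random.default_rng(0)

E = rng.standard_normal((n, d))                # segment embeddings E_x
R = rng.standard_normal((n, n, d))             # relative position codes R_ij (from step S1-5)
W_q, W_kE, W_kR, W_v = (0.02 * rng.standard_normal((d, d)) for _ in range(4))
u, v = rng.standard_normal(d), rng.standard_normal(d)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# A*_ij = (E_i W_q)·(E_j W_kE) + (E_i W_q)·(R_ij W_kR) + u·(E_j W_kE) + v·(R_ij W_kR)
Q, K = E @ W_q, E @ W_kE
A_star = np.empty((n, n))
for i in range(n):
    for j in range(n):
        r = R[i, j] @ W_kR
        A_star[i, j] = Q[i] @ K[j] + Q[i] @ r + u @ K[j] + v @ r

out = softmax(A_star / np.sqrt(d)) @ (E @ W_v)   # Att(A*, V) = softmax(A*) V, one head
```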
S3, inputting the electronic medical record text to be desensitized into the trained entity recognition model to obtain the sensitive entities of the electronic medical record and their entity types;
S4, performing type-specific desensitization on the sensitive entities, namely replacing them with special character strings: person names are replaced with "#person_name#"; specific dates with "#date#"; addresses with "#location#"; organization names with "#organization#"; contact details with "#telephone#"; and various important numbers with "#ID#". Here "#" is a special character chosen to make the replaced entities easy to extract.
The data selected for desensitization in this embodiment come from several general hospitals, and the data to be labeled were selected by keyword search, with emphasis on the diversity of the labeled content; the number of entities of each type was determined by the richness of that entity type. The numbers of entities in the selected samples are shown in Table 1:
TABLE 1 entity quantity chart of selected samples
The data obtained after generalization are shown in table 2:
TABLE 2 Experimental data constitution
Training set sentences: 118998
Validation set sentences: 14927
Test set sentences: 14878
The effect of several models on data desensitization was experimentally compared, and specific data are shown in table 3:
TABLE 3 comparison of the accuracy of the different protocols
As can be seen from Table 3, the FLAT model improves accuracy by 3% over the BiLSTM model; after FLAT is chosen as the baseline model and knowledge embedding is added to the character vectors and word vectors, accuracy improves by a further 1%.
After data generalization and knowledge embedding are applied, the accuracy of BiLSTM+CRF also improves by about 4%, which demonstrates the effectiveness of data generalization and knowledge embedding.
Example two
The embodiment discloses an electronic medical record data desensitization system based on FLAT;
As shown in Fig. 4, an electronic medical record data desensitization system based on FLAT includes a sample set construction module, a model training module, an entity recognition module and a desensitization processing module;
a sample set construction module configured to: collecting electronic medical record text data, and carrying out data generalization and knowledge embedding processing on the text data to obtain a character fragment sequence sample set;
a model training module configured to: training an entity recognition model constructed based on FLAT and CRF by using a text fragment sequence sample set;
an entity identification module configured to: inputting an electronic medical record text to be desensitized into a trained entity recognition model to obtain a sensitive entity and an entity type of the electronic medical record;
a desensitization processing module configured to: specific desensitization treatments are performed on sensitive entities according to entity type.
EXAMPLE III
An object of the present embodiments is to provide a computer-readable storage medium.
A computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps in a method for desensitizing electronic medical record data based on FLAT according to embodiment 1 of the present disclosure.
Example four
An object of the present embodiment is to provide an electronic apparatus.
Electronic equipment, comprising a memory, a processor and a program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of the method for desensitizing electronic medical record data based on FLAT according to embodiment 1 of the present disclosure.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for desensitizing electronic medical record data based on FLAT is characterized by comprising the following steps:
collecting electronic medical record text data, and carrying out data generalization and knowledge embedding processing on the text data to obtain a character fragment sequence sample set;
training an entity recognition model constructed based on FLAT and CRF by using a character fragment sequence sample set;
inputting an electronic medical record text to be desensitized into a trained entity recognition model for inference to obtain a sensitive entity and an entity type of the electronic medical record;
specific desensitization treatments are performed on sensitive entities according to entity type.
2. The method as claimed in claim 1, wherein a text segment is a collective term for a character or a word.
3. The method for desensitizing electronic medical record data based on FLAT as set forth in claim 1, wherein the specific steps for obtaining the sample set are:
according to the special characters, punctuations and the set maximum length of the sentence, the text is divided into sentences;
manually marking entities and entity types in sentences, and carrying out data generalization processing on the marked entities according to the entity types;
constructing character vectors and word vectors to which a knowledge embedding representation of surnames and addresses is added;
and segmenting the intercepted sentences to obtain a character fragment sequence of each sentence, wherein the character fragments and the position information of the character fragments form a Flat-lattice data structure unit required by the model.
Performing character vectorization and word vectorization on characters and words in the character fragment sequence to obtain a character fragment sequence matrix of each sentence;
and constructing a relative position coding matrix of the text segment sequence matrix.
4. The method as set forth in claim 3, wherein the data generalization includes surname generalization of person names, address generalization, organization name generalization, and date generalization.
5. The method for desensitizing electronic medical record data based on FLAT as claimed in claim 3, wherein constructing the character vectors and word vectors to which the knowledge embedding representation of surnames and addresses is added specifically comprises:
constructing character vectors and word vectors according to a social-science character vector dictionary and word vector dictionary;
and adding the knowledge embedding representation of surnames and addresses to the constructed character vectors and word vectors.
6. The method for desensitizing electronic medical record data based on FLAT as claimed in claim 3, wherein the relative position coding matrix is composed of the relative position codes between every two character segments in the character segment sequence matrix, and the relative position codes are calculated as follows:
simulating the relative positional relation between two different character segments with dense vectors to obtain four distances: head-to-head, head-to-tail, tail-to-head and tail-to-tail;
and concatenating the four distances and applying a non-linear transformation to obtain the relative position code of the character segment sequence.
7. The method for desensitizing electronic medical record data based on FLAT according to claim 1, wherein the entity recognition model comprises a multi-head self-attention layer, a feed-forward network layer and a CRF layer, and the method comprises the following steps:
in the multi-head self-attention layer, performing relative position encoding on the character segment sequence matrix and its corresponding relative position coding matrix, and computing multi-head self-attention over the character segment matrix on the basis of this position encoding;
in the feed-forward network layer, performing residual connection and layer normalization to obtain the encoded representation of the character segments;
and in the CRF layer, computing the highest-scoring label sequence of the character segments to obtain the entity labels.
8. An electronic medical record data desensitization system based on FLAT is characterized by comprising a sample set construction module, a model training module, an entity identification module and a desensitization processing module;
a sample set construction module configured to: collecting electronic medical record text data, and performing data generalization and knowledge embedding processing on the text data to obtain a character fragment sequence sample set;
a model training module configured to: training an entity recognition model constructed based on FLAT and CRF by using a character fragment sequence sample set;
an entity identification module configured to: inputting the electronic medical record text to be desensitized into the trained entity recognition model for inference to obtain the sensitive entities and entity types of the electronic medical record;
a desensitization processing module configured to: specific desensitization treatment is performed on sensitive entities according to entity types.
9. A computer-readable storage medium on which a program is stored, wherein the program, when executed by a processor, carries out the steps of the method for electronic medical record data desensitization based on FLAT according to any one of claims 1-7.
10. Electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor executes the program to perform the steps of a method for FLAT-based electronic medical record data desensitization according to any of claims 1-7.
CN202211116144.4A 2022-09-14 2022-09-14 Electronic medical record data desensitization method and system based on FLAT Pending CN115438379A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211116144.4A CN115438379A (en) 2022-09-14 2022-09-14 Electronic medical record data desensitization method and system based on FLAT

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211116144.4A CN115438379A (en) 2022-09-14 2022-09-14 Electronic medical record data desensitization method and system based on FLAT

Publications (1)

Publication Number Publication Date
CN115438379A true CN115438379A (en) 2022-12-06

Family

ID=84246278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211116144.4A Pending CN115438379A (en) 2022-09-14 2022-09-14 Electronic medical record data desensitization method and system based on FLAT

Country Status (1)

Country Link
CN (1) CN115438379A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117688611A (en) * 2024-01-30 2024-03-12 深圳昂楷科技有限公司 Electronic medical record desensitizing method and system, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
Alvarado et al. Domain adaption of named entity recognition to support credit risk assessment
Szarvas et al. State-of-the-art anonymization of medical records using an iterative machine learning framework
CN108009182B (en) Information extraction method and device
CN111966917B (en) Event detection and summarization method based on pre-training language model
CN109670179B (en) Medical record text named entity identification method based on iterative expansion convolutional neural network
Campos et al. Biomedical named entity recognition: a survey of machine-learning tools
Dehghan et al. Combining knowledge-and data-driven methods for de-identification of clinical narratives
CN108959566B (en) A kind of medical text based on Stacking integrated study goes privacy methods and system
CN106980609A (en) A kind of name entity recognition method of the condition random field of word-based vector representation
CN106844351B (en) Medical institution organization entity identification method and device oriented to multiple data sources
CN111274806A (en) Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record
CN111291568B (en) Automatic entity relationship labeling method applied to medical texts
Chan et al. Reproducible extraction of cross-lingual topics (rectr)
Trienes et al. Comparing rule-based, feature-based and deep neural methods for de-identification of dutch medical records
CN111325018B (en) Domain dictionary construction method based on web retrieval and new word discovery
Du et al. A machine learning based approach to identify protected health information in Chinese clinical text
CN112487202A (en) Chinese medical named entity recognition method and device fusing knowledge map and BERT
CN113468887A (en) Student information relation extraction method and system based on boundary and segment classification
CN115438379A (en) Electronic medical record data desensitization method and system based on FLAT
CN116049354A (en) Multi-table retrieval method and device based on natural language
Ahamed et al. Spell corrector for Bangla language using Norvig’s algorithm and Jaro-Winkler distance
CN112215007B (en) Organization named entity normalization method and system based on LEAM model
Alipour et al. Learning bilingual word embedding mappings with similar words in related languages using GAN
CN113254651A (en) Method and device for analyzing referee document, computer equipment and storage medium
Khan et al. Enhancement of text analysis using context-aware normalization of social media informal text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination