CN116341556A

CN116341556A - Small sample rehabilitation medical named entity identification method and device based on data enhancement

Info

Publication number: CN116341556A
Application number: CN202310612923.1A
Authority: CN
Inventors: 陈博; 孟过; 刘炯; 王剑斌; 沈怡俊
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2023-05-29
Filing date: 2023-05-29
Publication date: 2023-06-27

Abstract

The invention discloses a method and a device for small-sample rehabilitation medical named entity recognition based on data enhancement. The method comprises the following steps: acquiring initial rehabilitation medical case data, dividing it into named entities, and performing BIOS labeling on the divided rehabilitation medical case data; performing data enhancement on the divided rehabilitation medical case data to obtain rehabilitation medical case data with new labels, which comprises: analyzing the length of each named entity in the divided rehabilitation medical case data and randomly masking different named entities in the rehabilitation medical case data; and/or randomly replacing named entities in the rehabilitation medical case data with other named entities of the same type; and inputting the initial rehabilitation medical case data and the rehabilitation medical case data with the new labels into a named entity recognition network to obtain a rehabilitation medical named entity recognition result.

Description

Small sample rehabilitation medical named entity identification method and device based on data enhancement

Technical Field

The invention relates to the technical fields of data enhancement, named entity recognition, BIOS labeling and the like, in particular to a method and a device for recognizing a small sample rehabilitation medical named entity based on data enhancement.

Background

Although medicine is increasingly advanced, many diseases still seriously threaten human life. Among them, stroke, with its high morbidity, disability rate, mortality, and recurrence rate, has become the leading cause of death in China and is also the primary cause of disability among adults in China. Recovery of limb motor function is therefore an important part of the rehabilitation of stroke patients. With the rapid development of artificial intelligence, technologies that use deep learning to assist rehabilitation diagnosis, planning, or treatment are emerging. However, training a deep model usually requires a large amount of annotated data, while data acquired in practice may be structured, semi-structured, or unstructured, which constrains training in terms of both data structure and data quality. Structured data generally refers to data that can be logically expressed as a two-dimensional table; semi-structured data does not conform to a two-dimensional table but contains associated markers; unstructured data, such as case text, has no fixed structure.

In practical applications, structured data is scarcer and more costly to acquire than the other two types, and these problems are particularly serious in professional fields such as rehabilitation medicine. By establishing a named entity recognition network, structured information such as entities, relations, and entity attributes can be extracted automatically from semi-structured and unstructured data, which effectively alleviates the scarcity and acquisition difficulty of structured data in practice. Entity extraction is one of the key technologies in this process. Entity extraction, also called named entity recognition, locates and classifies the important nouns and proper nouns in a text. These are called named entities, and they can be defined manually according to the downstream task.

Named entity recognition is the basis for many downstream tasks, and its accuracy typically determines how well those tasks perform. Many deep learning network frameworks exist for named entity recognition, but training these deep networks has significant shortcomings: a) training a deep neural network model requires a large amount of valid labeled data from the medical field to fit the model; b) in practice, given the data and computational demands, it is difficult to train a named entity recognition network from scratch. In particular, when a knowledge graph is constructed in professional fields such as rehabilitation medicine, professionally labeled medical data is difficult or costly to acquire, so a general deep learning network framework is difficult to train for extracting named entities from unstructured data.

Therefore, a small sample rehabilitation medical named entity recognition method based on data enhancement is provided, and the method is applied to recognition of data in the field of rehabilitation medical treatment.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a method and a device for identifying a small sample rehabilitation medical named entity based on data enhancement.

According to a first aspect of an embodiment of the present invention, there is provided a method for identifying a small sample rehabilitation medical named entity based on data enhancement, the method comprising:

acquiring initial rehabilitation medical case data, dividing named entities, and performing BIOS labeling on the rehabilitation medical case data divided by the named entities;

performing data enhancement on the rehabilitation medical case data divided by the named entity to obtain rehabilitation medical case data with new labels; comprising the following steps:

analyzing the length of each named entity in the rehabilitation medical case data divided by the named entity, and carrying out random mask on different named entities in the rehabilitation medical case data;

and/or,

randomly replacing named entities in the rehabilitation medical case data with other named entities of the same type;

and inputting the initial rehabilitation medical case data and the rehabilitation medical case data with the new label into a named entity recognition network to obtain a rehabilitation medical named entity recognition result.

According to a second aspect of the embodiment of the present invention, a small sample rehabilitation medical named entity identification device based on data enhancement is provided, which includes one or more processors, and is configured to implement the small sample rehabilitation medical named entity identification method based on data enhancement.

According to a third aspect of embodiments of the present invention, there is provided a computer readable storage medium having stored thereon a program which, when executed by a processor, is configured to implement the above-described method for identifying a small sample rehabilitation medical named entity based on data enhancement.

Compared with the prior art, the invention has the following beneficial effects. The invention provides a small-sample rehabilitation medical named entity recognition method based on data enhancement which, when sufficient valid labeled rehabilitation medical case data is lacking, generates additional valid labeled rehabilitation medical case data by random masking and/or random replacement to supplement the data. The enhanced rehabilitation medical case data is input into a pre-trained model for named entity recognition, and the named entity recognition network model is adapted to the rehabilitation medical field of this example by fine-tuning, so that the medical information in the rehabilitation medical case data is extracted. Under small-sample conditions, the random masking and/or random replacement of the invention can generate a large amount of valid labeled data, which improves the precision of named entity recognition and extracts the named entities in rehabilitation medical text data more effectively.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a flowchart of a method for identifying a small sample rehabilitation medical named entity based on data enhancement provided by an embodiment of the invention;

FIG. 2 is a schematic structural diagram of a method for identifying a small sample rehabilitation medical named entity based on data enhancement according to an embodiment of the present invention;

FIG. 3 is a diagram of initial rehabilitation medical case data and named entity classification according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of rehabilitation medical case data after using a random mask according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of rehabilitation medical case data after random replacement according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of rehabilitation medical case data using random substitution in combination with a random mask, provided by an embodiment of the present invention;

FIG. 7 is a schematic diagram of a named entity recognition network structure according to an embodiment of the present invention;

FIG. 8 is a diagram illustrating a named entity recognition network for extracting named entity results according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of a small sample rehabilitation medical named entity recognition device based on data enhancement according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The features of the following examples and embodiments may be combined with each other without any conflict.

Referring to fig. 1 and 2, an embodiment of the present invention provides a method for identifying a small sample rehabilitation medical named entity based on data enhancement, the method comprising the following steps:

Step S1: initial rehabilitation medical case data is acquired and divided into named entities, and BIOS labeling is performed on the divided rehabilitation medical case data.

After valid labeled rehabilitation medical case data is obtained for the medical field of this example, the format of the data and the named entity information it contains are analyzed. The data is stored in JSON format and contains 14 types of named entities, which are the important or useful information of a text paragraph. The named entity names and their corresponding English category labels are: name: name; gender: sex; age: age; diagnosed disease name: disease; course of disease: course; affected limb: AL; underlying disease/other diseases: UOD; clinical manifestations: CM; quantized values: scale; rehabilitation equipment: device; treatment time: the events; other devices/treatments: ODT; before use: pre; after use: post.

In addition, each character in the rehabilitation medical case data is a single character with no relation to the others, whereas in practice characters combine into words that carry actual semantic information. Because these words need to be marked, the invention adopts the BIOS labeling method: B (Begin) marks the first character of an entity; I (Inside) marks the characters of an entity other than the first; O (Other) marks non-entity characters, that is, irrelevant characters; S (Single) marks an entity that consists of a single character.

The BIOS labeling method yields a list of characters or character strings together with their labels. Before the named entity recognition network is trained or tested, a word list and a label list must also be constructed, and the original characters and labels are mapped to their index positions in the word list and the label list, respectively.

Illustratively, as shown in FIG. 3, an example of rehabilitation medical case data is "Chen Dage, 35 years old, October 29, 2022, showing decreased muscle tone, followed by dyskinesia; examination indicated cerebral infarction with cerebral edema of the surrounding tissue." This piece of rehabilitation medical case data contains five types of named entities: "Chen Dage" is of type name (name), "35 years old" of type age (age), "October 29, 2022" of type course of disease (course), "dyskinesia" of type clinical manifestation (CM), and "cerebral infarction" of type diagnosed disease name (disease). Taking "dyskinesia" as an example, B-CM marks the first character of the entity and I-CM marks its remaining characters; B-CM together with the consecutive I-CM labels that follow forms the label of "dyskinesia", indicating that "dyskinesia" is a named entity of type CM. Information in the rehabilitation medical case data other than the named entities (the important information) is labeled O, indicating that the character is irrelevant when analyzing quantitative indexes of the patient's condition, for example "decreased muscle tone" in FIG. 3.
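The labeling and index-mapping steps described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the sample sentence, the span format `(start, end, type)`, the helper names `bios_tags` and `build_index`, and the special tokens `[PAD]` and `[UNK]` are all assumptions introduced here.

```python
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type), assumed format


def bios_tags(text: str, spans: Sequence[Span]) -> List[str]:
    """Return one BIOS label per character of `text`."""
    tags = ["O"] * len(text)  # everything outside an entity is O
    for start, end, etype in spans:
        if end - start == 1:
            tags[start] = f"S-{etype}"  # single-character entity
        else:
            tags[start] = f"B-{etype}"  # first character of the entity
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"  # remaining characters
    return tags


def build_index(items: Sequence[str], specials: Sequence[str] = ()) -> Dict[str, int]:
    """Map each distinct item to its index position, special tokens first."""
    index: Dict[str, int] = {}
    for item in [*specials, *items]:
        index.setdefault(item, len(index))
    return index


# Hypothetical sample sentence with hand-marked entity spans.
text = "陈大哥，男，35岁，运动障碍"
spans = [(0, 3, "name"), (4, 5, "sex"), (6, 9, "age"), (10, 14, "CM")]

tags = bios_tags(text, spans)
vocab = build_index(text, specials=["[PAD]", "[UNK]"])  # word list
tag2id = build_index(tags)                              # label list
char_ids = [vocab[c] for c in text]
tag_ids = [tag2id[t] for t in tags]
```

In a real pipeline the word list would normally come from the pre-trained model's tokenizer rather than from the sample itself; building it from the text here only keeps the sketch self-contained.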

Step S2: data enhancement is performed on the rehabilitation medical case data divided into named entities to obtain rehabilitation medical case data with new labels, which comprises:

analyzing the length of each named entity in the divided rehabilitation medical case data, and randomly masking different named entities in the rehabilitation medical case data;

and/or,

randomly replacing named entities in the rehabilitation medical case data with other named entities of the same type.

Specifically, the goal of the named entity recognition network is to extract from unstructured text the named entities defined by the invention, that is, the important information of interest in the rehabilitation medical field. To preserve the semantic and structural information of the unstructured text, the invention designs a data enhancement method that acts only on the named entities in the rehabilitation medical case data and does not enhance the rehabilitation medical case data as a whole. The data enhancement method provided by the embodiment of the invention is further described below.

(A) The length of each named entity in the divided rehabilitation medical case data is analyzed, and different named entities in the rehabilitation medical case data are randomly masked to obtain rehabilitation medical case data with new labels.

Further, analyzing the length of each named entity in the divided rehabilitation medical case data and randomly masking different named entities in the rehabilitation medical case data comprises:

analyzing the length of each named entity in the divided rehabilitation medical case data, setting an average entity masking rate, and randomly masking the contents of different named entities in the rehabilitation medical case data with a mask symbol.

It should be noted that the rehabilitation medical data obtained in this example is often incomplete; for example, characters are missing from the text, which degrades its semantics. In view of this, the invention performs a masking operation on the key information in the rehabilitation medical text, namely the named entities, using a rarely occurring symbol to randomly mask the content of each named entity. Although random masking damages the semantic information of the rehabilitation medical data to a certain extent, for a small-sample named entity recognition task it makes the rehabilitation medical data closer to actual conditions on the one hand and, on the other hand, generates high-quality, complete, brand-new data from the initial rehabilitation medical case data, which alleviates the problem of too few samples in neural network training. In this example, the content of each named entity was analyzed and the average entity masking rate was set to 25%. Illustratively, as shown in FIG. 4, after random masking, part of the characters in each of "Chen Dage", "35 years old", "October 29, 2022", "dyskinesia", and "cerebral infarction" are replaced by the mask symbol. Experiments show that this masking rate masks entities of different lengths to different degrees and keeps the masking effective while preserving as much semantic information as possible.
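A minimal sketch of entity-level random masking is given below. The mask character `□`, the rounding rule in `n_masked` (at least one masked character per multi-character entity, never the whole entity), and the span format are assumptions made for illustration; the source only states that a rare symbol is used and that the average masking rate is 25%.

```python
import random
from typing import Optional, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type), assumed format
MASK = "□"  # hypothetical mask symbol; the source only says a rare symbol is used


def n_masked(length: int, rate: float) -> int:
    """Number of characters to mask in an entity of `length` (assumed rule)."""
    if length < 2:
        return 0  # leave single-character entities intact
    return min(max(1, round(length * rate)), length - 1)


def mask_entities(text: str, spans: Sequence[Span], rate: float = 0.25,
                  rng: Optional[random.Random] = None) -> str:
    """Mask random characters inside each entity; other text is untouched."""
    rng = rng or random.Random()
    chars = list(text)
    for start, end, _ in spans:
        for pos in rng.sample(range(start, end), n_masked(end - start, rate)):
            chars[pos] = MASK
    return "".join(chars)


# Hypothetical sample sentence with hand-marked entity spans.
text = "陈大哥，35岁，运动障碍"
spans = [(0, 3, "name"), (4, 7, "age"), (8, 12, "CM")]
masked = mask_entities(text, spans, rng=random.Random(0))
```

Because masking keeps the sentence length unchanged, the BIOS labels of the original sentence can be reused for the masked copy, which is what makes the masked sentence a new labeled sample.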

(B) Named entities in the rehabilitation medical case data are randomly replaced among named entity types of the same type.

The named entities of the divided rehabilitation medical case data are grouped by type, and named entities of the same type are randomly exchanged to obtain rehabilitation medical case data with new labels.

Illustratively, as shown in FIG. 5, after random exchange "Chen Dage" becomes "Ban Qinhao", "35 years old" becomes "75 years old", "October 29, 2022" becomes "July 1, 2021", "dyskinesia" becomes "healthy-side leg flexion during walking", and "cerebral infarction" becomes "epilepsy".

It should be noted that the medical data in this example is generally random: for the same disease, the actual condition differs from person to person, so a small sample cannot fully cover the manifestations of that disease. In view of this, the invention randomly replaces named entities of the same type in the rehabilitation medical case data. Although the generated rehabilitation medical case data with new labels is often logically inconsistent with reality, under small-sample conditions the lack of data has a larger influence on the training result of the neural network. Experiments also show that, during training of the named entity recognition network, the associations between entities and the order in which they appear have little influence on the performance indexes of the network. With random replacement, on the one hand, the generated rehabilitation medical case data with new labels combined with the initial rehabilitation medical case data covers the medical data of this example more completely, so that the basic pathological information is more complete; on the other hand, more new data is generated, which alleviates the problem of too few samples in neural network training.
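Random replacement within the same entity type might look like the sketch below. The per-type entity pool, the function names, and the Chinese sample strings are hypothetical; since a replacement can change the sentence length, the sketch regenerates the BIOS labels rather than reusing the old ones.

```python
import random
from typing import Dict, List, Sequence, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type), assumed format


def tag_entity(entity: str, etype: str) -> List[str]:
    """BIOS labels for one entity string."""
    if len(entity) == 1:
        return [f"S-{etype}"]
    return [f"B-{etype}"] + [f"I-{etype}"] * (len(entity) - 1)


def replace_entities(text: str, spans: Sequence[Span],
                     pool: Dict[str, List[str]],
                     rng: random.Random) -> Tuple[str, List[str]]:
    """Swap each entity for a random entity of the same type from `pool`."""
    chars: List[str] = []
    tags: List[str] = []
    cursor = 0
    for start, end, etype in sorted(spans):
        chars.extend(text[cursor:start])          # copy non-entity text
        tags.extend(["O"] * (start - cursor))
        original = text[start:end]
        candidates = [e for e in pool.get(etype, []) if e != original]
        entity = rng.choice(candidates) if candidates else original
        chars.extend(entity)
        tags.extend(tag_entity(entity, etype))    # labels follow the new length
        cursor = end
    chars.extend(text[cursor:])
    tags.extend(["O"] * (len(text) - cursor))
    return "".join(chars), tags


# Hypothetical sentence and per-type entity pool collected from the corpus.
text = "陈大哥，35岁，脑梗死"
spans = [(0, 3, "name"), (4, 7, "age"), (8, 11, "disease")]
pool = {"name": ["陈大哥", "班秦好"], "age": ["35岁", "75岁"],
        "disease": ["脑梗死", "癫痫"]}
new_text, new_tags = replace_entities(text, spans, pool, random.Random(0))
```

Excluding the original entity from the candidates is a design choice of this sketch, so that every augmented sentence differs from its source whenever the pool offers an alternative.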

(C) Random masking is combined with random replacement to perform data enhancement on the divided rehabilitation medical case data and obtain rehabilitation medical case data with new labels.

Illustratively, as shown in FIG. 6, after random replacement and random masking, "Chen Dage", "35 years old", "October 29, 2022", "dyskinesia", and "cerebral infarction" become partially masked versions of "Ban Qinhao", "75 years old", "July 1, 2021", "healthy-side leg flexion during walking", and "epilepsy", respectively.

It should be noted that, when random masking is combined with random replacement, random masking is applied on top of random replacement: the replaced data, which covers the various symptoms encountered in practice in this example, is additionally masked, so that the enhanced rehabilitation medical case data better matches the quality of rehabilitation medical data in real situations.
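The combined strategy can be sketched as replacement followed by masking of the replaced entity. The segment representation `(text, entity_type or None)`, the mask character `□`, the rounding rule for the number of masked characters, and the sample strings are assumptions of this illustration, not details given in the source.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Segment = Tuple[str, Optional[str]]  # (text, entity_type or None for plain text)
MASK = "□"  # hypothetical mask symbol


def augment(segments: Sequence[Segment], pool: Dict[str, List[str]],
            rate: float = 0.25,
            rng: Optional[random.Random] = None) -> Tuple[str, List[str]]:
    """Replace each entity with a same-type entity, then mask part of it."""
    rng = rng or random.Random()
    chars: List[str] = []
    tags: List[str] = []
    for text, etype in segments:
        if etype is None:  # plain text is copied unchanged and labeled O
            chars.extend(text)
            tags.extend(["O"] * len(text))
            continue
        # Step 1: random replacement within the same entity type.
        candidates = [e for e in pool.get(etype, []) if e != text]
        entity = list(rng.choice(candidates) if candidates else text)
        # Step 2: random masking of the replaced entity (assumed rounding rule).
        k = min(max(1, round(len(entity) * rate)), len(entity) - 1)
        for pos in rng.sample(range(len(entity)), k):
            entity[pos] = MASK
        chars.extend(entity)
        if len(entity) == 1:
            tags.append(f"S-{etype}")
        else:
            tags.extend([f"B-{etype}"] + [f"I-{etype}"] * (len(entity) - 1))
    return "".join(chars), tags


# Hypothetical segmented sentence and replacement pool.
segments = [("陈大哥", "name"), ("，", None), ("35岁", "age")]
pool = {"name": ["班秦好"], "age": ["75岁"]}
text_out, tags_out = augment(segments, pool, rng=random.Random(1))
```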

Step S3: a named entity recognition network is constructed, and the initial rehabilitation medical case data and the rehabilitation medical case data with the new labels are input into the named entity recognition network to obtain a rehabilitation medical named entity recognition result.

The method further comprises: fine-tuning the named entity recognition network using the initial rehabilitation medical case data and the rehabilitation medical case data with the new labels.

Considering the data distribution of the rehabilitation medical data set used in this example, the named entity recognition network is constructed with a RoBERTa+BiLSTM+CRF structure. RoBERTa acquires a powerful ability to extract sentence semantics by pre-training on a large-scale corpus consisting of Chinese Wikipedia, and achieves better results on the named entity recognition task when combined with fine-tuning. The BiLSTM fuses context information, so that the network learns the semantic information of sentences better. If there are no explicit constraints between sequence labels, sequence labeling can be treated as a multi-class classification of each sequence element. Named entity recognition, however, is a joint labeling task in which the labels depend on one another. The CRF introduces feature functions so that the features at every time step are taken into account when the global sequence is scored; information is thus captured more comprehensively, and the accuracy of sequence labeling improves.
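To illustrate why label dependencies matter, the sketch below decodes a label sequence with the Viterbi algorithm under hard BIOS constraints. It is a simplification and an assumption of this illustration: a trained CRF layer learns real-valued transition scores, whereas `transition` here only returns 0 for allowed label pairs and negative infinity for forbidden ones, and the emission scores are made up.

```python
from typing import Dict, List, Sequence

NEG_INF = float("-inf")


def transition(prev: str, cur: str) -> float:
    """Hard BIOS constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-") and prev not in (f"B-{cur[2:]}", f"I-{cur[2:]}"):
        return NEG_INF
    return 0.0


def viterbi(emissions: Sequence[Dict[str, float]], tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring label path that respects the constraints."""
    # A sentence may not start inside an entity.
    score = {t: NEG_INF if t.startswith("I-") else emissions[0][t] for t in tags}
    backpointers: List[Dict[str, str]] = []
    for emission in emissions[1:]:
        new_score: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            best_prev = max(tags, key=lambda p: score[p] + transition(p, cur))
            new_score[cur] = (score[best_prev] + transition(best_prev, cur)
                              + emission[cur])
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(tags, key=lambda t: score[t])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


tags = ["O", "B-CM", "I-CM"]
# Made-up per-character scores, as a BiLSTM output layer might produce.
emissions = [
    {"O": 0.1, "B-CM": 0.2, "I-CM": 0.9},
    {"O": 0.2, "B-CM": 0.1, "I-CM": 0.8},
    {"O": 0.9, "B-CM": 0.05, "I-CM": 0.05},
]
greedy = [max(e, key=e.get) for e in emissions]  # per-character classification
best = viterbi(emissions, tags)                   # sequence-level decoding
```

Here the per-character choice starts the sentence with I-CM, which is not a valid BIOS sequence, while the constrained decoding yields B-CM, I-CM, O.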

FIG. 7 shows the network structure of RoBERTa+BiLSTM+CRF.

The initial rehabilitation medical case data and the rehabilitation medical case data with the new labels are recorded as a first text sequence, in which each element represents a character of the sentence and whose length is the length of the sentence text. A start identifier [CLS] is added at the start position of the first text sequence, and the sequence is input into the RoBERTa network to obtain a first vector representation containing the information of each character;

the first vector representation is aligned with the initial rehabilitation medical case data to obtain a second vector representation;

the second vector representation is input into the BiLSTM network for semantic learning and processing of the context to obtain a third vector representation;

the third vector representation is input into the CRF layer to obtain a predicted sequence representation; the predicted sequence is mapped through the word list and the label list to obtain a predicted label sequence, which gives the rehabilitation medical named entity recognition result.
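The last step, mapping predicted indices back to labels and reading the entities off the label sequence, can be sketched as follows. The function name, the label list, and the sample sentence are hypothetical, and the sketch assumes the predictions have already been aligned with the characters (the [CLS] position removed).

```python
from typing import List, Sequence, Tuple


def decode_entities(chars: Sequence[str], tag_ids: Sequence[int],
                    id2tag: Sequence[str]) -> List[Tuple[str, str]]:
    """Map label indices to BIOS labels and collect (entity_text, entity_type)."""
    tags = [id2tag[i] for i in tag_ids]
    entities: List[Tuple[str, str]] = []
    start = None  # index of the B- label of the entity currently open
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        inside = start is not None and tag == f"I-{tags[start][2:]}"
        if start is not None and not inside:
            entities.append(("".join(chars[start:i]), tags[start][2:]))
            start = None
        if tag.startswith("B-"):
            start = i
        elif tag.startswith("S-"):
            entities.append((chars[i], tag[2:]))
    return entities


# Hypothetical label list and model output for a short sentence.
id2tag = ["O", "B-name", "I-name", "S-sex", "B-age", "I-age"]
chars = list("刘某，男，41岁")
tag_ids = [1, 2, 0, 3, 0, 4, 5, 5]
entities = decode_entities(chars, tag_ids, id2tag)
```

An I- label that does not continue an open entity of the same type is ignored in this sketch; other conventions, such as treating it as the start of a new entity, are equally possible.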

An evaluation score is calculated using the predicted tag sequence.

The evaluation score is calculated using Micro-Averaging, which counts every instance in the data set without distinguishing classes to build a global confusion matrix and then computes the corresponding indexes. The calculation formulas are as follows:

$$\mathrm{MicroP} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} \left(TP_i + FP_i\right)}$$

$$\mathrm{MicroR} = \frac{\sum_{i=1}^{n} TP_i}{\sum_{i=1}^{n} \left(TP_i + FN_i\right)}$$

$$\mathrm{MicroF} = \frac{2 \times \mathrm{MicroP} \times \mathrm{MicroR}}{\mathrm{MicroP} + \mathrm{MicroR}}$$

where $n$ represents the total number of texts in the sample, $TP_i$ represents the number of correctly identified entities in the $i$-th text, $FP_i$ represents the number of erroneously identified entities in the $i$-th text, and $FN_i$ represents the number of entities in the $i$-th text that were not identified. MicroP represents precision, the proportion of correctly identified entities among all identified entities; MicroR represents recall, the proportion of correctly identified entities among all true entities; MicroF represents the harmonic mean of MicroP and MicroR, and its value range is $[0, 1]$. The closer MicroF is to 1, the better the performance of the named entity recognition network.
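A minimal sketch of the micro-averaged scores follows. It pools, over all texts, the counts of correctly identified, spuriously identified, and missed entities before dividing. Representing an entity as `(start, end, type)` and requiring an exact match for a correct identification are assumptions of this sketch.

```python
from typing import Iterable, Set, Tuple

Entity = Tuple[int, int, str]  # (start, end_exclusive, entity_type), assumed format


def count_text(gold: Set[Entity], pred: Set[Entity]) -> Tuple[int, int, int]:
    """Per-text counts: (correct, spurious, missed) entities."""
    tp = len(gold & pred)
    return tp, len(pred) - tp, len(gold) - tp


def micro_prf(counts: Iterable[Tuple[int, int, int]]) -> Tuple[float, float, float]:
    """Pool the counts over all texts, then compute precision, recall, F."""
    tp = fp = fn = 0
    for t, p, n in counts:
        tp, fp, fn = tp + t, fp + p, fn + n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f


# Two hypothetical texts: one spurious prediction, one missed entity.
gold_1 = {(0, 3, "name"), (4, 7, "age")}
pred_1 = {(0, 3, "name"), (4, 7, "age"), (8, 10, "CM")}
gold_2 = {(0, 2, "disease"), (5, 9, "CM")}
pred_2 = {(0, 2, "disease")}
counts = [count_text(gold_1, pred_1), count_text(gold_2, pred_2)]
micro_p, micro_r, micro_f = micro_prf(counts)
```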

Micro-Averaging is used to count the prediction results of every entity class and to calculate the precision MicroP and the recall MicroR. Generally, when the samples of a certain entity class make up only a small proportion of all samples, a conventional evaluation index cannot reflect the recognition effect well. With Micro-Averaging as the evaluation index, the counts of all entity classes are pooled before MicroF is calculated, the result of every sample is taken into account, and the result over the whole data set is closer to an objective result.

The closer the result of the evaluation function used by the network in this example is to 1, the better the performance of the named entity recognition network. As shown in Table 1 below, column 1 lists the named entity types defined in step 2; column 2 gives the extraction precision for each named entity when only the initial rehabilitation medical case data is input to the named entity recognition network; columns 3 to 5 give the extraction precision for each named entity when the initial rehabilitation medical case data is merged with, respectively, the newly labeled data obtained by random masking, the newly labeled data obtained by random replacement, and the newly labeled data obtained by combining random masking with random replacement, and then input to the named entity recognition network. In each row, the best precision for that named entity is shown in bold.

Table 1: named entity identification evaluation result table

As can be seen from Table 1, the small-sample rehabilitation medical named entity recognition method based on data enhancement provided by the invention generates new valid labeled data to address the lack of valid labeled data in the rehabilitation medical field of this example, and has the clear advantage of achieving higher-precision named entity recognition with only a small amount of rehabilitation medical data.

As shown in FIG. 8, the original case information is "Liu x, male, 41 years old, cerebral infarction, weakness of the left lower limb for more than one month. Before use: left lower limb muscle strength grade 3, standing balance grade 2, walking with the assistance of others. Lower limb exoskeleton robot for 1 course of treatment, combined with PT manipulation therapy. After use: left lower limb muscle strength grade 4, standing balance grade 3, walking independently." After data enhancement, the case information is input to the named entity recognition network: "Liu x" is extracted as "name: name"; "male" as "gender: sex"; "41 years old" as "age: age"; "cerebral infarction" as "diagnosed disease name: disease"; "1 course of treatment" as "treatment time"; "lower limb exoskeleton robot" as "rehabilitation equipment: device"; "combined with PT manipulation therapy" as "other devices/treatments: ODT"; "muscle strength grade 3" as "quantized values: scale"; "left lower limb" as "affected limb: AL"; "before use" as "before use: pre"; "after use" as "after use: post"; and "weakness", "standing balance grade 2", "walking with the assistance of others", "standing balance grade 3", and "walking independently" as "clinical manifestations: CM". The predefined named entities are thus effectively extracted from this paragraph, completing the extraction of the important information in the original case information.

Therefore, when sufficient valid labeled data is lacking, the small-sample rehabilitation medical named entity recognition method based on data enhancement provided by the invention generates additional valid labeled data from the small amount of existing valid labeled data by means of data enhancement. The initial rehabilitation medical case data and the newly generated rehabilitation medical case data with new labels are input into the named entity recognition network together, and the named entity recognition network pre-trained on general data is applied to the rehabilitation medical field through pre-training and fine-tuning in order to extract the important information in the rehabilitation medical data of this example. Compared with inputting the initial rehabilitation medical case data into the named entity recognition network alone, the recognition precision of most named entities is substantially improved or remains on par, and the named entities in the text data are extracted more effectively.

Corresponding to the embodiment of the small sample rehabilitation medical named entity identification method based on data enhancement, the invention also provides an embodiment of the small sample rehabilitation medical named entity identification device based on data enhancement.

Referring to fig. 9, a device for identifying a small sample rehabilitation medical named entity based on data enhancement provided by an embodiment of the invention includes one or more processors configured to implement the method for identifying a small sample rehabilitation medical named entity based on data enhancement in the above embodiment.

The embodiment of the small sample rehabilitation medical named entity recognition device based on data enhancement can be applied to any device with data processing capability, such as a computer. The device embodiment may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, the device in a logical sense is formed by the processor of the device with data processing capability on which it resides reading the corresponding computer program instructions from a nonvolatile memory into memory and running them. In terms of hardware, FIG. 9 shows a hardware structure diagram of a device with data processing capability on which the small sample rehabilitation medical named entity recognition device based on data enhancement resides. In addition to the processor, memory, network interface, and nonvolatile memory shown in FIG. 9, the device with data processing capability on which the device of the embodiment resides generally includes other hardware according to its actual function, which is not described here.

The implementation process of the functions and roles of each unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be described herein again.

Since the device embodiments essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention. Those of ordinary skill in the art can understand and implement the present invention without creative effort.

The embodiment of the invention also provides a computer readable storage medium, wherein a program is stored on the computer readable storage medium, and when the program is executed by a processor, the method for identifying the small sample rehabilitation medical named entity based on data enhancement in the embodiment is realized.

The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the previous embodiments. The computer readable storage medium may also be an external storage device of such a device, for example a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer readable storage medium is used for storing the computer program and other programs and data required by the device with data processing capability, and may also be used for temporarily storing data that has been output or is to be output.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. The specification and examples are to be regarded in an illustrative manner only.

It is to be understood that the present application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims

1. A method for identifying a small sample rehabilitation medical named entity based on data enhancement, the method comprising:

and/or,

2. The method for identifying a named entity of a small sample rehabilitation medical treatment based on data enhancement according to claim 1, wherein the named entity type corresponding to the rehabilitation medical treatment case data comprises:

name, sex, age, name of the diagnosed disease, course of disease, affected limb, underlying disease/other disease, clinical manifestation, quantified value, rehabilitation device, treatment time, other device/treatment, pre-use, post-use.

3. The method for identifying small sample rehabilitation medical named entity based on data enhancement according to claim 1 or 2, wherein the BIOS labeling of the named entity-divided rehabilitation medical case data comprises:

BIOS labeling is carried out on the rehabilitation medical case data divided by the named entity so as to construct a word list and a label list, and characters and labels in the rehabilitation medical case data divided by the named entity are respectively mapped into index positions in the word list and the label list;

wherein B represents the first character of an entity, I represents the characters of the entity other than the first character, O represents a non-entity character, and S represents an entity consisting of a single character.
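By way of illustration only, the labeling and index mapping recited in claim 3 might be sketched as follows. The span format `(start, end, type)`, the helper names `bios_tags` and `build_index`, the entity type names, and the example sentence are assumptions introduced for this sketch and are not taken from the specification.

```python
from typing import Dict, List, Tuple


def bios_tags(text: str, entities: List[Tuple[int, int, str]]) -> List[str]:
    """Tag every character with B-/I-/S-<type>, or O for non-entity characters.

    `entities` holds hypothetical (start, end, type) spans, end exclusive.
    """
    tags = ["O"] * len(text)
    for start, end, etype in entities:
        if end - start == 1:
            tags[start] = f"S-{etype}"      # single-character entity
        else:
            tags[start] = f"B-{etype}"      # first character of the entity
            for i in range(start + 1, end):
                tags[i] = f"I-{etype}"      # remaining characters
    return tags


def build_index(items: List[str]) -> Dict[str, int]:
    """Map each distinct item to its index position, in first-seen order."""
    index: Dict[str, int] = {}
    for item in items:
        index.setdefault(item, len(index))
    return index


# Made-up example sentence: "patient, male, 65 years old".
text = "患者男65岁"
entities = [(2, 3, "SEX"), (3, 6, "AGE")]

tags = bios_tags(text, entities)
word2id = build_index(list(text))   # word list
tag2id = build_index(tags)          # label list
char_ids = [word2id[c] for c in text]
tag_ids = [tag2id[t] for t in tags]
```

In this sketch the word list and label list are built from a single sentence; in practice they would presumably be built over the whole corpus before any sequence is mapped to indices.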

4. The method for identifying small sample rehabilitation medical named entity based on data enhancement according to claim 1, wherein analyzing the length of each named entity in the rehabilitation medical case data after the named entity division, and performing random masking on different named entities in the rehabilitation medical case data comprises:

5. The method for identifying a small sample rehabilitation medical named entity based on data enhancement according to claim 1, wherein randomly replacing named entities in the rehabilitation medical case data among named entities of the same type comprises:

classifying the named entities in the rehabilitation medical case data after the named entity division, and randomly replacing each named entity with a named entity of the same type.
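As a non-limiting sketch, same-type replacement could be carried out as shown below. The function name `replace_same_type`, the span format, the candidate pool, and the English example sentence are all hypothetical; the specification does not prescribe this particular procedure.

```python
import random
from typing import Dict, List, Tuple

Span = Tuple[int, int, str]  # hypothetical (start, end_exclusive, entity_type)


def replace_same_type(
    text: str,
    spans: List[Span],
    pool: Dict[str, List[str]],
    rng: random.Random,
) -> Tuple[str, List[Span]]:
    """Swap each entity for a random entity of the same type drawn from `pool`.

    Returns the new text together with the shifted spans, from which new
    labels can be regenerated.
    """
    pieces: List[str] = []
    new_spans: List[Span] = []
    cursor = 0   # read position in the original text
    length = 0   # write position in the new text
    for start, end, etype in sorted(spans):
        pieces.append(text[cursor:start])          # copy non-entity text
        length += start - cursor
        candidates = pool.get(etype) or [text[start:end]]
        substitute = rng.choice(candidates)
        pieces.append(substitute)
        new_spans.append((length, length + len(substitute), etype))
        length += len(substitute)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), new_spans


# Made-up example; the pool would be collected from the labeled cases.
text = "left arm weak after stroke"
spans = [(0, 8, "LIMB"), (20, 26, "DISEASE")]
pool = {
    "LIMB": ["left arm", "right leg"],
    "DISEASE": ["stroke", "cerebral infarction"],
}
new_text, new_spans = replace_same_type(text, spans, pool, random.Random(0))
```

Because the spans are recomputed as the text is rebuilt, the replaced sentence can be labeled again without manual annotation, which is what allows it to serve as additional labeled data.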

6. The method for identifying a small sample rehabilitation medical named entity based on data enhancement according to claim 1, wherein the named entity recognition network consists of a RoBERTa network, a BiLSTM network and a CRF layer which are connected in sequence.

7. The method for identifying a small sample rehabilitation medical named entity based on data enhancement according to claim 6, wherein inputting initial rehabilitation medical case data and rehabilitation medical case data with new labels into a named entity identification network, obtaining a rehabilitation medical named entity identification result comprises:

recording the initial rehabilitation medical case data and the rehabilitation medical case data with new labels as a first text sequence, adding a start identifier at the start position of the first text sequence, and inputting the first text sequence into the RoBERTa network to obtain a first vector representation containing the information of each character;

aligning the first vector representation with the initial rehabilitation medical case data to obtain a second vector representation;

inputting the second vector representation into a BiLSTM network to perform semantic learning and processing of the context, and obtaining a third vector representation;

inputting the third vector representation into the CRF layer to obtain a predicted sequence representation; the predicted sequence is mapped by the word list and the tag list to obtain a predicted tag sequence, and a rehabilitation medical named entity recognition result is obtained.
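For illustration, the final decoding step may be sketched with Viterbi decoding, a standard way of obtaining the best tag sequence from a CRF layer; the specification does not state which decoding procedure is used. All scores, the tag list, and the function name `viterbi` are invented for this sketch, and dropping the row at the start-identifier position is only one plausible reading of the alignment step.

```python
from typing import List


def viterbi(emissions: List[List[float]], transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backptr: List[List[int]] = []
    for emit in emissions[1:]:
        prev = score
        score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at this position.
            best_i = max(range(n_tags), key=lambda i: prev[i] + transitions[i][j])
            ptr.append(best_i)
            score.append(prev[best_i] + transitions[best_i][j] + emit[j])
        backptr.append(ptr)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


id2tag = ["O", "B-AGE", "I-AGE"]   # hypothetical label list
transitions = [
    [0.0, 0.0, -10.0],             # from O: O -> I-AGE is heavily penalised
    [-1.0, -1.0, 1.0],             # from B-AGE
    [0.0, 0.0, 0.5],               # from I-AGE
]
scores_with_start = [
    [0.0, 0.0, 0.0],               # start identifier position
    [2.0, 0.1, 0.0],
    [0.2, 1.5, 1.4],
    [0.1, 0.2, 1.0],
]
emissions = scores_with_start[1:]  # assumption: alignment drops the start row
path = viterbi(emissions, transitions)
predicted_tags = [id2tag[i] for i in path]
```

The last line corresponds to mapping the predicted index sequence back to labels through the label list; the resulting B/I/S runs then delimit the recognized entities.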

8. The method for identifying a small sample rehabilitation medical named entity based on data enhancement according to claim 6, wherein inputting initial rehabilitation medical case data and rehabilitation medical case data with new labels into a named entity identification network, obtaining a rehabilitation medical named entity identification result further comprises:

and fine tuning the named entity recognition network by using the initial rehabilitation medical case data and the rehabilitation medical case data with the new label.

9. A data enhancement based small sample rehabilitation medical named entity recognition device comprising one or more processors configured to implement the data enhancement based small sample rehabilitation medical named entity recognition method of any one of claims 1-8.

10. A computer readable storage medium having stored thereon a program which, when executed by a processor, is adapted to carry out the data enhancement based small sample rehabilitation medical named entity identification method of any of claims 1-8.