CN112966515A - Medical entity identification method

Info

Publication number: CN112966515A
Application number: CN202110378224.6A
Authority: CN
Inventors: 沈同平; 金力; 黄方亮; 孟庆全; 王元茂; 许欢庆
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-04-08
Filing date: 2021-04-08
Publication date: 2021-06-15

Abstract

The invention discloses a medical entity identification method, which relates to the technical field of information extraction and comprises the following steps: selecting a suitable acquirer according to an acquisition value to acquire and store the medical text to be identified; analyzing the acquired medical texts to obtain an identification priority table of the medical texts; feeding back the sequence position of each medical text in the identification priority table to a control center, which identifies the medical texts according to the fed-back sequence positions, so that identification proceeds in an orderly manner and identification efficiency is improved; and performing iterative self-learning through a semi-supervised learning process to filter and calibrate the predicted word segmentation result. The method selects suitable general practitioners for manual checking according to a push value, which improves checking efficiency, and determines the number of general practitioners who perform manual checking according to the check value of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.

Description

Medical entity identification method

Technical Field

The invention relates to the technical field of information extraction, in particular to a medical entity identification method.

Background

Medical named entity recognition aims to extract medical entities from medical texts and classify them into categories such as drugs, surgery, symptoms, diseases and body parts. For example, given the sentence "patient had lower limb edema before May", the goal of medical named entity recognition is to extract "lower limb" and "edema" from this sentence and classify them as a body part entity and a disease entity, respectively. Medical named entity recognition is an important task in intelligent healthcare and an important prerequisite for many downstream tasks, such as drug repositioning, entity linking and clinical decision support systems. Medical named entity recognition has therefore attracted increasing attention in recent years.

The document with publication number CN107168946A discloses a named entity recognition method for medical text data, which uses a hidden Markov model to perform sequence labeling on the original medical text and obtain a predicted word segmentation result. After the predicted word segmentation is finished, iterative self-learning is carried out on the word segmentation result by a semi-supervised learning method so as to obtain accurate word segmentation and named entity recognition results.

However, that patent does not grade the original medical texts and therefore provides no ordered basis for named entity recognition of medical texts; during named entity recognition, some medical texts are easily missed or recognized repeatedly; and when the seed word set is reviewed, suitable personnel cannot be selected according to a push value to perform the review, so review efficiency cannot be improved.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a medical entity identification method. According to the invention, a suitable acquirer can be selected according to the acquisition value to acquire and store the medical text to be identified, which improves acquisition efficiency; the acquired medical texts are then analyzed to obtain an identification priority table of the medical texts; the sequence position of each medical text in the identification priority table is fed back to a control center; the control center identifies the medical texts according to the fed-back sequence positions, so that identification proceeds in an orderly manner and identification efficiency is improved; suitable general practitioners can be selected according to the push value to perform manual checking, which improves checking efficiency; and the number of general practitioners who perform manual checking is determined according to the check value of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.

The purpose of the invention can be realized by the following technical scheme:

a medical entity identification method, comprising the steps of:

step one: acquiring and storing a medical text to be identified;

step two: analyzing the acquired medical texts to obtain an identification priority list of the medical texts; feeding back the sequence position of the medical text in the identification priority table to a control center; the control center identifies the medical text according to the fed-back sequence position;

step three: collecting medical dictionaries, and sorting the medical dictionaries into a disease word bank, a symptom word bank, an examination word bank and a treatment word bank; labeling the medical text by using the collected medical dictionaries through a hidden Markov model to obtain a predicted word segmentation result;

step four: performing iterative self-learning through a semi-supervised learning process, and filtering and calibrating the predicted word segmentation result; selecting corresponding general practitioners to perform manual checking, checking for omissions and filling gaps; and obtaining a final medical named entity recognition result of the medical text.
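The hidden Markov model labeling in step three can be illustrated with a generic Viterbi decoder. This is only a minimal sketch: the source does not disclose the tag set or the model parameters, so the B/M/E/S character tagging scheme (begin, middle, end, single) and the function names `viterbi` and `tags_to_words` are assumptions introduced here for illustration.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs, states, start_p, trans_p, emit_p, floor=1e-8):
    """Return the most likely hidden state sequence for the observations.

    Unseen emissions receive the small probability `floor` (an assumption,
    so that characters missing from the dictionaries do not block decoding).
    """
    if not obs:
        return []
    # score[s] is the best log-probability of any path that ends in state s.
    score = {s: _log(start_p.get(s, 0)) + _log(emit_p[s].get(obs[0], floor))
             for s in states}
    back = []
    for o in obs[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + _log(trans_p[r].get(s, 0)))
            score[s] = (prev[best] + _log(trans_p[best].get(s, 0))
                        + _log(emit_p[s].get(o, floor)))
            ptr[s] = best
        back.append(ptr)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def tags_to_words(text, tags):
    """Cut text into words: a word closes on an E (end) or S (single) tag."""
    words, current = [], ""
    for ch, tag in zip(text, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

Under these assumptions, the start, transition and emission probabilities would be estimated from text annotated with the four word banks, `viterbi` would be called with `states = ["B", "M", "E", "S"]` on the characters of a medical text, and `tags_to_words` would turn the resulting tags into the predicted word segmentation.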

Further, the acquiring and storing of the medical text to be recognized in the first step specifically includes:

s11: acquiring the personnel on duty at the current time and marking them as primary selection personnel;

s12: calculating the time difference between the job entry time of each primary selection person and the current system time to obtain that person's length of service, and marking it as SD;

setting the age of the primary selection person as SF; setting the number of acquisitions performed by the primary selection person as SG;

s13: normalizing the length of service, the age and the number of acquisitions and taking their numerical values;

obtaining the acquisition value SZ of the primary selection person by using the formula SZ = (SD × A1 + SG × A2 - |SF - 35| × A3) × ST - 1.2356; wherein ST is the collection value of the primary selection person; A1, A2 and A3 are all preset coefficient factors;

s14: selecting the primary selection person with the maximum acquisition value SZ as the acquirer;

s15: sending the acquisition instruction to the mobile phone terminal of the acquirer; meanwhile, the number of acquisitions of the acquirer is increased by one;

s16: after receiving the acquisition instruction, the acquirer acquires and stores the medical text to be identified;

calculating the time difference between the acquisition ending time and the acquisition starting time to obtain the acquisition duration of the acquirer, and marking the acquisition duration as TA; setting the score value input by a user as B;

carrying out normalization processing on the acquisition duration and the input score value and taking the value;

obtaining the single value DT of the acquirer by using the formula DT = (1/TA) × b1 + B × b2, summing all single values of the acquirer and averaging to obtain the collection value ST of the acquirer; wherein b1 and b2 are both preset proportionality coefficients.
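The acquirer selection of steps S11 to S16 can be sketched as follows. This is an illustrative reading of the formulas, not a definitive implementation: the class and function names are hypothetical, the default coefficients are the example values given later in the detailed description, the final term is read literally as a subtraction of 1.2356, and the fallback collection value of 1.0 for a person with no acquisition history is an assumption, since the source does not cover that case.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Candidate:
    name: str
    service_length: float  # SD, length of service
    age: float             # SF
    acquisitions: float    # SG, number of past acquisitions
    single_values: list = field(default_factory=list)  # past DT values


def single_value(ta, score, b1=0.658, b2=0.345):
    """DT = (1/TA) * b1 + B * b2 for one finished acquisition."""
    return (1.0 / ta) * b1 + score * b2


def collection_value(c, default=1.0):
    """ST: mean of all single values; `default` with no history (assumption)."""
    return mean(c.single_values) if c.single_values else default


def acquisition_value(c, a1=0.87, a2=0.35, a3=0.56):
    """SZ = (SD*A1 + SG*A2 - |SF - 35|*A3) * ST - 1.2356."""
    weighted = c.service_length * a1 + c.acquisitions * a2 - abs(c.age - 35) * a3
    return weighted * collection_value(c) - 1.2356


def select_acquirer(candidates):
    """S14 and S15: pick the maximum SZ and count the new acquisition."""
    best = max(candidates, key=acquisition_value)
    best.acquisitions += 1
    return best
```

After an acquisition finishes, the caller would append `single_value(TA, B)` to the acquirer's `single_values`, so that the collection value ST reflects both how quickly the text was acquired and the score the user entered.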

Further, the specific step of obtaining the identification priority list of the medical text in the second step is as follows:

s21: acquiring the generation time of the medical text, calculating the time difference between the generation time and the current time of the system to obtain the delay time of the medical text, and marking the delay time as YT;

s22: collecting retrieval records of medical texts within thirty days before the current time of the system; the retrieval record comprises retrieval persons, retrieval starting time and retrieval ending time;

accumulating the number of the retrieval times of the medical texts to form retrieval frequency, and marking as P1;

sorting the retrieval records of the medical text by retrieving person, counting the number of retrieving persons of the medical text and marking it as P2;

calculating the time difference between the retrieval start time and the retrieval end time to obtain the retrieval duration, accumulating the retrieval durations to form the total retrieval duration, and marking it as P3;

assigning weights to the retrieval frequency, the number of retrieving persons and the total retrieval duration, wherein the weight of the retrieval frequency is Z1, the weight of the number of retrieving persons is Z2, and the weight of the total retrieval duration is Z3; wherein Z1 + Z2 + Z3 = 1;

obtaining the retrieval attraction value QT of the medical text by using the formula QT = P1 × Z1 + P2 × Z2 + P3 × Z3;

s23: sorting the retrieval records of the medical text by retrieval start time, obtaining the most recent retrieval start time of the medical text and marking it as ZT1;

calculating the time difference between the most recent retrieval start time ZT1 and the current system time to obtain the buffer duration, and marking it as HT1;

s24: normalizing the delay time, the retrieval attraction value and the buffer time and taking the values of the delay time, the retrieval attraction value and the buffer time;

obtaining the optimal recognition value YS of the medical text by using the formula YS = YT × Z4 + QT × Z5 + (1/HT1) × Z6; wherein Z4, Z5 and Z6 are all preset proportionality coefficients;

s25: arranging the medical texts in descending order of the optimal recognition value YS to generate the identification priority table of the medical texts.
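Steps S21 to S25 amount to scoring each medical text and sorting in descending order of that score. The sketch below is a hedged illustration: the names `TextStats`, `attraction_value`, `priority_value` and `priority_table` are hypothetical, the default weights are the example values from the detailed description, and the value computed by the YS formula is taken to be the sort key.

```python
from dataclasses import dataclass


@dataclass
class TextStats:
    text_id: str
    delay: float           # YT, time since the text was generated
    retrievals: float      # P1, retrieval frequency in the last thirty days
    retrievers: float      # P2, number of retrieving persons
    total_duration: float  # P3, total retrieval duration
    buffer: float          # HT1, time since the last retrieval (must be > 0)


def attraction_value(t, z1=0.6, z2=0.3, z3=0.1):
    """QT = P1*Z1 + P2*Z2 + P3*Z3, with Z1 + Z2 + Z3 = 1."""
    return t.retrievals * z1 + t.retrievers * z2 + t.total_duration * z3


def priority_value(t, z4=0.346, z5=0.573, z6=0.517):
    """YS = YT*Z4 + QT*Z5 + (1/HT1)*Z6."""
    return t.delay * z4 + attraction_value(t) * z5 + (1.0 / t.buffer) * z6


def priority_table(texts):
    """S25: texts in descending order of the priority value."""
    return sorted(texts, key=priority_value, reverse=True)
```

The position of a text in the list returned by `priority_table` corresponds to the sequence position that is fed back to the control center. A text that has never been retrieved has no buffer duration, so a real system would need a rule for that case; the source does not state one.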

Further, the specific steps in step four of selecting the corresponding general practitioners to perform manual checking, checking for omissions and filling gaps are as follows:

s41: acquiring general practitioners in an idle state at the current time and marking the general practitioners as primary-selected doctors;

s42: acquiring personal information of the primary-selected doctor, wherein the personal information comprises name, gender, educational background, physician qualification information, affiliated hospital, time of starting medical practice and hospital grade; the physician qualification information comprises the time at which the physician qualification was obtained;

s43: acquiring the educational background of the primary-selected doctor, and dividing educational background into four grades: junior college, bachelor, master and doctor; setting each grade to correspond to an educational preset value, the junior college grade being assigned e, the bachelor grade f, the master grade g and the doctor grade h; wherein e, f, g and h are fixed numerical values and e < f < g < h;

matching the educational background of the primary-selected doctor against all the grades to obtain the corresponding educational preset value, and marking it as Xc;

calculating the time difference between the time at which the primary-selected doctor obtained the physician qualification and the current system time to obtain the certificate-holding duration, and marking it as XT;

calculating the time difference between the time at which the primary-selected doctor started medical practice and the current system time to obtain the practice duration, and marking it as Xd;

obtaining the qualification coefficient FA of the primary-selected doctor by using the formula FA = Xc × c1 + XT × c2 + Xd × c3; wherein c1, c2 and c3 are all preset coefficients;

s44: comparing the qualification coefficient FA of the initially selected doctor with a set qualification coefficient threshold; if the qualification coefficient FA of the initially selected doctor is larger than the set qualification coefficient threshold, marking the initially selected doctor as a preferred doctor;

s45: acquiring historical consultation data of the preferred doctor within a preset time; the historical consultation data comprises the number of consultations and user evaluation coefficients; the user evaluation coefficient is determined as follows: the user scores the doctor's diagnosis and treatment, with a full score of 100; marking the user evaluation coefficient as Qx, summing the user evaluation coefficients Qx and averaging to obtain the evaluation mean value Qs; marking the number of consultations as Cs;

setting hospital grades of all hospitals to correspond to a preset value, matching the hospital grade of the hospital of the preferred doctor with all the hospital grades to obtain the corresponding preset value, and marking the preset value as DS;

setting the number of times of manual checking of a preferred doctor as CT;

obtaining the push value WS of the preferred doctor by using the formula WS = FA × c4 + Qs × c5 + Cs × c6 + DS × c7 + CT × c8; wherein c4, c5, c6, c7 and c8 are all preset proportionality coefficients;

s46: sorting the preferred doctors according to the pushing value WS of the preferred doctors from high to low;

s47: screening out a preset number of general practitioners as selected doctors according to the ranking of the preferred doctors; the selected doctors manually check the predicted word segmentation result, checking for omissions and filling gaps.
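Steps S43 to S46 can be sketched as a threshold filter on the qualification coefficient followed by a sort on the push value. The sketch is illustrative only: the names are hypothetical, the educational preset values 1.0 to 4.0 are placeholders that merely satisfy e < f < g < h, and the default coefficients are the example values from the detailed description, with the third qualification coefficient read as c3.

```python
from dataclasses import dataclass

# Hypothetical preset values; the source only requires e < f < g < h.
EDUCATION_PRESET = {"junior_college": 1.0, "bachelor": 2.0,
                    "master": 3.0, "doctor": 4.0}


@dataclass
class Physician:
    name: str
    education: str          # key into EDUCATION_PRESET, gives Xc
    cert_years: float       # XT, certificate-holding duration
    practice_years: float   # Xd, practice duration
    mean_rating: float      # Qs, mean user evaluation (full score 100)
    visits: float           # Cs, number of consultations
    hospital_preset: float  # DS, preset value of the hospital grade
    checks_done: float      # CT, number of past manual checks


def qualification(p, c1=0.32, c2=0.52, c3=0.45):
    """FA = Xc*c1 + XT*c2 + Xd*c3."""
    return (EDUCATION_PRESET[p.education] * c1
            + p.cert_years * c2 + p.practice_years * c3)


def push_value(p, c4=0.22, c5=0.35, c6=0.45, c7=0.24, c8=0.64):
    """WS = FA*c4 + Qs*c5 + Cs*c6 + DS*c7 + CT*c8."""
    return (qualification(p) * c4 + p.mean_rating * c5 + p.visits * c6
            + p.hospital_preset * c7 + p.checks_done * c8)


def rank_preferred(physicians, threshold):
    """S44 and S46: keep FA > threshold, then sort by WS from high to low."""
    preferred = [p for p in physicians if qualification(p) > threshold]
    return sorted(preferred, key=push_value, reverse=True)
```

Taking the first n entries of the list returned by `rank_preferred` gives the selected doctors of step S47, where n is the preset number.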

Further, the step S47 of screening out a preset number of general practitioners as selected physicians according to the ranking of the preferred physicians specifically includes:

AA 1: acquiring the text size of a predicted word segmentation result, and marking the text size as WA;

AA 2: acquiring a medical text corresponding to a predicted word segmentation result, and marking a patient corresponding to the medical text as a target patient;

AA 3: acquiring diagnosis and treatment records of a target patient within preset time;

accumulating the diagnosis and treatment times of the target patient to form diagnosis and treatment frequency, and marking as L1;

accumulating the diagnosis and treatment amounts of the target patients to form a total diagnosis and treatment amount, and marking the total diagnosis and treatment amount as L2;

AA 4: obtaining the check value WX of the predicted word segmentation result by using the formula WX = WA × d1 + L1 × d2 + L2 × d3, wherein d1, d2 and d3 are all preset coefficients;

AA 5: when the check value WX satisfies 0 < WX ≤ K1, screening out INT(f × WX) general practitioners as selected doctors; when the check value WX satisfies K1 < WX, screening out INT[(1 + f) × WX] general practitioners as selected doctors; wherein INT(f × WX) represents the largest integer not exceeding f × WX; INT[(1 + f) × WX] represents the largest integer not exceeding (1 + f) × WX; f is a preset coefficient and f > 0.
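Steps AA 4 and AA 5 reduce to a weighted sum followed by a piecewise floor. The sketch below is a minimal illustration with hypothetical function names; the default coefficients d1, d2 and d3 are the example values from the detailed description, while K1 and f are left as required arguments because the source gives no values for them.

```python
import math


def check_value(wa, l1, l2, d1=0.18, d2=0.56, d3=0.49):
    """WX = WA*d1 + L1*d2 + L2*d3."""
    return wa * d1 + l1 * d2 + l2 * d3


def physicians_needed(wx, k1, f):
    """Number of selected doctors for a check value WX.

    INT(f * WX) when 0 < WX <= K1, and INT((1 + f) * WX) when WX > K1,
    where INT is the largest integer not exceeding its argument.
    """
    if wx <= 0 or f <= 0:
        raise ValueError("WX and f must both be positive")
    factor = f if wx <= k1 else 1 + f
    return math.floor(factor * wx)
```

For small products f × WX the first branch yields zero doctors, so a practical system would probably enforce a minimum of one; the source does not say how this case is handled.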

The invention has the beneficial effects that:

1. the invention acquires and stores the medical text to be identified: the personnel on duty at the current time are obtained and marked as primary selection personnel; the time difference between each primary selection person's job entry time and the current system time is calculated to obtain the length of service, and the age and number of acquisitions of the primary selection person are set; the acquisition value SZ of each primary selection person is obtained by the corresponding formula, and the primary selection person with the maximum acquisition value SZ is selected as the acquirer, which improves acquisition efficiency;

2. the invention analyzes the acquired medical texts: retrieval records of the medical texts within the thirty days before the current system time are collected; a retrieval attraction value QT is obtained by combining the retrieval frequency, the number of retrieving persons and the total retrieval duration; the delay duration and buffer duration of each medical text are calculated; the optimal recognition value YS of the medical text is obtained by using the formula YS = YT × Z4 + QT × Z5 + (1/HT1) × Z6; the medical texts are arranged in descending order of the optimal recognition value YS to generate the identification priority table of the medical texts; the sequence position of each medical text in the identification priority table is fed back to the control center; the control center identifies the medical texts according to the fed-back sequence positions, so that identification proceeds in an orderly manner and identification efficiency is improved;

3. the invention performs iterative self-learning through a semi-supervised learning process, and filters and calibrates the predicted word segmentation result; corresponding general practitioners are selected to perform manual checking, checking for omissions and filling gaps, and the final medical named entity recognition result of the medical text is obtained; general practitioners in an idle state at the current time are obtained and marked as primary-selected doctors; the personal information of each primary-selected doctor is acquired to obtain the corresponding educational preset value, certificate-holding duration and practice duration; the qualification coefficient of the primary-selected doctor is obtained by using the formula FA = Xc × c1 + XT × c2 + Xd × c3, and the primary-selected doctor is marked as a preferred doctor if the qualification coefficient FA is greater than the set qualification coefficient threshold; historical consultation data of the preferred doctor within a preset time are acquired; the push value WS of the preferred doctor is obtained by combining the evaluation mean value Qs, the number of consultations and the preset value corresponding to the hospital grade; suitable general practitioners can thus be selected according to the push value WS to perform manual checking, which improves checking efficiency; and the number of general practitioners who perform manual checking is determined according to the check value of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.

Drawings

In order to facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings.

FIG. 1 is a block diagram of the system of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, a medical entity identification method includes the following steps:

step one: acquiring and storing a medical text to be identified; this specifically comprises:

obtaining the acquisition value SZ of the primary selection person by using the formula SZ = (SD × A1 + SG × A2 - |SF - 35| × A3) × ST - 1.2356; wherein ST is the collection value of the primary selection person; A1, A2 and A3 are all preset coefficient factors; for example, A1 takes the value 0.87, A2 takes the value 0.35, and A3 takes the value 0.56;

obtaining the single value DT of the acquirer by using the formula DT = (1/TA) × b1 + B × b2, summing all single values of the acquirer and averaging to obtain the collection value ST of the acquirer; wherein b1 and b2 are both preset proportionality coefficients; for example, b1 takes the value 0.658 and b2 takes the value 0.345;

step two: analyzing the acquired medical texts to obtain an identification priority list of the medical texts; feeding back the sequence position of the medical text in the identification priority table to a control center; the control center identifies the medical text according to the fed-back sequence position; the method specifically comprises the following steps:

assigning weights to the retrieval frequency, the number of retrieving persons and the total retrieval duration, wherein the weight of the retrieval frequency is Z1, the weight of the number of retrieving persons is Z2, and the weight of the total retrieval duration is Z3; wherein Z1 + Z2 + Z3 = 1; for example, Z1 takes the value 0.6, Z2 takes the value 0.3, and Z3 takes the value 0.1;

obtaining the optimal recognition value YS of the medical text by using the formula YS = YT × Z4 + QT × Z5 + (1/HT1) × Z6; wherein Z4, Z5 and Z6 are all preset proportionality coefficients; for example, Z4 takes the value 0.346, Z5 takes the value 0.573, and Z6 takes the value 0.517;

s25: arranging the medical texts in descending order of the optimal recognition value YS to generate the identification priority table of the medical texts;

step four: performing iterative self-learning through a semi-supervised learning process, and filtering and calibrating the predicted word segmentation result; selecting corresponding general practitioners to perform manual checking, checking for omissions and filling gaps; obtaining a final medical named entity recognition result of the medical text;

the specific steps in step four of selecting the corresponding general practitioners to perform manual checking, checking for omissions and filling gaps are as follows:

obtaining the qualification coefficient FA of the primary-selected doctor by using the formula FA = Xc × c1 + XT × c2 + Xd × c3; wherein c1, c2 and c3 are all preset coefficients; for example, c1 takes the value 0.32, c2 takes the value 0.52, and c3 takes the value 0.45;

setting the number of times of manual checking of a preferred doctor as CT;

obtaining the push value WS of the preferred doctor by using the formula WS = FA × c4 + Qs × c5 + Cs × c6 + DS × c7 + CT × c8; wherein c4, c5, c6, c7 and c8 are all preset proportionality coefficients; for example, c4 takes the value 0.22, c5 takes the value 0.35, c6 takes the value 0.45, c7 takes the value 0.24, and c8 takes the value 0.64;

s47: screening out a preset number of general practitioners as selected doctors according to the ranking of the preferred doctors; the selected doctors manually check the predicted word segmentation result, checking for omissions and filling gaps;

in step S47, screening out a preset number of general practitioners as selected physicians according to the ranking of the preferred physicians, specifically including:

AA 4: obtaining the check value WX of the predicted word segmentation result by using the formula WX = WA × d1 + L1 × d2 + L2 × d3, wherein d1, d2 and d3 are all preset coefficients; for example, d1 takes the value 0.18, d2 takes the value 0.56, and d3 takes the value 0.49;

The working principle of the invention is as follows:

In the medical entity identification method, during operation, the medical text to be identified is acquired and stored: the personnel on duty at the current time are obtained and marked as primary selection personnel; the time difference between each primary selection person's job entry time and the current system time is calculated to obtain the length of service, and the age and number of acquisitions of the primary selection person are set; the acquisition value SZ of each primary selection person is obtained by the corresponding formula, and the primary selection person with the maximum acquisition value SZ is selected as the acquirer, which improves acquisition efficiency. The acquired medical texts are then analyzed: retrieval records of the medical texts within the thirty days before the current system time are collected; a retrieval attraction value QT is obtained by combining the retrieval frequency, the number of retrieving persons and the total retrieval duration; the delay duration and buffer duration of each medical text are calculated; the optimal recognition value YS of the medical text is obtained by using the formula YS = YT × Z4 + QT × Z5 + (1/HT1) × Z6; the medical texts are arranged in descending order of the optimal recognition value YS to generate the identification priority table of the medical texts; the sequence position of each medical text in the identification priority table is fed back to the control center; the control center identifies the medical texts according to the fed-back sequence positions, so that identification proceeds in an orderly manner and identification efficiency is improved;

medical dictionaries are collected and sorted into a disease word bank, a symptom word bank, an examination word bank and a treatment word bank; the medical text is labeled by using the collected medical dictionaries through a hidden Markov model to obtain a predicted word segmentation result; iterative self-learning is performed through a semi-supervised learning process, and the predicted word segmentation result is filtered and calibrated; corresponding general practitioners are selected to perform manual checking, checking for omissions and filling gaps, and the final medical named entity recognition result of the medical text is obtained; suitable general practitioners can be selected according to the push value WS of the preferred doctors to perform manual checking, which improves checking efficiency; and the number of general practitioners who perform manual checking is determined according to the check value WX of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.

The above formulas are all obtained by collecting a large amount of data to perform software simulation and performing parameter setting processing by corresponding experts, and the formulas are in accordance with real results.

The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A medical entity identification method, comprising the steps of:

2. The method for identifying a medical entity according to claim 1, wherein the step one of collecting and storing the medical text to be identified specifically comprises:

3. The method for identifying medical entities according to claim 1, wherein the step two of obtaining the identification priority list of the medical texts comprises the following specific steps:

4. The method according to claim 1, wherein the specific steps in step four of selecting corresponding general practitioners to perform manual checking, checking for omissions and filling gaps are as follows:

setting the number of times of manual checking of a preferred doctor as CT;

s47: screening out a preset number of general practitioners as selected doctors according to the ranking of the preferred doctors; the selected doctors manually check the predicted word segmentation result, checking for omissions and filling gaps.

5. The method of claim 4, wherein the step S47 of screening out a predetermined number of general practitioners as the selected physicians according to the order of the preferred physicians comprises: