CN112966515A - Medical entity identification method - Google Patents

Info

Publication number
CN112966515A
Authority
CN
China
Prior art keywords
medical
time
value
marking
doctor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110378224.6A
Other languages
Chinese (zh)
Inventor
沈同平
金力
黄方亮
孟庆全
王元茂
许欢庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202110378224.6A
Publication of CN112966515A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a medical entity identification method in the technical field of information extraction, comprising the following steps: selecting a suitable acquirer according to an acquisition value to acquire and store the medical text to be identified; analyzing the acquired medical texts to obtain an identification priority table of the medical texts; feeding back the sequence position of each medical text in the identification priority table to a control center, which identifies the medical texts in the fed-back order, so that identification proceeds in an orderly manner and identification efficiency is improved; and performing iterative self-learning through a semi-supervised learning process to filter and calibrate the predicted word segmentation result. The method selects general practitioners for manual checking according to a pushing value, which improves checking efficiency, and determines the number of general practitioners who perform the manual checking according to the check value of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.

Description

Medical entity identification method
Technical Field
The invention relates to the technical field of information extraction, in particular to a medical entity identification method.
Background
Medical named entity recognition aims to extract medical entities from medical texts and assign them to categories such as drugs, surgeries, symptoms, diseases and body parts. For example, given the sentence "patient had lower limb edema before May", the goal of medical named entity recognition is to extract "lower limb" and "edema" from the sentence and classify them as a body part entity and a disease entity, respectively. Medical named entity recognition is an important task in intelligent healthcare and a prerequisite for many downstream tasks, such as drug repositioning, entity linking and clinical decision support systems. It has therefore attracted increasing attention in recent years.
The document with publication number CN107168946A discloses a named entity recognition method for medical text data, which uses a hidden Markov model to perform sequence labeling on the original medical text and obtain a predicted word segmentation result. After the predicted word segmentation is finished, a semi-supervised learning method performs iterative self-learning on the word segmentation result to obtain accurate word segmentation and named entity recognition results.
However, that patent does not grade the original medical texts and therefore provides no ordered basis for named entity recognition of medical texts, so some medical texts are easily missed or recognized repeatedly during recognition. In addition, when the seed word set is examined, suitable personnel cannot be selected for the examination according to a pushing value, so examination efficiency is low.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a medical entity identification method. According to the invention, a suitable acquirer can be selected according to the acquisition value to acquire and store the medical text to be identified, which improves acquisition efficiency. The acquired medical texts are then analyzed to obtain an identification priority table of the medical texts, and the sequence position of each medical text in the identification priority table is fed back to a control center. The control center identifies the medical texts in the fed-back order, so that identification proceeds in an orderly manner and identification efficiency is improved. General practitioners can be reasonably selected for manual checking according to the pushing value, which improves checking efficiency, and the number of general practitioners who perform the manual checking is determined according to the check value of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.
The purpose of the invention can be achieved by the following technical scheme:
A medical entity identification method, comprising the following steps:
Step one: acquiring and storing a medical text to be identified;
Step two: analyzing the acquired medical texts to obtain an identification priority table of the medical texts; feeding back the sequence position of each medical text in the identification priority table to a control center; the control center identifies the medical texts according to the fed-back sequence positions;
Step three: collecting medical dictionaries and sorting them into a disease word bank, a symptom word bank, an examination word bank and a treatment word bank; labeling the medical text with the collected medical dictionaries through a hidden Markov model to obtain a predicted word segmentation result;
Step four: performing iterative self-learning through a semi-supervised learning process, and filtering and calibrating the predicted word segmentation result; selecting corresponding general practitioners to perform manual checking, checking for omissions and correcting errors; and obtaining the final medical named entity recognition result of the medical text.
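The hidden Markov labeling of step three can be illustrated with a minimal Viterbi decoder. This is only a sketch under stated assumptions: the B/I/O tag set, the English tokens and every probability below are hypothetical toy values chosen for illustration, and the method itself does not specify how the model parameters are obtained from the medical dictionaries.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most likely hidden-state sequence for the observations."""
    def lp(p):
        # Log-probability with a small floor for unseen events (an assumption).
        return math.log(p if p > 0 else floor)

    # best[t][s] = (log-score of the best path ending in state s at time t, previous state)
    best = [{s: (lp(start_p.get(s, 0)) + lp(emit_p[s].get(observations[0], 0)), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            score, prev = max(
                (best[-1][p][0] + lp(trans_p[p].get(s, 0)), p) for p in states
            )
            column[s] = (score + lp(emit_p[s].get(obs, 0)), prev)
        best.append(column)

    # Backtrack from the best final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))

# Hypothetical toy model: B = entity begins, I = inside entity, O = outside.
states = ["B", "I", "O"]
start_p = {"B": 0.2, "I": 0.0, "O": 0.8}
trans_p = {
    "B": {"B": 0.2, "I": 0.5, "O": 0.3},
    "I": {"B": 0.2, "I": 0.3, "O": 0.5},
    "O": {"B": 0.3, "I": 0.0, "O": 0.7},
}
emit_p = {
    "B": {"lower": 0.5, "edema": 0.5},   # words that may start a dictionary entry
    "I": {"limb": 0.9, "edema": 0.1},    # words that may continue an entry
    "O": {"patient": 0.5, "had": 0.5},   # non-entity words
}

tokens = ["patient", "had", "lower", "limb", "edema"]
tags = viterbi(tokens, states, start_p, trans_p, emit_p)
print(list(zip(tokens, tags)))
```

With these toy parameters, "lower limb" is tagged as one entity span (B, I) and "edema" as a separate one (B), matching the example in the Background.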
Further, acquiring and storing the medical text to be identified in step one specifically includes:
S11: obtaining the staff on duty at the current time and marking them as primary selection personnel;
S12: calculating the time difference between the hire date of each primary selection person and the current system time to obtain the employment duration, marked as SD;
setting the age of the primary selection person as SF; setting the acquisition count of the primary selection person as SG;
S13: normalizing the employment duration, the age and the acquisition count and taking their numerical values;
obtaining the acquisition value SZ of the primary selection person by using the formula $SZ = (SD \times A1 + SG \times A2 - |SF - 35| \times A3) \times ST - 1.2356$; wherein ST is the collecting value of the primary selection person; A1, A2 and A3 are all preset coefficient factors;
S14: selecting the primary selection person with the largest acquisition value SZ as the acquirer;
S15: sending an acquisition instruction to the mobile phone terminal of the acquirer, and incrementing the acquirer's acquisition count by one;
S16: after receiving the acquisition instruction, the acquirer acquires and stores the medical text to be identified;
calculating the time difference between the acquisition end time and the acquisition start time to obtain the acquisition duration of the acquirer, marked as TA; setting the score value input by a user as B;
normalizing the acquisition duration and the input score value and taking their numerical values;
obtaining a single value DT of the acquirer by using the formula $DT = (1/TA) \times b1 + B \times b2$, then summing all single values of the acquirer and averaging them to obtain the collecting value ST of the acquirer; wherein b1 and b2 are both preset proportionality coefficients.
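The acquirer selection of S11 to S16 can be sketched as follows. This is a minimal illustration, not a definitive implementation: the default coefficients are the example values given in the detailed description, the inputs are assumed to be already normalized numbers, and the dictionary layout, the function names and the fallback collecting value for staff without history are hypothetical.

```python
def single_value(ta, b, b1=0.658, b2=0.345):
    """DT = (1/TA) * b1 + B * b2 for one completed acquisition."""
    return (1.0 / ta) * b1 + b * b2

def collecting_value(history, default=1.0):
    """ST: mean of all single values DT of one person.

    `default` for a person with no acquisition history is an assumption;
    the source does not say how such a person is handled.
    """
    if not history:
        return default
    return sum(single_value(ta, b) for ta, b in history) / len(history)

def acquisition_value(sd, sg, sf, st, a1=0.87, a2=0.35, a3=0.56):
    """SZ = (SD*A1 + SG*A2 - |SF - 35|*A3) * ST - 1.2356."""
    return (sd * a1 + sg * a2 - abs(sf - 35) * a3) * st - 1.2356

def pick_acquirer(candidates):
    """S14: the primary selection person with the largest SZ becomes the acquirer."""
    return max(
        candidates,
        key=lambda c: acquisition_value(
            c["SD"], c["SG"], c["SF"], collecting_value(c["history"])
        ),
    )

# Hypothetical on-duty staff; each history entry is a (TA, B) pair.
candidates = [
    {"name": "A", "SD": 5, "SG": 10, "SF": 35, "history": [(2.0, 4.0)]},
    {"name": "B", "SD": 8, "SG": 2, "SF": 50, "history": [(2.0, 4.0)]},
]
chosen = pick_acquirer(candidates)
chosen["SG"] += 1  # S15: the acquirer's acquisition count increases by one
print(chosen["name"])
```

Because of the $|SF - 35|$ term, the formula favors staff whose age is close to 35, which is why candidate B scores lower here despite the longer employment duration.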
Further, the specific steps for obtaining the identification priority table of the medical texts in step two are as follows:
S21: obtaining the generation time of the medical text and calculating the time difference between the generation time and the current system time to obtain the delay duration of the medical text, marked as YT;
S22: collecting the retrieval records of the medical text within the thirty days before the current system time; each retrieval record comprises the retriever, the retrieval start time and the retrieval end time;
counting the number of retrievals of the medical text to obtain the retrieval frequency, marked as P1;
grouping the retrieval records of the medical text by retriever and counting the number of retrievers, marked as P2;
calculating the time difference between the retrieval start time and the retrieval end time of each record to obtain the retrieval duration, and summing the retrieval durations to obtain the total retrieval duration, marked as P3;
assigning weights to the retrieval frequency, the number of retrievers and the total retrieval duration, the weights being Z1, Z2 and Z3 respectively; wherein $Z1 + Z2 + Z3 = 1$;
obtaining the retrieval attraction value QT of the medical text by using the formula $QT = P1 \times Z1 + P2 \times Z2 + P3 \times Z3$;
S23: sorting the retrieval records of the medical text by retrieval start time, obtaining the most recent retrieval start time of the medical text and marking it as ZT1;
calculating the time difference between the most recent retrieval start time ZT1 and the current system time to obtain the buffer duration, marked as HT1;
S24: normalizing the delay duration, the retrieval attraction value and the buffer duration and taking their numerical values;
obtaining the preferred recognition value YS of the medical text by using the formula $YS = YT \times Z4 + QT \times Z5 + (1/HT1) \times Z6$; wherein Z4, Z5 and Z6 are all preset proportionality coefficients;
S25: arranging the medical texts in descending order of the preferred recognition value YS to generate the identification priority table of the medical texts.
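The priority table of S21 to S25 can be sketched as below. The default weights are the example values from the detailed description; the record layout, the identifiers and the input numbers are hypothetical, and the sketch assumes every text has been retrieved at least once so that HT1 is positive.

```python
def retrieval_attraction(p1, p2, p3, z1=0.6, z2=0.3, z3=0.1):
    """QT = P1*Z1 + P2*Z2 + P3*Z3, with Z1 + Z2 + Z3 = 1."""
    return p1 * z1 + p2 * z2 + p3 * z3

def preferred_recognition_value(yt, qt, ht1, z4=0.346, z5=0.573, z6=0.517):
    """YS = YT*Z4 + QT*Z5 + (1/HT1)*Z6; HT1 > 0 is assumed."""
    return yt * z4 + qt * z5 + (1.0 / ht1) * z6

def priority_table(texts):
    """S25: text ids in descending order of YS."""
    scored = []
    for t in texts:
        qt = retrieval_attraction(t["P1"], t["P2"], t["P3"])
        scored.append((preferred_recognition_value(t["YT"], qt, t["HT1"]), t["id"]))
    return [text_id for _, text_id in sorted(scored, reverse=True)]

# Hypothetical, already-normalized inputs.
texts = [
    {"id": "t2", "YT": 1, "P1": 1, "P2": 1, "P3": 5, "HT1": 10},
    {"id": "t1", "YT": 10, "P1": 5, "P2": 2, "P3": 30, "HT1": 2},
]
table = priority_table(texts)
print(table)
```

A text that is old, frequently retrieved and recently retrieved (small HT1) rises to the top of the table, which is the order in which the control center would process it.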
Further, the specific steps of selecting corresponding general practitioners to perform manual checking, checking for omissions and correcting errors in step four are as follows:
S41: obtaining the general practitioners who are idle at the current time and marking them as primary-selected doctors;
S42: obtaining the personal information of each primary-selected doctor, the personal information comprising name, gender, educational background, physician qualification information, affiliated hospital, practice start time and hospital grade; the physician qualification information comprises the time at which the physician qualification was obtained;
S43: obtaining the educational background of the primary-selected doctor, which is divided into four grades: junior college, bachelor, master and doctorate; each grade corresponds to an education preset value, namely e for junior college, f for bachelor, g for master and h for doctorate; wherein e, f, g and h are fixed values and $e < f < g < h$;
matching the educational background of the primary-selected doctor against the grades to obtain the corresponding education preset value, marked as Xc;
calculating the time difference between the time at which the primary-selected doctor obtained the physician qualification and the current system time to obtain the certificate-holding duration, marked as XT;
calculating the time difference between the practice start time of the primary-selected doctor and the current system time to obtain the practice duration, marked as Xd;
obtaining the qualification coefficient FA of the primary-selected doctor by using the formula $FA = Xc \times c1 + XT \times c2 + Xd \times c3$; wherein c1, c2 and c3 are all preset coefficients;
S44: comparing the qualification coefficient FA of the primary-selected doctor with a set qualification coefficient threshold; if FA is larger than the threshold, marking the primary-selected doctor as a preferred doctor;
S45: obtaining the historical consultation data of the preferred doctor within a preset time; the historical consultation data comprises the number of consultations and the user evaluation coefficients; the user evaluation coefficient is obtained by users scoring the doctor's diagnosis and treatment, with a full score of 100; marking the user evaluation coefficient as Qx, summing the user evaluation coefficients Qx and averaging them to obtain the evaluation mean Qs; marking the number of consultations as Cs;
setting a preset value for each hospital grade, matching the grade of the preferred doctor's hospital against the hospital grades to obtain the corresponding preset value, marked as DS;
setting the number of manual checks performed by the preferred doctor as CT;
obtaining the pushing value WS of the preferred doctor by using the formula $WS = FA \times c4 + Qs \times c5 + Cs \times c6 + DS \times c7 + CT \times c8$; wherein c4, c5, c6, c7 and c8 are all preset proportionality coefficients;
S46: sorting the preferred doctors by pushing value WS from high to low;
S47: selecting a preset number of general practitioners as selected doctors in the ranked order of the preferred doctors; the selected doctors manually check the predicted word segmentation result, checking for omissions and correcting errors.
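Steps S41 to S46 amount to a filter on the qualification coefficient followed by a ranking on the pushing value, which can be sketched as follows. The default coefficients are the example values from the detailed description (taking 0.45 as c3); the numeric education preset values, the record layout, the threshold and the sample doctors are hypothetical.

```python
# Assumed education preset values satisfying e < f < g < h.
EDUCATION_VALUE = {"junior_college": 1, "bachelor": 2, "master": 3, "doctorate": 4}

def qualification(xc, xt, xd, c1=0.32, c2=0.52, c3=0.45):
    """FA = Xc*c1 + XT*c2 + Xd*c3."""
    return xc * c1 + xt * c2 + xd * c3

def pushing_value(fa, qs, cs, ds, ct, c4=0.22, c5=0.35, c6=0.45, c7=0.24, c8=0.64):
    """WS = FA*c4 + Qs*c5 + Cs*c6 + DS*c7 + CT*c8."""
    return fa * c4 + qs * c5 + cs * c6 + ds * c7 + ct * c8

def rank_preferred_doctors(doctors, fa_threshold):
    """S44 to S46: keep doctors with FA above the threshold, sort by WS descending."""
    preferred = []
    for d in doctors:
        fa = qualification(EDUCATION_VALUE[d["education"]], d["XT"], d["Xd"])
        if fa > fa_threshold:
            ws = pushing_value(fa, d["Qs"], d["Cs"], d["DS"], d["CT"])
            preferred.append((ws, d["name"]))
    return [name for _, name in sorted(preferred, reverse=True)]

# Hypothetical idle general practitioners with already-normalized inputs.
doctors = [
    {"name": "A", "education": "master", "XT": 10, "Xd": 12,
     "Qs": 90, "Cs": 200, "DS": 3, "CT": 5},
    {"name": "B", "education": "doctorate", "XT": 2, "Xd": 3,
     "Qs": 95, "Cs": 50, "DS": 3, "CT": 0},
    {"name": "C", "education": "bachelor", "XT": 15, "Xd": 20,
     "Qs": 80, "Cs": 100, "DS": 2, "CT": 1},
]
ranking = rank_preferred_doctors(doctors, fa_threshold=5)
print(ranking)
```

Doctor B is dropped at S44 because the short certificate-holding and practice durations keep FA below the assumed threshold, even though the education preset value is the highest.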
Further, selecting a preset number of general practitioners as selected doctors in the ranked order of the preferred doctors in step S47 specifically includes:
AA1: obtaining the text size of the predicted word segmentation result, marked as WA;
AA2: obtaining the medical text corresponding to the predicted word segmentation result, and marking the patient corresponding to the medical text as the target patient;
AA3: obtaining the diagnosis and treatment records of the target patient within a preset time;
counting the number of diagnoses and treatments of the target patient to obtain the diagnosis and treatment frequency, marked as L1;
summing the diagnosis and treatment amounts of the target patient to obtain the total diagnosis and treatment amount, marked as L2;
AA4: obtaining the check value WX of the predicted word segmentation result by using the formula $WX = WA \times d1 + L1 \times d2 + L2 \times d3$, wherein d1, d2 and d3 are all preset coefficients;
AA5: when the check value WX satisfies $0 < WX \le K1$, selecting $\mathrm{INT}(f \times WX)$ general practitioners as selected doctors; when the check value WX satisfies $K1 < WX$, selecting $\mathrm{INT}[(1 + f) \times WX]$ general practitioners as selected doctors; wherein $\mathrm{INT}(f \times WX)$ denotes the largest integer not exceeding $f \times WX$, $\mathrm{INT}[(1 + f) \times WX]$ denotes the largest integer not exceeding $(1 + f) \times WX$, and f is a preset coefficient with $f > 0$.
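The rule of AA4 and AA5 can be sketched as a small function that turns the check value into a head count. The default coefficients d1 to d3 are the example values from the detailed description; the values of f and K1, the function names and the ranked list are hypothetical, and the sketch simply truncates the list when fewer preferred doctors are available than the rule asks for.

```python
import math

def check_value(wa, l1, l2, d1=0.18, d2=0.56, d3=0.49):
    """WX = WA*d1 + L1*d2 + L2*d3."""
    return wa * d1 + l1 * d2 + l2 * d3

def reviewer_count(wx, f, k1):
    """INT(f*WX) when 0 < WX <= K1, otherwise INT((1+f)*WX)."""
    if wx <= 0 or f <= 0:
        raise ValueError("the rule is defined only for WX > 0 and f > 0")
    factor = f if wx <= k1 else 1 + f
    return math.floor(factor * wx)

def select_reviewers(ranked_doctors, wx, f, k1):
    """S47: take the top-ranked preferred doctors, as many as the rule allows."""
    return ranked_doctors[:reviewer_count(wx, f, k1)]

# Hypothetical values: f = 0.5, K1 = 5, and a ranking produced beforehand.
ranked = ["A", "C", "D"]
print(select_reviewers(ranked, wx=4.2, f=0.5, k1=5))   # low check value
print(select_reviewers(ranked, wx=6.2, f=0.5, k1=5))   # high check value
```

A larger or more complex result (high WX) is therefore reviewed by proportionally more doctors, and crossing the threshold K1 switches the multiplier from f to 1 + f.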
The beneficial effects of the invention are as follows:
1. The invention acquires and stores the medical text to be identified. The staff on duty at the current time are obtained and marked as primary selection personnel; the time difference between the hire date of each primary selection person and the current system time is calculated to obtain the employment duration, and the age and acquisition count of the primary selection person are set; the acquisition value SZ of each primary selection person is obtained with the corresponding formula, and the primary selection person with the largest acquisition value SZ is selected as the acquirer, which improves acquisition efficiency;
2. The invention analyzes the acquired medical texts. The retrieval records of each medical text within the thirty days before the current system time are collected, and the retrieval attraction value QT is obtained from the retrieval frequency, the number of retrievers and the total retrieval duration; the delay duration and the buffer duration of the medical text are calculated, and the preferred recognition value YS of the medical text is obtained by using the formula $YS = YT \times Z4 + QT \times Z5 + (1/HT1) \times Z6$; the medical texts are arranged in descending order of the preferred recognition value YS to generate the identification priority table, and the sequence position of each medical text in the table is fed back to the control center; the control center identifies the medical texts according to the fed-back sequence positions, so that identification proceeds in an orderly manner and identification efficiency is improved;
3. The invention performs iterative self-learning through a semi-supervised learning process, filters and calibrates the predicted word segmentation result, and selects corresponding general practitioners to perform manual checking, checking for omissions and correcting errors, so as to obtain the final medical named entity recognition result of the medical text. The general practitioners who are idle at the current time are obtained and marked as primary-selected doctors; the personal information of each primary-selected doctor is obtained to determine the corresponding education preset value, certificate-holding duration and practice duration, and the qualification coefficient of the primary-selected doctor is obtained by using the formula $FA = Xc \times c1 + XT \times c2 + Xd \times c3$; if the qualification coefficient FA is larger than the set qualification coefficient threshold, the primary-selected doctor is marked as a preferred doctor; the historical consultation data of the preferred doctor within a preset time is obtained, and the pushing value WS of the preferred doctor is obtained from the evaluation mean Qs, the number of consultations and the preset value corresponding to the hospital grade. General practitioners can thus be reasonably selected for manual checking according to the pushing value WS, which improves checking efficiency, and the number of general practitioners who perform the manual checking is determined according to the check value of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.
Drawings
In order to facilitate understanding for those skilled in the art, the present invention will be further described with reference to the accompanying drawings.
FIG. 1 is a block diagram of the system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a medical entity identification method includes the following steps:
Step one: acquiring and storing a medical text to be identified, which specifically includes:
S11: obtaining the staff on duty at the current time and marking them as primary selection personnel;
S12: calculating the time difference between the hire date of each primary selection person and the current system time to obtain the employment duration, marked as SD;
setting the age of the primary selection person as SF; setting the acquisition count of the primary selection person as SG;
S13: normalizing the employment duration, the age and the acquisition count and taking their numerical values;
obtaining the acquisition value SZ of the primary selection person by using the formula $SZ = (SD \times A1 + SG \times A2 - |SF - 35| \times A3) \times ST - 1.2356$; wherein ST is the collecting value of the primary selection person; A1, A2 and A3 are all preset coefficient factors; for example, A1 takes the value 0.87, A2 takes the value 0.35, and A3 takes the value 0.56;
S14: selecting the primary selection person with the largest acquisition value SZ as the acquirer;
S15: sending an acquisition instruction to the mobile phone terminal of the acquirer, and incrementing the acquirer's acquisition count by one;
S16: after receiving the acquisition instruction, the acquirer acquires and stores the medical text to be identified;
calculating the time difference between the acquisition end time and the acquisition start time to obtain the acquisition duration of the acquirer, marked as TA; setting the score value input by a user as B;
normalizing the acquisition duration and the input score value and taking their numerical values;
obtaining a single value DT of the acquirer by using the formula $DT = (1/TA) \times b1 + B \times b2$, then summing all single values of the acquirer and averaging them to obtain the collecting value ST of the acquirer; wherein b1 and b2 are both preset proportionality coefficients; for example, b1 takes the value 0.658 and b2 takes the value 0.345;
Step two: analyzing the acquired medical texts to obtain an identification priority table of the medical texts; feeding back the sequence position of each medical text in the identification priority table to a control center; the control center identifies the medical texts according to the fed-back sequence positions; this specifically includes:
S21: obtaining the generation time of the medical text and calculating the time difference between the generation time and the current system time to obtain the delay duration of the medical text, marked as YT;
S22: collecting the retrieval records of the medical text within the thirty days before the current system time; each retrieval record comprises the retriever, the retrieval start time and the retrieval end time;
counting the number of retrievals of the medical text to obtain the retrieval frequency, marked as P1;
grouping the retrieval records of the medical text by retriever and counting the number of retrievers, marked as P2;
calculating the time difference between the retrieval start time and the retrieval end time of each record to obtain the retrieval duration, and summing the retrieval durations to obtain the total retrieval duration, marked as P3;
assigning weights to the retrieval frequency, the number of retrievers and the total retrieval duration, the weights being Z1, Z2 and Z3 respectively; wherein $Z1 + Z2 + Z3 = 1$; for example, Z1 takes the value 0.6, Z2 takes the value 0.3, and Z3 takes the value 0.1;
obtaining the retrieval attraction value QT of the medical text by using the formula $QT = P1 \times Z1 + P2 \times Z2 + P3 \times Z3$;
S23: sorting the retrieval records of the medical text by retrieval start time, obtaining the most recent retrieval start time of the medical text and marking it as ZT1;
calculating the time difference between the most recent retrieval start time ZT1 and the current system time to obtain the buffer duration, marked as HT1;
S24: normalizing the delay duration, the retrieval attraction value and the buffer duration and taking their numerical values;
obtaining the preferred recognition value YS of the medical text by using the formula $YS = YT \times Z4 + QT \times Z5 + (1/HT1) \times Z6$; wherein Z4, Z5 and Z6 are all preset proportionality coefficients; for example, Z4 takes the value 0.346, Z5 takes the value 0.573, and Z6 takes the value 0.517;
S25: arranging the medical texts in descending order of the preferred recognition value YS to generate the identification priority table of the medical texts;
step three: collecting medical dictionaries, and sorting the medical dictionaries into a disease word bank, a symptom word bank, a checking word bank and a treatment word bank; marking the medical text by using the collected medical dictionary through a hidden Markov model to obtain a predicted word segmentation result;
step four: performing iterative self-learning through a semi-supervised learning process, and filtering and calibrating the predicted word segmentation result; selecting corresponding general practitioners to manually check, check leakage and repair defects; obtaining a final medical named entity recognition result of the medical text;
the specific steps of selecting the corresponding general practitioner to carry out manual checking, leakage checking and defect repairing in the fourth step are as follows:
s41: acquiring general practitioners in an idle state at the current time and marking the general practitioners as primary-selected doctors;
s42: acquiring personal information of a primary doctor, wherein the personal information comprises name, gender, academic calendar, doctor qualification information, belonging hospital, medical time and hospital grade; the physician qualification information comprises the acquisition time of the physician qualification;
s43: acquiring the academic information of the primarily selected doctor, and dividing the academic information into four grades of special subject, master and doctor; setting each grade to correspond to a academic preset value, and assigning a special grade e, a subject grade f, a master grade g and a doctor grade h; wherein e, f, g and h are fixed numerical values, and e is more than f and less than g and less than h;
matching the academic calendar information of the primarily selected doctor with all the academic calendar grades to obtain a corresponding academic calendar preset value and marking the preset value as Xc;
calculating the time difference between the doctor qualification acquiring time of the primary selected doctor and the current time of the system to obtain the duration of the certificate holding, and marking the duration as XT;
calculating the time difference between the slave medical time of the initially selected doctor and the current time of the system to obtain the slave medical time length, and marking the slave medical time length as Xd;
obtaining qualification coefficient FA of the primary doctor by using the formula FA as Xc × c1+ XT × c2+ Xd × c 3; wherein c1, c2 and c3 are all preset coefficients; for example, c1 takes the value of 0.32, c2 takes the value of 0.52, and c2 takes the value of 0.45;
S44: comparing the qualification coefficient FA of the primary-selected doctor with a set qualification coefficient threshold; if FA is larger than the threshold, marking the primary-selected doctor as a preferred doctor;
S45: acquiring the historical consultation data of the preferred doctor within a preset time; the historical consultation data comprises the number of consultations and the user evaluation coefficients; the rule for the user evaluation coefficient is: the user scores the doctor's diagnosis and treatment, the full score being 100; marking each user evaluation coefficient as Qx, summing the user evaluation coefficients Qx and averaging them to obtain the evaluation average Qs; marking the number of consultations as Cs;
setting a preset value for each hospital grade, matching the grade of the hospital to which the preferred doctor belongs against all hospital grades to obtain the corresponding preset value, and marking it as DS;
marking the number of manual checks performed by the preferred doctor as CT;
obtaining the push value WS of the preferred doctor by using the formula WS = FA × c4 + Qs × c5 + Cs × c6 + DS × c7 + CT × c8; wherein c4, c5, c6, c7 and c8 are all preset proportionality coefficients; for example, c4 takes the value 0.22, c5 takes the value 0.35, c6 takes the value 0.45, c7 takes the value 0.24, and c8 takes the value 0.64;
S46: sorting the preferred doctors by push value WS from high to low;
S47: screening out a preset number of general practitioners as selected doctors according to the ranking of the preferred doctors; the selected doctors manually check the predicted word segmentation result and find and fill omissions;
in step S47, screening out a preset number of general practitioners as selected doctors according to the ranking of the preferred doctors specifically includes:
AA1: acquiring the text size of the predicted word segmentation result, and marking it as WA;
AA2: acquiring the medical text corresponding to the predicted word segmentation result, and marking the patient corresponding to the medical text as the target patient;
AA3: acquiring the diagnosis and treatment records of the target patient within a preset time;
accumulating the number of diagnoses and treatments of the target patient to form the diagnosis and treatment frequency, and marking it as L1;
accumulating the diagnosis and treatment amounts of the target patient to form the total diagnosis and treatment amount, and marking it as L2;
AA4: obtaining the check value WX of the predicted word segmentation result by using the formula WX = WA × d1 + L1 × d2 + L2 × d3, wherein d1, d2 and d3 are all preset coefficients; for example, d1 takes the value 0.18, d2 takes the value 0.56, and d3 takes the value 0.49;
AA5: when the check value WX satisfies 0 < WX ≤ K1, screening out INT(f × WX) general practitioners as selected doctors; when the check value WX satisfies K1 < WX, screening out INT[(1 + f) × WX] general practitioners as selected doctors; wherein INT(f × WX) denotes the largest integer not exceeding f × WX; INT[(1 + f) × WX] denotes the largest integer not exceeding (1 + f) × WX; f is a preset coefficient and f > 0.
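As a rough illustration of steps AA4 and AA5, the following Python sketch computes the check value WX and the resulting number of general practitioners. The coefficients d1 to d3 are the example values quoted in step AA4; the threshold `K1 = 10.0` and the coefficient `F = 0.5` are hypothetical placeholders, because the text gives no values for K1 or f, and the function names are illustrative only.

```python
import math

# Example coefficients quoted in step AA4.
D1, D2, D3 = 0.18, 0.56, 0.49
# Hypothetical placeholders: the text gives no values for K1 or f.
K1 = 10.0
F = 0.5


def check_value(wa: float, l1: float, l2: float) -> float:
    """WX = WA * d1 + L1 * d2 + L2 * d3 (step AA4)."""
    return wa * D1 + l1 * D2 + l2 * D3


def reviewer_count(wx: float, k1: float = K1, f: float = F) -> int:
    """Number of general practitioners to select (step AA5)."""
    if wx <= 0:
        raise ValueError("step AA5 only defines the rule for WX > 0")
    if wx <= k1:
        # INT(f * WX): largest integer not exceeding f * WX.
        return math.floor(f * wx)
    # INT[(1 + f) * WX] for check values above K1.
    return math.floor((1 + f) * wx)
```

Note that for a small check value INT(f × WX) can be zero, in which case the rule as written selects no reviewer; the text does not say how that case is handled.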
The working principle of the invention is as follows:
In the medical entity identification method, during operation, the medical text to be identified is acquired and stored: the staff on duty at the current time are obtained and marked as primary-selected personnel; the time difference between the employment start time of each primary-selected person and the current system time is calculated to obtain the employment duration, and the age and the number of acquisitions of the primary-selected person are set; the acquisition value SZ of each primary-selected person is obtained with the associated algorithm, and the primary-selected person with the largest acquisition value SZ is chosen as the acquirer, which improves acquisition efficiency; the acquired medical texts are then analysed; the retrieval records of each medical text within the thirty days before the current system time are collected, the retrieval attraction value QT is obtained from the retrieval frequency, the number of retrieving persons and the total retrieval duration, the delay duration YT and the buffer duration HT1 of the medical text are calculated, and the preferred recognition value YS of the medical text is obtained by using the formula YS = YT × Z4 + QT × Z5 + 1/HT1 × Z6; the medical texts are arranged in descending order of the preferred recognition value YS to generate the identification priority table of the medical texts, and the sequence position of each medical text in the identification priority table is fed back to the control center; the control center identifies the medical texts according to the fed-back sequence positions; the medical texts are thus identified in an orderly manner, which improves identification efficiency;
medical dictionaries are collected and sorted into a disease word bank, a symptom word bank, an examination word bank and a treatment word bank; the medical text is labelled by a hidden Markov model using the collected medical dictionaries to obtain a predicted word segmentation result; iterative self-learning is performed through a semi-supervised learning process, and the predicted word segmentation result is filtered and calibrated; corresponding general practitioners are selected to manually check the result and to find and fill omissions; the final medical named entity recognition result of the medical text is thereby obtained. Suitable general practitioners can be selected for manual checking according to the push value WS of the preferred doctors, which improves checking efficiency, and the number of general practitioners performing the manual check is determined by the check value WX of the predicted word segmentation result, which reduces labor cost and improves checking accuracy.
The above formulas were all obtained by collecting a large amount of data for software simulation and having the corresponding experts set the parameters, and the formulas are consistent with real results.
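The doctor-scoring formulas of steps S43 to S46 can be sketched in Python as follows. The weights are the example values given in the description (0.32, 0.52 and 0.45 for the qualification coefficient; 0.22, 0.35, 0.45, 0.24 and 0.64 for the push value). Everything else is an assumption made for illustration: the preset education values, the `Doctor` record and its field names, the use of years as the unit of duration, and the threshold passed to `rank_preferred`.

```python
from dataclasses import dataclass

# Example weights quoted in the description.
C1, C2, C3 = 0.32, 0.52, 0.45
C4, C5, C6, C7, C8 = 0.22, 0.35, 0.45, 0.24, 0.64

# Hypothetical preset education values; the text only requires e < f < g < h.
EDUCATION_VALUE = {
    "junior_college": 1.0,
    "bachelor": 2.0,
    "master": 3.0,
    "doctorate": 4.0,
}


@dataclass
class Doctor:
    name: str
    education: str
    years_licensed: float    # XT (unit assumed to be years)
    years_practising: float  # Xd (unit assumed to be years)
    mean_rating: float       # Qs, average user score
    visit_count: float       # Cs
    hospital_value: float    # DS, preset value of the hospital grade
    review_count: float      # CT, manual checks already performed


def qualification(d: Doctor) -> float:
    """FA = Xc * c1 + XT * c2 + Xd * c3 (step S43)."""
    xc = EDUCATION_VALUE[d.education]
    return xc * C1 + d.years_licensed * C2 + d.years_practising * C3


def push_value(d: Doctor) -> float:
    """WS = FA * c4 + Qs * c5 + Cs * c6 + DS * c7 + CT * c8 (step S45)."""
    return (qualification(d) * C4 + d.mean_rating * C5 + d.visit_count * C6
            + d.hospital_value * C7 + d.review_count * C8)


def rank_preferred(doctors, fa_threshold: float):
    """Keep doctors whose FA exceeds the threshold (S44), sort by WS (S46)."""
    preferred = [d for d in doctors if qualification(d) > fa_threshold]
    return sorted(preferred, key=push_value, reverse=True)
```

Because the threshold test in S44 runs before the ranking, a doctor with a very high push value is still excluded if the qualification coefficient does not exceed the threshold.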
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims (5)

1. A medical entity identification method, comprising the steps of:
step one: acquiring and storing a medical text to be identified;
step two: analyzing the acquired medical texts to obtain an identification priority table of the medical texts; feeding back the sequence position of each medical text in the identification priority table to a control center; the control center identifying the medical texts according to the fed-back sequence positions;
step three: collecting medical dictionaries, and sorting the medical dictionaries into a disease word bank, a symptom word bank, an examination word bank and a treatment word bank; labelling the medical text by a hidden Markov model using the collected medical dictionaries to obtain a predicted word segmentation result;
step four: performing iterative self-learning through a semi-supervised learning process, and filtering and calibrating the predicted word segmentation result; selecting corresponding general practitioners to manually check the result and to find and fill omissions; and obtaining a final medical named entity recognition result of the medical text.
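The claim does not say how the hidden Markov model of step three is parameterised or decoded. One common choice is Viterbi decoding over begin/inside/outside tags, and the Python sketch below assumes that choice. The tag set, the single entity type, the English tokens and every probability are made-up toy values standing in for parameters that would be estimated from text labelled with the word banks.

```python
import math

STATES = ("B", "I", "O")  # begin entity, inside entity, outside any entity

# Toy parameters (assumptions, not taken from the patent).
START_P = {"B": 0.5, "I": 0.0, "O": 0.5}
TRANS_P = {
    "B": {"B": 0.1, "I": 0.6, "O": 0.3},
    "I": {"B": 0.1, "I": 0.3, "O": 0.6},
    "O": {"B": 0.4, "I": 0.0, "O": 0.6},
}
EMIT_P = {
    "B": {"chronic": 0.5, "nausea": 0.5},
    "I": {"gastritis": 1.0},
    "O": {"with": 1.0},
}


def viterbi(obs, start_p, trans_p, emit_p, floor=1e-6):
    """Return the most probable tag sequence for obs (log-space Viterbi)."""
    if not obs:
        return []

    def lp(p):
        # Floor zero probabilities so that the logarithm stays finite.
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s, previous tag)
    best = [{s: (lp(start_p[s]) + lp(emit_p[s].get(obs[0], 0.0)), None)
             for s in STATES}]
    for token in obs[1:]:
        column = {}
        for s in STATES:
            prev, score = max(
                ((p, best[-1][p][0] + lp(trans_p[p][s])) for p in STATES),
                key=lambda pair: pair[1],
            )
            column[s] = (score + lp(emit_p[s].get(token, 0.0)), prev)
        best.append(column)

    # Backtrack from the best final tag to the start of the sequence.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]
```

With these toy parameters, `viterbi(["chronic", "gastritis", "with", "nausea"], START_P, TRANS_P, EMIT_P)` returns `["B", "I", "O", "B"]`, that is, "chronic gastritis" and "nausea" are tagged as entities.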
2. The medical entity identification method according to claim 1, wherein acquiring and storing the medical text to be identified in step one specifically comprises:
S11: acquiring the staff on duty at the current time and marking them as primary-selected personnel;
S12: calculating the time difference between the employment start time of the primary-selected person and the current system time to obtain the employment duration of the primary-selected person, and marking it as SD;
setting the age of the primary-selected person as SF; setting the number of acquisitions of the primary-selected person as SG;
S13: normalizing the employment duration, the age and the number of acquisitions and taking their values;
obtaining the acquisition value SZ of the primary-selected person by using the formula SZ = (SD × A1 + SG × A2 - |SF - 35| × A3) × ST - 1.2356; wherein ST is the collection value of the primary-selected person; A1, A2 and A3 are all preset coefficient factors;
S14: selecting the primary-selected person with the largest acquisition value SZ as the acquirer;
S15: sending the acquisition instruction to the mobile phone terminal of the acquirer; at the same time, increasing the number of acquisitions of the acquirer by one;
S16: after receiving the acquisition instruction, the acquirer acquiring and storing the medical text to be identified;
calculating the time difference between the acquisition end time and the acquisition start time to obtain the acquisition duration of the acquirer, and marking it as TA; setting the score value input by the user as B;
normalizing the acquisition duration and the input score value and taking their values;
obtaining the single value DT of the acquirer by using the formula DT = 1/TA × b1 + B × b2, summing all single values of the acquirer and averaging them to obtain the collection value ST of the acquirer; wherein b1 and b2 are both preset proportionality coefficients.
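The scoring in steps S13 to S16 can be illustrated with the Python sketch below. All weights (`A1` to `A3`, `B1`, `B2`) are hypothetical, since the claim only states that they are preset, and the inputs are assumed to be already normalised. The acquisition value follows one reading of the formula in S13, namely SZ = (SD × A1 + SG × A2 - |SF - 35| × A3) × ST - 1.2356; the function and parameter names are illustrative only.

```python
from statistics import mean

# Hypothetical weights: the claim only says these are preset.
A1, A2, A3 = 0.4, 0.4, 0.2
B1, B2 = 0.5, 0.5


def single_value(ta: float, b: float) -> float:
    """DT = (1 / TA) * b1 + B * b2 for one finished acquisition task."""
    return (1.0 / ta) * B1 + b * B2


def collection_value(tasks) -> float:
    """ST: mean of DT over an acquirer's (TA, B) history."""
    return mean(single_value(ta, b) for ta, b in tasks)


def acquisition_value(sd: float, sg: float, sf: float, st: float) -> float:
    """SZ under the reading (SD*A1 + SG*A2 - |SF - 35|*A3) * ST - 1.2356."""
    return (sd * A1 + sg * A2 - abs(sf - 35) * A3) * st - 1.2356


def pick_acquirer(candidates) -> str:
    """candidates maps a name to (SD, SG, SF, ST); return the max-SZ name."""
    return max(candidates, key=lambda n: acquisition_value(*candidates[n]))
```

A shorter acquisition duration TA and a higher user score B both raise DT, so acquirers who finish quickly and are rated well accumulate a higher collection value ST. An acquirer with no history has no ST under this sketch (`mean` raises an error on empty input); the claim does not say what initial value is used.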
3. The medical entity identification method according to claim 1, wherein obtaining the identification priority table of the medical texts in step two specifically comprises:
S21: acquiring the generation time of the medical text, calculating the time difference between the generation time and the current system time to obtain the delay duration of the medical text, and marking it as YT;
S22: collecting the retrieval records of the medical text within the thirty days before the current system time; each retrieval record comprises the retrieving person, the retrieval start time and the retrieval end time;
accumulating the number of retrievals of the medical text to form the retrieval frequency, and marking it as P1;
sorting the retrieval records of the medical text by retrieving person, counting the number of persons who retrieved the medical text, and marking it as P2;
calculating the time difference between the retrieval start time and the retrieval end time to obtain the retrieval duration, accumulating the retrieval durations to form the total retrieval duration, and marking it as P3;
assigning weights to the retrieval frequency, the number of retrieving persons and the total retrieval duration, wherein the weight of the retrieval frequency is Z1, the weight of the number of retrieving persons is Z2, and the weight of the total retrieval duration is Z3; wherein Z1 + Z2 + Z3 = 1;
obtaining the retrieval attraction value QT of the medical text by using the formula QT = P1 × Z1 + P2 × Z2 + P3 × Z3;
S23: sorting the retrieval records of the medical text by retrieval start time, acquiring the most recent retrieval start time of the medical text, and marking it as ZT1;
calculating the time difference between the most recent retrieval start time of the medical text and the current system time to obtain the buffer duration, and marking it as HT1;
S24: normalizing the delay duration, the retrieval attraction value and the buffer duration and taking their values;
obtaining the preferred recognition value YS of the medical text by using the formula YS = YT × Z4 + QT × Z5 + 1/HT1 × Z6; wherein Z4, Z5 and Z6 are all preset proportionality coefficients;
S25: arranging the medical texts in descending order of the preferred recognition value YS to generate the identification priority table of the medical texts.
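Steps S21 to S25 amount to a weighted score followed by a descending sort, which the Python sketch below illustrates. The weights `Z1` to `Z6` are hypothetical (the claim only requires Z1 + Z2 + Z3 = 1 and says Z4 to Z6 are preset), the `TextStats` record and its field names are invented for the example, and all inputs are assumed to be already normalised with a buffer duration greater than zero.

```python
from dataclasses import dataclass

# Hypothetical weights; Z1 + Z2 + Z3 = 1 as the claim requires.
Z1, Z2, Z3 = 0.3, 0.3, 0.4
Z4, Z5, Z6 = 0.4, 0.4, 0.2


@dataclass
class TextStats:
    text_id: str
    delay: float   # YT, time since the text was generated
    p1: float      # retrieval frequency
    p2: float      # number of retrieving persons
    p3: float      # total retrieval duration
    buffer: float  # HT1, time since the most recent retrieval (> 0)


def attraction(t: TextStats) -> float:
    """QT = P1 * Z1 + P2 * Z2 + P3 * Z3 (step S22)."""
    return t.p1 * Z1 + t.p2 * Z2 + t.p3 * Z3


def preferred_value(t: TextStats) -> float:
    """YS = YT * Z4 + QT * Z5 + (1 / HT1) * Z6 (step S24)."""
    return t.delay * Z4 + attraction(t) * Z5 + (1.0 / t.buffer) * Z6


def priority_table(texts):
    """Text ids in descending order of YS (step S25)."""
    ranked = sorted(texts, key=preferred_value, reverse=True)
    return [t.text_id for t in ranked]
```

Texts that have waited longer, are retrieved more often, or were retrieved more recently rise in the table. A text with no retrieval in the thirty-day window has no defined HT1, and the claim does not state how that case is treated.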
4. The medical entity identification method according to claim 1, wherein, in step four, selecting corresponding general practitioners to manually check the result and to find and fill omissions specifically comprises:
S41: acquiring the general practitioners in an idle state at the current time and marking them as primary-selected doctors;
S42: acquiring the personal information of the primary-selected doctor, the personal information comprising name, gender, education level, physician qualification information, affiliated hospital, time of starting medical practice and hospital grade; the physician qualification information comprises the time at which the physician qualification was obtained;
S43: acquiring the education information of the primary-selected doctor, the education level being divided into four grades: junior college, bachelor, master and doctorate; setting a preset education value for each grade, the junior college grade being assigned e, the bachelor grade f, the master grade g and the doctorate grade h; wherein e, f, g and h are fixed values and e < f < g < h;
matching the education information of the primary-selected doctor against the education grades to obtain the corresponding preset education value, and marking it as Xc;
calculating the time difference between the time at which the primary-selected doctor obtained the physician qualification and the current system time to obtain the certificate-holding duration, and marking it as XT;
calculating the time difference between the time at which the primary-selected doctor began practising medicine and the current system time to obtain the practice duration, and marking it as Xd;
obtaining the qualification coefficient FA of the primary-selected doctor by using the formula FA = Xc × c1 + XT × c2 + Xd × c3; wherein c1, c2 and c3 are all preset coefficients;
S44: comparing the qualification coefficient FA of the primary-selected doctor with a set qualification coefficient threshold; if FA is larger than the threshold, marking the primary-selected doctor as a preferred doctor;
S45: acquiring the historical consultation data of the preferred doctor within a preset time; the historical consultation data comprises the number of consultations and the user evaluation coefficients; the rule for the user evaluation coefficient is: the user scores the doctor's diagnosis and treatment, the full score being 100; marking each user evaluation coefficient as Qx, summing the user evaluation coefficients Qx and averaging them to obtain the evaluation average Qs; marking the number of consultations as Cs;
setting a preset value for each hospital grade, matching the grade of the hospital to which the preferred doctor belongs against all hospital grades to obtain the corresponding preset value, and marking it as DS;
marking the number of manual checks performed by the preferred doctor as CT;
obtaining the push value WS of the preferred doctor by using the formula WS = FA × c4 + Qs × c5 + Cs × c6 + DS × c7 + CT × c8; wherein c4, c5, c6, c7 and c8 are all preset proportionality coefficients;
S46: sorting the preferred doctors by push value WS from high to low;
S47: screening out a preset number of general practitioners as selected doctors according to the ranking of the preferred doctors; the selected doctors manually checking the predicted word segmentation result and finding and filling omissions.
5. The medical entity identification method according to claim 4, wherein, in step S47, screening out a preset number of general practitioners as selected doctors according to the ranking of the preferred doctors specifically comprises:
AA1: acquiring the text size of the predicted word segmentation result, and marking it as WA;
AA2: acquiring the medical text corresponding to the predicted word segmentation result, and marking the patient corresponding to the medical text as the target patient;
AA3: acquiring the diagnosis and treatment records of the target patient within a preset time;
accumulating the number of diagnoses and treatments of the target patient to form the diagnosis and treatment frequency, and marking it as L1;
accumulating the diagnosis and treatment amounts of the target patient to form the total diagnosis and treatment amount, and marking it as L2;
AA4: obtaining the check value WX of the predicted word segmentation result by using the formula WX = WA × d1 + L1 × d2 + L2 × d3, wherein d1, d2 and d3 are all preset coefficients;
AA5: when the check value WX satisfies 0 < WX ≤ K1, screening out INT(f × WX) general practitioners as selected doctors; when the check value WX satisfies K1 < WX, screening out INT[(1 + f) × WX] general practitioners as selected doctors; wherein INT(f × WX) denotes the largest integer not exceeding f × WX; INT[(1 + f) × WX] denotes the largest integer not exceeding (1 + f) × WX; f is a preset coefficient and f > 0.
CN202110378224.6A 2021-04-08 2021-04-08 Medical entity identification method Pending CN112966515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110378224.6A CN112966515A (en) 2021-04-08 2021-04-08 Medical entity identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110378224.6A CN112966515A (en) 2021-04-08 2021-04-08 Medical entity identification method

Publications (1)

Publication Number Publication Date
CN112966515A true CN112966515A (en) 2021-06-15

Family

ID=76279840

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110378224.6A Pending CN112966515A (en) 2021-04-08 2021-04-08 Medical entity identification method

Country Status (1)

Country Link
CN (1) CN112966515A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117035368A (en) * 2023-10-07 2023-11-10 四川桃子健康科技股份有限公司 Doctor dispatching method based on Internet

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117035368A (en) * 2023-10-07 2023-11-10 四川桃子健康科技股份有限公司 Doctor dispatching method based on Internet
CN117035368B (en) * 2023-10-07 2024-01-26 四川桃子健康科技股份有限公司 Doctor dispatching method based on Internet

Similar Documents

Publication Publication Date Title
CN107731269B (en) Disease coding method and system based on original diagnosis data and medical record file data
CN107705839B (en) Disease automatic coding method and system
US20200315518A1 (en) Apparatus for processing data for predicting dementia through machine learning, method thereof, and recording medium storing the same
US7949550B2 (en) Automated processing of medical data for disability rating determinations
US8589420B2 (en) Medical information system and program for same
CN113345577B (en) Diagnosis and treatment auxiliary information generation method, model training method, device, equipment and storage medium
CN111899866B (en) Surgical operation complication evaluation system based on deep learning
CN111584021A (en) Medical record information verification method and device, electronic equipment and storage medium
CN111191415A (en) Operation classification coding method based on original operation data
CN111180026A (en) Special diagnosis and treatment view system and method
CN105956412A (en) System and method for realizing coronary heart disease clinical data collection based on intelligent image-text identification
CN112967803A (en) Early mortality prediction method and system for emergency patients based on integrated model
CN115185936B (en) Medical clinical data quality analysis system based on big data
CN111524570B (en) Ultrasonic follow-up patient screening method based on machine learning
CN116189866A (en) Remote medical care analysis system based on data analysis
CN112966515A (en) Medical entity identification method
CN110610766A (en) Apparatus and storage medium for deriving probability of disease based on symptom feature weight
CN112154512B (en) Systems and methods for prioritization and presentation of heterogeneous medical data
CN115862897A (en) Syndrome monitoring method and system based on clinical data
Tsumoto et al. Estimation of disease code from electronic patient records
CN113140323B (en) Health portrait generation method, system, medium and server
CN115881259A (en) Medical record data processing method, device, equipment and storage medium
CN113140315B (en) Health self-testing system, server and health detection system
Gomula et al. A preliminary attempt to rules generation for mental disorders
CN116978507B (en) Medical information prepositive acquisition system based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination