CN108021553A - Text processing method, apparatus and computer device for disease terms - Google Patents

Text processing method, apparatus and computer device for disease terms

Info

Publication number
CN108021553A
Authority
CN
China
Prior art keywords
disease
term
candidate
name
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711107945.3A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yi Yi Intelligent Technology Co Ltd
Original Assignee
Beijing Yi Yi Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yi Yi Intelligent Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The present invention relates to a text processing method for disease terms, comprising: segmenting a disease name to be processed to obtain multiple disease segments; matching the multiple disease segments against a standard disease corpus to obtain a candidate disease term set; obtaining the similarity between each candidate disease term and the disease name, and sorting the candidate disease terms in the candidate disease term set by similarity; and selecting the top-ranked candidate disease terms in the candidate disease term set as the standardized disease term for the disease name. The above text processing method can standardize disease names automatically. The invention further relates to a text processing apparatus and a device for disease terms.

Description

Text processing method, apparatus and computer device for disease terms
Technical field
The present invention relates to the medical field, and in particular to a text processing method, an apparatus and a computer device for disease terms.
Background
At present, with the development of medical technology and computer technology, the volume of disease-related documents and data keeps growing. These data need to be distinguished by disease so that they can be managed as text and queried quickly for diagnosis and treatment.
The International Classification of Diseases (ICD) is a system that classifies diseases according to their characteristics, assigns each disease a standard name, and represents diseases by codes. To support morbidity statistics, related research and international exchange, the system is intended to have doctors enter standard disease names when recording a patient's disease information.
In actual data entry, however, because doctors are busy and differ in training background, they often enter diseases quickly using non-standard terms such as shorthand, abbreviations, English names and run-together words, and disease names containing wrong characters also occur occasionally. For example, a doctor may enter the shorthand "慢阻肺" instead of the full standard name for chronic obstructive pulmonary disease. Such entries make it difficult to identify the disease automatically and hinder disease statistics and research. How to standardize these non-standard disease terms to support subsequent disease research has become an urgent problem.
Summary of the invention
In view of this, it is necessary to provide a text processing method, an apparatus and a computer device for disease terms.
A text processing method for disease terms, wherein the method comprises:
Segmenting a disease name to be processed to obtain multiple disease segments;
Matching the multiple disease segments against a standard disease corpus to obtain a candidate disease term set;
Obtaining the similarity between each candidate disease term in the candidate disease term set and the disease name to be processed, and sorting the candidate disease terms in the candidate disease term set by similarity;
Selecting a preset number of top-ranked candidate disease terms from the candidate disease term set as the standardized disease terms for the disease name.
In one embodiment, the step of matching the multiple disease segments against the standard disease corpus to obtain the candidate disease term set comprises:
Obtaining an initial string composed of the first letter of each character in the disease name to be processed;
Obtaining, from the standard disease corpus, the set of standard disease names that match the initial string as a preliminary disease term set.
In one embodiment, the step of matching the multiple disease segments against the standard disease corpus to obtain the candidate disease term set further comprises:
Obtaining the body-site information in the multiple disease segments;
Screening the preliminary disease term set according to the body-site information, and taking the preliminary disease terms that match the body site as a re-screened disease term set.
In one embodiment, the step of matching the multiple disease segments against the standard disease corpus to obtain the candidate disease term set further comprises:
Obtaining the disease core word in the multiple disease segments;
Screening the re-screened disease term set by the disease core word, and taking the re-screened disease terms that match the disease core word as the candidate disease terms.
In one embodiment, the step of obtaining the similarity between the multiple candidate disease terms and the disease name and sorting the candidate disease terms in the candidate disease term set by similarity comprises:
Obtaining the vector space cosine similarity, the edit distance similarity and the character overlap between the disease name and each candidate disease term;
Performing a weighted calculation on the vector space cosine similarity, the edit distance similarity and the character overlap to obtain a combined similarity between the disease name and each candidate disease term;
Sorting the multiple candidate disease terms according to at least one of the vector space cosine similarity, the edit distance similarity, the character overlap and the combined similarity.
In one embodiment, the step of segmenting the disease name to obtain multiple disease segments comprises:
Matching the disease name against each disease segment category in a disease segment database to obtain the multiple disease segments;
Wherein the disease segment categories include at least one of etiology, body site, description, core word, supplementary information and disease attribute.
In one embodiment, before the step of segmenting the disease name to be processed to obtain multiple disease segments, the method further comprises:
Performing character conversion on the disease name so that the characters in the disease name have the same attributes;
Wherein the character conversion includes at least one of language conversion, synonym replacement, and conversion between full-width and half-width characters.
In one embodiment, before the step of matching the multiple disease segments against the standard disease corpus to obtain multiple candidate disease terms, the method further comprises:
Obtaining standard disease corpus material;
Segmenting the standard disease corpus material and building segment lexicons, which serve as the standard disease corpus;
The segment lexicons include at least one of an etiology lexicon, a body-site lexicon, a pathology lexicon, a clinical manifestation lexicon and a disease core word lexicon.
By segmenting the disease name and relying on the standard disease corpus, the above text processing method can standardize disease names automatically, and the processed disease names can be recognized with high precision.
A text processing apparatus for disease terms, wherein the apparatus comprises:
A segmentation module, configured to segment a disease name to be processed to obtain multiple disease segments;
A candidate term screening module, configured to match the multiple disease segments against a standard disease corpus to obtain multiple candidate disease terms;
A candidate term sorting module, configured to obtain the similarity between the multiple candidate disease terms and the disease name, and to sort the multiple candidate disease terms by similarity;
A candidate term processing module, configured to select the top-ranked candidate disease terms from the multiple candidate disease terms as the disease term for the disease name.
A computer device, comprising a processor, a memory and computer instructions stored in the memory, wherein the computer instructions, when executed by the processor, implement the steps of the method of any of the above embodiments.
The above text processing apparatus and computer device can segment a disease name into disease segments and standardize the disease name automatically based on the standard disease corpus, and the processed disease names can be recognized with high precision.
Brief description of the drawings
Fig. 1 is a flowchart of the text processing method for disease terms provided by one embodiment;
Fig. 2 is a flowchart of the method for obtaining the candidate disease term set provided by one embodiment;
Fig. 3 is a flowchart of the method for sorting candidate disease terms by similarity provided by one embodiment;
Fig. 4 is a flowchart of the method for building the standard disease corpus provided by one embodiment;
Fig. 5 is a flowchart of the text processing method for disease terms provided by another embodiment;
Fig. 6 is a structural diagram of the text processing apparatus for disease terms provided by one embodiment.
Detailed description
To make the purpose, technical solution and advantages of the present invention clearer, the invention is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
Referring to Fig. 1, one embodiment of the invention provides a text processing method for disease terms, the method comprising:
Step S110: segment the disease name to be processed to obtain multiple disease segments.
Specifically, for input medical text such as electronic medical records, medical textbooks and papers, the disease name to be processed in the medical text is segmented to obtain multiple disease segments. The segmentation tool can be chosen flexibly: a general tool such as JIEBA can be used to segment the medical text, or a dedicated medical corpus can be used, to obtain the multiple segments.
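The segmentation in step S110 can be illustrated with a minimal sketch. It uses forward maximum matching against a small lexicon, which is a stand-in technique and not the JIEBA tool named above; the `segment` function, the `LEXICON` contents and the `max_len` parameter are assumptions made for illustration.

```python
def segment(name, lexicon, max_len=6):
    """Forward maximum matching: at each position take the longest lexicon entry.

    Characters that match no entry are emitted as single-character segments.
    """
    segments = []
    i = 0
    while i < len(name):
        for size in range(min(max_len, len(name) - i), 0, -1):
            piece = name[i:i + size]
            if size == 1 or piece in lexicon:
                segments.append(piece)
                i += size
                break
    return segments


# Hypothetical medical lexicon; a real one would be far larger.
LEXICON = {"慢性", "阻塞性", "肺", "疾病"}

print(segment("慢性阻塞性肺疾病", LEXICON))
```

A dictionary-driven approach like this makes the dependence on a dedicated medical lexicon explicit; a statistical segmenter could be substituted without changing the later steps.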
Step S120: match the multiple disease segments against the standard disease corpus to obtain multiple candidate disease terms.
The standard disease corpus is a database that stores standard disease names and can be used to standardize disease names. By matching the disease name to be processed against the standard disease names in the standard disease corpus, candidate disease terms that match the disease name to be processed are obtained.
Step S130: obtain the similarity between each candidate disease term and the disease name, and sort the multiple candidate disease terms by similarity.
After the multiple candidate disease terms are obtained, the similarity between each candidate disease term and the disease name to be processed can be calculated. Once these similarities are obtained, the multiple candidate disease terms can be sorted by the magnitude of the similarity.
Step S140: select the top-ranked candidate disease terms from the multiple candidate disease terms as the disease term for the disease name.
After the multiple candidate disease terms are sorted, the top-ranked candidate disease terms can be selected as the disease term for the disease name to be processed. For example, only the first-ranked candidate disease term may be selected as the disease term for the disease name to be processed. Alternatively, the top three candidate disease terms may be selected together as the disease terms for the disease name to be processed.
The text processing method provided by the above embodiment, through segmentation and the standard disease corpus, can standardize disease names automatically, can flexibly handle many kinds of non-standard wording, and allows the processed disease names to be recognized with high precision.
In one embodiment, referring also to Fig. 2, the step of matching the multiple disease segments against the standard disease corpus to obtain multiple candidate disease terms comprises:
Step S121: obtain the initial string composed of the first letter of each character in the disease name to be processed.
To ease identification, the first letter of each character in the disease name to be processed can be extracted to form an initial string. For example, for the shorthand "慢阻肺" (chronic obstructive pulmonary disease), the initial string obtained is "MZF".
Step S122: obtain, from the standard disease corpus, the set of standard disease names that match the initial string as the preliminary disease term set.
The initial string is then matched in the standard disease corpus, and the standard disease names that match it are taken as preliminary disease terms. Initial-string matching makes it possible to find, quickly and accurately, the standard disease names in the standard disease corpus that are relatively close to the disease segments, as preliminary disease terms. It can be understood that the preliminary disease terms may also be used directly as the candidate disease terms.
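Steps S121 and S122 can be sketched as follows. The text does not specify how an initial string is "matched", so this sketch assumes subsequence matching of pinyin initials; the `PINYIN_INITIAL` table, the `CORPUS` list and all function names are hypothetical, and a real system would need a complete pinyin dictionary.

```python
# Hypothetical mini table of pinyin initials for the characters used below.
PINYIN_INITIAL = {
    "慢": "M", "性": "X", "阻": "Z", "塞": "S", "肺": "F",
    "疾": "J", "病": "B", "炎": "Y", "脑": "N", "梗": "G",
}


def initials(name):
    """Build the initial string from the first pinyin letter of each character."""
    return "".join(PINYIN_INITIAL.get(ch, ch.upper()) for ch in name)


def is_subsequence(short, long):
    """True if the letters of `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def preliminary_terms(name, corpus):
    """Standard names whose initial string contains the query's initial string."""
    key = initials(name)
    return [term for term in corpus if is_subsequence(key, initials(term))]


CORPUS = ["慢性阻塞性肺疾病", "肺炎", "脑梗塞"]

print(initials("慢阻肺"))                   # MZF
print(preliminary_terms("慢阻肺", CORPUS))  # ['慢性阻塞性肺疾病']
```

Subsequence matching lets a shorthand form retrieve its longer standard name; an exact or prefix comparison of initial strings would be stricter and would miss this case.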
In one embodiment, after the preliminary disease terms are obtained by initial-string matching, the method further comprises:
Step S123: obtain the body-site information in the multiple disease segments.
In the disease name to be processed, the disease segments may include body-site information. The body-site information describes the affected site, such as "lung", "respiratory tract" or "brain", and can be used to locate the site of the patient's disease relatively precisely.
Step S124: screen the preliminary disease terms according to the body-site information, and take the preliminary disease terms that match the body site as the re-screened disease term set.
The body-site information is therefore matched against the preliminary disease terms obtained from the standard disease corpus. If disease names corresponding to the body site can be found among the preliminary disease terms, a set of preliminary disease names closer to the disease segments of the disease name to be processed is obtained and taken as the re-screened disease term set. It can be understood that the re-screened disease term set may also be used directly as the candidate disease terms.
Further, if a preliminary disease term does not correspond to the body site, it can be deleted from the candidate disease terms, and the remaining preliminary disease terms form the re-screened disease term set.
In one embodiment, after the re-screened disease terms are obtained, the method may further comprise:
Step S125: obtain the disease core word in the multiple disease segments.
The disease core word is a word that describes the nature of the disease, such as "infection", "tuberculosis" or "measles"; from it, the kind of disease can be determined. By screening for the disease core word in the disease segments, the nature of the disease to which the disease segments correspond can be obtained.
Step S126: screen the re-screened disease term set by the disease core word, and take the re-screened disease terms that match the disease core word as the candidate disease term set.
That is, the re-screened disease term set is matched again by the disease core word, and the re-screened disease terms that contain the disease core word are taken as the candidate disease term set.
Further, if a re-screened disease term does not contain the disease core word, it can be deleted from the re-screened disease term set to improve the precision and speed of subsequent matching.
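The two screening stages in steps S123 to S126 amount to successive filters over the preliminary set. The sketch below assumes that "matching" means simple substring containment and that an empty list of site or core words means no filtering; both choices, the function names and the English sample terms are illustrative assumptions.

```python
def screen_by_site(preliminary_terms, site_words):
    """Keep preliminary terms that mention any body-site word (steps S123-S124)."""
    if not site_words:  # assumption: no site information means no filtering
        return list(preliminary_terms)
    return [t for t in preliminary_terms if any(s in t for s in site_words)]


def screen_by_core_word(rescreened_terms, core_words):
    """Keep re-screened terms that contain any disease core word (steps S125-S126)."""
    if not core_words:  # assumption: no core word means no filtering
        return list(rescreened_terms)
    return [t for t in rescreened_terms if any(c in t for c in core_words)]


preliminary = ["pulmonary infection", "pulmonary tuberculosis", "intestinal infection"]
rescreened = screen_by_site(preliminary, ["pulmonary"])
candidates = screen_by_core_word(rescreened, ["infection"])
print(candidates)  # ['pulmonary infection']
```

Each filter only shrinks the set, so the more expensive similarity calculation that follows runs on fewer terms.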
In one embodiment, referring also to Fig. 3, the step of obtaining the similarity between the multiple candidate disease terms and the disease name and sorting the candidate disease terms in the candidate disease term set by similarity comprises:
Step S131: obtain the vector space cosine similarity, the edit distance similarity and the character overlap between the disease name and each candidate disease term;
Step S132: perform a weighted calculation on the vector space cosine similarity, the edit distance similarity and the character overlap to obtain the combined similarity between the disease name and each candidate disease term;
Step S133: sort the candidate disease terms in the candidate disease term set according to at least one of the vector space cosine similarity, the edit distance similarity, the character overlap and the combined similarity.
In step S131, the similarity between the disease name and each candidate disease term can be assessed with the "word" as the unit, calculating the vector space cosine similarity at word granularity; it can likewise be assessed with the "word" as the unit to calculate the edit distance (Levenshtein) similarity at word granularity; and, again with the "word" as the unit, the overlap at word granularity can be calculated from the perspective of word overlap.
In step S132, the vector space cosine similarity, the edit distance similarity and the character overlap can be weighted by their respective weights to obtain the combined similarity. The weights can be set according to the practical requirements for efficiency and accuracy.
In step S133, the multiple candidate disease terms can be sorted according to at least one of the vector space cosine similarity, the edit distance similarity, the character overlap or the combined similarity. That is, the candidate disease terms can be sorted by only one or more of the vector space cosine similarity, the edit distance similarity and the character overlap to improve efficiency, or by the combined similarity to improve accuracy.
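Steps S131 to S133 can be sketched in a few functions. The exact formulas are not given in the text, so everything here is an assumption: cosine similarity over whitespace tokens, edit distance similarity as one minus the Levenshtein distance divided by the longer length, character overlap as the Jaccard ratio of character sets, and the weights `(0.4, 0.3, 0.3)`.

```python
import math
from collections import Counter


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between two token-count vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0


def char_overlap(a, b):
    """Shared characters divided by all distinct characters (Jaccard ratio)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def combined_similarity(name, term, weights=(0.4, 0.3, 0.3)):
    scores = (
        cosine_similarity(name.split(), term.split()),
        edit_similarity(name, term),
        char_overlap(name, term),
    )
    return sum(w * s for w, s in zip(weights, scores))


def rank(name, candidates, top_k=1):
    """Sort candidates by combined similarity, highest first, and keep top_k."""
    return sorted(candidates, key=lambda t: combined_similarity(name, t), reverse=True)[:top_k]


terms = ["cerebral hemorrhage", "myocardial infarction", "cerebral infarction"]
print(rank("cerebral infarctoin", terms))
```

Passing a different `weights` tuple shifts the balance between the three measures, which is where the trade-off between efficiency and accuracy described in step S132 would be tuned.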
For example, if the disease name contains a wrong character, such as a misspelled form of "cerebral infarction", the candidate disease terms matched by similarity calculation include "cerebral infarction". For shorthand or abbreviations, such as the shorthand for chronic obstructive pulmonary disease, similarity matching yields candidate disease terms that include "chronic obstructive disease of lung".
In one embodiment, the step of segmenting the disease name to obtain multiple disease segments comprises:
Matching the disease name against each disease segment category in the disease segment database to obtain the multiple disease segments, wherein the disease segment categories include at least one of etiology, body site, description, core word, supplementary information and disease attribute.
The disease segment database mainly covers the categories of etiology, body site, description, core word, supplementary information and disease attribute, and the disease name is split according to these categories to form the multiple disease segments. Etiology is, for example, "EB virus"; body site is the affected site; description includes pathology and clinical manifestation, where clinical manifestation covers, for example, symptoms, signs, typing, staging, gender, age, acute or chronic course, and time of onset; supplementary information is additional medical information, for example that a disease is bacteriologically and histologically confirmed; disease attribute is, for example, "with" or "without".
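Category-based segmentation can be pictured as a lookup of each segment in per-category word lists. The `CATEGORY_LEXICONS` contents, the `tag_segments` function and the "unknown" fallback label are hypothetical and only illustrate the idea.

```python
# Hypothetical per-category word lists standing in for the disease segment database.
CATEGORY_LEXICONS = {
    "etiology": {"EB virus"},
    "body site": {"lung", "pulmonary", "brain"},
    "description": {"chronic", "acute"},
    "core word": {"infection", "tuberculosis", "measles"},
    "disease attribute": {"with", "without"},
}


def tag_segments(segments):
    """Pair each segment with the first category whose word list contains it."""
    tagged = []
    for seg in segments:
        category = next(
            (name for name, words in CATEGORY_LEXICONS.items() if seg in words),
            "unknown",
        )
        tagged.append((seg, category))
    return tagged


print(tag_segments(["chronic", "pulmonary", "infection"]))
```

The tags make it straightforward to pull out the body-site and core-word segments needed by the screening steps.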
In one embodiment, before the step of segmenting the disease name to be processed to obtain multiple disease segments, the method further comprises:
Performing character conversion on the disease name so that the characters in the disease name have the same attributes;
Wherein the character conversion includes at least one of language conversion, synonym replacement, and conversion between full-width and half-width characters.
After the disease name to be processed is obtained, it may be purely English or a mix of Chinese and English, and may contain both full-width and half-width characters. Characters with these different attributes can be converted so that all characters in the disease name have the same attributes. In addition, synonyms in the disease name to be processed can be replaced using a thesaurus, which improves the speed and precision of subsequent matching and the subsequent recognition rate.
For example, if the disease name contains "COPD", it is converted to Chinese to obtain the Chinese name for chronic obstructive pulmonary disease. For "L4-5 disc herniation", Chinese-English conversion can be performed, or similarity matching can be carried out mainly on the Chinese characters, that is, with a higher weight set for Chinese characters.
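The character conversion can be sketched with Python's standard `unicodedata` module. NFKC normalization folds full-width letters, digits and punctuation into their half-width forms; the `SYNONYMS` table, which here maps an abbreviation to an English standard name rather than a Chinese one, and the `normalize` function are assumptions for illustration.

```python
import unicodedata

# Hypothetical thesaurus; a real one would map to the standard names in the corpus.
SYNONYMS = {"COPD": "chronic obstructive pulmonary disease"}


def normalize(name):
    """Fold full-width characters to half-width, then replace known synonyms."""
    text = unicodedata.normalize("NFKC", name).strip()
    for alias, standard in SYNONYMS.items():
        text = text.replace(alias, standard)
    return text


print(normalize("\uff23\uff2f\uff30\uff24"))  # full-width "COPD"
print(normalize("\uff2c\uff14\uff0d\uff15"))  # full-width "L4-5"
```

Running this step first means later matching never has to distinguish between full-width and half-width variants of the same name.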
In one embodiment, before the step of matching the multiple disease segments against the standard disease corpus to obtain the candidate disease term set, the method further comprises:
Step S102: obtain standard disease corpus material.
Step S104: segment the standard disease corpus material and build segment lexicons, which serve as the standard disease corpus; the segment lexicons include at least one of an etiology lexicon, a body-site lexicon, a pathology lexicon, a clinical manifestation lexicon and a disease core word lexicon.
The combination of the above segment lexicons can serve as the standard disease corpus.
Based on etiology, body site, pathology, clinical manifestation and disease core word, five lexicons dedicated to disease names are built: the etiology lexicon, the body-site lexicon, the pathology lexicon, the clinical manifestation lexicon and the disease core word lexicon.
Further, referring to Table 1, an ICD disease name is composed as "disease name + comma + supplementary information" or "disease name + with/without information". ICD disease names can therefore be further split into segments: body site, etiology, description (including pathology and clinical manifestation), core word, supplementary information, with/without, and so on. This refines the disease name to be processed further and makes the text processing result more accurate.
Table 1: Examples of splitting ICD disease names
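The two ICD name patterns described above can be separated with a short function. The sample names are illustrative English phrases, not entries taken from Table 1, and the `split_icd_name` function and its output keys are assumptions.

```python
import re


def split_icd_name(name):
    """Split an ICD-style name into main name, supplementary and with/without parts."""
    parts = {"name": name.strip(), "supplementary": "", "with_without": ""}
    main, comma, extra = name.partition(",")
    if comma:  # pattern: disease name + comma + supplementary information
        parts["name"] = main.strip()
        parts["supplementary"] = extra.strip()
        return parts
    match = re.search(r"\b(with|without)\b.*", name)
    if match:  # pattern: disease name + with/without information
        parts["name"] = name[:match.start()].strip()
        parts["with_without"] = match.group(0).strip()
    return parts


print(split_icd_name("pulmonary tuberculosis, bacteriologically and histologically confirmed"))
print(split_icd_name("diabetes mellitus with coma"))
```

The main name returned here could then be passed to the category-based segmentation to obtain body site, etiology, description and core word.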
It can be understood that if a standard disease term that exactly matches the disease name to be processed is found in the standard disease corpus, that standard disease term can be used directly as the disease term for the disease name to be processed.
In one embodiment, referring to Fig. 5, a flowchart of a text processing method for disease terms is provided, comprising:
Step 1: perform character conversion on the disease name to be processed so that the characters in the disease name have the same attributes;
Step 2.1: segment the disease name to be processed to obtain multiple disease segments;
Step 2.2: obtain the initial string composed of the first letter of each character in the disease name to be processed;
Step 4.1: obtain, from the standard disease corpus, the set of standard disease names that match the initial string as the preliminary disease term set;
Step 4.2: screen the preliminary disease term set according to the body-site information, and take the preliminary disease terms that match the body site as the re-screened disease term set;
Step 4.3: screen the re-screened disease terms by the disease core word, and take the re-screened disease terms that match the disease core word as the candidate disease terms;
Step 5.1: obtain the vector space cosine similarity between the disease name and each candidate disease term;
Step 5.2: obtain the edit distance similarity between the disease name and each candidate disease term;
Step 5.3: obtain the character overlap between the disease name and each candidate disease term;
Step 5.4: perform a weighted calculation on the vector space cosine similarity, the edit distance similarity and the character overlap to obtain the combined similarity between the disease name and each candidate disease term;
Step 5.5: sort the multiple candidate disease terms according to at least one of the vector space cosine similarity, the edit distance similarity, the character overlap and the combined similarity;
Step 6: select the top-ranked candidate disease terms as the disease term for the disease name.
In one embodiment, the method further comprises:
Step 3: if a standard disease term that exactly matches the disease name to be processed is found in the standard disease corpus, use that standard disease term as the disease term for the disease name to be processed.
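The exact-match shortcut of step 3 can be sketched as a lookup that runs before any candidate generation. In this sketch `difflib.get_close_matches` from the Python standard library stands in for steps 4 to 6; it is not the similarity scheme described in this document, and the `standardize` function and `CORPUS` list are hypothetical.

```python
import difflib

CORPUS = ["cerebral infarction", "cerebral hemorrhage", "myocardial infarction"]


def standardize(name, corpus, top_k=1):
    """Return the exact standard term if present, else the closest terms."""
    if name in set(corpus):  # step 3: exact match, skip candidate ranking
        return [name]
    # Stand-in for steps 4 to 6: rank all corpus entries by difflib's ratio.
    return difflib.get_close_matches(name, corpus, n=top_k, cutoff=0.0)


print(standardize("cerebral infarction", CORPUS))
print(standardize("cerebral infarctoin", CORPUS))
```

Checking for an exact match first avoids the cost of screening and ranking for names that are already standard.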
Referring to Fig. 6, one embodiment of the invention also provides a text processing apparatus for disease terms, the apparatus comprising:
A segmentation module 1002, configured to segment the disease name to be processed to obtain multiple disease segments.
For input medical text such as electronic medical records, medical textbooks and papers, the segmentation module 1002 can segment the disease name to be processed in the medical text to obtain multiple disease segments. The segmentation tool can be chosen flexibly: a general tool such as JIEBA can be used to segment the medical text, or a dedicated medical corpus can be used, to obtain the multiple segments.
A candidate term screening module 1004, configured to match the multiple disease segments against the standard disease corpus to obtain multiple candidate disease terms.
The standard disease corpus is a database that stores standard disease names and can be used to standardize disease names. The candidate term screening module 1004 matches the disease name to be processed against the standard disease names in the standard disease corpus to obtain candidate disease terms that match the disease name to be processed.
A candidate term sorting module 1006, configured to obtain the similarity between the multiple candidate disease terms and the disease name, and to sort the multiple candidate disease terms by similarity.
After the multiple candidate disease terms are obtained, the candidate term sorting module 1006 can calculate the similarity between each candidate disease term and the disease name to be processed. Once these similarities are obtained, the multiple candidate disease terms can be sorted by the magnitude of the similarity.
A candidate term processing module 1008, configured to select the top-ranked candidate disease terms from the multiple candidate disease terms as the disease term for the disease name.
After the multiple candidate disease terms are sorted, the candidate term processing module 1008 can select the top-ranked candidate disease terms as the disease term for the disease name to be processed. For example, only the first-ranked candidate disease term may be selected as the disease term for the disease name to be processed. Alternatively, the top three candidate disease terms may be selected together as the disease terms for the disease name to be processed.
The text processing apparatus provided by the above embodiment obtains disease segments by segmentation and, based on the standard disease corpus, can standardize disease names automatically, and the processed disease names can be recognized with high precision.
In one of the embodiments, the candidate terms screening module 1004 includes:
Character string acquiring unit, the head that the initial for obtaining each character in the pending disease name forms Alphabetic character string;
Candidate word primary election unit, in standard disease corpus, obtaining the mark with the initial string matching The set of quasi- disease name is as primary election disease term set.
In one of the embodiments, the candidate terms screening module 1004 further includes:
A position acquiring unit, configured to obtain the position information in the multiple disease word segments;
A candidate word re-selection unit, configured to screen the preliminary disease term set according to the position information and obtain the preliminary disease terms that match the position, as the re-selected disease term set.
In one of the embodiments, candidate terms screening module 1004 further includes:
A core word acquiring unit, configured to obtain the disease core word in the multiple disease word segments;
A candidate terms determination unit, configured to screen the re-selected disease terms by the disease core word and obtain the re-selected disease terms that match the disease core word, as the candidate disease terms.
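The two screening stages (by position, then by disease core word) can be sketched in Python as follows. The function names are hypothetical, and matching by substring containment is an assumption, as is leaving the list unchanged when a segment is missing.

```python
from typing import List

def filter_by_segment(terms: List[str], segment: str) -> List[str]:
    """Keep terms that contain `segment`; an empty segment leaves the list unchanged."""
    if not segment:
        return list(terms)
    return [t for t in terms if segment in t]

def screen_candidates(preliminary: List[str], position: str, core_word: str) -> List[str]:
    """Apply the position filter first, then the core-word filter."""
    reselected = filter_by_segment(preliminary, position)   # re-selected term set
    return filter_by_segment(reselected, core_word)         # candidate disease terms

preliminary_set = ["left lung pneumonia", "lung cancer", "liver cancer"]
candidates = screen_candidates(preliminary_set, position="lung", core_word="cancer")
```

Applying the filters in sequence narrows the set step by step, so the more expensive similarity computation only runs on the terms that survive both filters.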
In one of the embodiments, candidate terms sorting module 1006 includes:
A similarity acquiring unit, configured to obtain at least one of the vector space cosine similarity, the edit distance similarity and the character overlap degree between the disease name and each candidate disease term;
A comprehensive similarity acquiring unit, configured to compute a weighted combination of the vector space cosine similarity, the edit distance similarity and the character overlap degree, obtaining the comprehensive similarity between the disease name and each candidate disease term;
A candidate terms sequencing unit, configured to rank the multiple candidate disease terms according to at least one of the vector space cosine similarity, the edit distance similarity, the character overlap degree and the comprehensive similarity.
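The three similarity measures and their weighted combination can be sketched in Python as below. Several choices are assumptions made for illustration only: the vector space is built from character counts, the edit distance is normalized by the longer string length, the character overlap degree is computed as a Jaccard ratio of character sets, and the weights `(0.4, 0.3, 0.3)` are hypothetical values. The patent does not fix any of these details.

```python
import math
from collections import Counter
from typing import Tuple

def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between character-count vectors of a and b."""
    va, vb = Counter(a), Counter(b)
    dot = sum(va[ch] * vb[ch] for ch in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]

def edit_similarity(a: str, b: str) -> float:
    """Map edit distance into [0, 1]; identical strings score 1."""
    longest = max(len(a), len(b))
    return 1.0 - edit_distance(a, b) / longest if longest else 1.0

def char_overlap(a: str, b: str) -> float:
    """Shared distinct characters divided by all distinct characters."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0

def combined_similarity(a: str, b: str,
                        weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted sum of the three measures; the weights are illustrative only."""
    w_cos, w_edit, w_char = weights
    return (w_cos * cosine_similarity(a, b)
            + w_edit * edit_similarity(a, b)
            + w_char * char_overlap(a, b))
```

Each measure captures something different: cosine similarity ignores character order, edit distance is sensitive to it, and character overlap ignores repetition. A weighted combination lets one measure compensate for the blind spots of another.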
In one of the embodiments, the word segmentation module 1002 is further configured to:
Match the disease name against each disease segment category in the disease segmentation database to obtain the multiple disease word segments;
Wherein the disease segment categories include at least one of cause, position, description, core word, supplementary information and disease attribute.
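Dictionary-based segmentation with category labels can be sketched in Python using forward maximum matching. Forward maximum matching is a common technique chosen here as an assumption; the patent does not name a specific algorithm. The lexicon `LEXICON` and its entries are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical segmentation database: segment text -> category.
LEXICON: Dict[str, str] = {
    "急性": "description",   # acute
    "病毒性": "cause",        # viral
    "肺": "position",        # lung
    "炎": "core_word",       # inflammation
}

def segment(name: str, lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Forward maximum matching: at each position take the longest lexicon entry."""
    max_len = max(map(len, lexicon), default=1)
    tokens: List[Tuple[str, str]] = []
    i = 0
    while i < len(name):
        for size in range(min(max_len, len(name) - i), 0, -1):
            piece = name[i:i + size]
            if piece in lexicon:
                tokens.append((piece, lexicon[piece]))
                i += size
                break
        else:
            # No entry starts here: emit a single character tagged as unknown.
            tokens.append((name[i], "unknown"))
            i += 1
    return tokens

segments = segment("急性病毒性肺炎", LEXICON)
```

The category labels make the later steps straightforward: the position filter reads the segment tagged `position`, and the core-word filter reads the segment tagged `core_word`.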
In one of the embodiments, candidate terms screening module 1004 is additionally operable to:
Perform character conversion on the disease name so that the characters in the disease name have uniform attributes;
Wherein the character conversion includes at least one of language conversion, synonym replacement, and conversion between full-width and half-width characters.
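Two of the listed conversions, full-width to half-width conversion and synonym replacement, can be sketched in Python as below. Using Unicode NFKC normalization for the width conversion and lowercasing the result are assumptions, the synonym table is hypothetical, and language conversion is omitted.

```python
import unicodedata
from typing import Dict

# Hypothetical synonym table: variant spelling -> standard spelling.
SYNONYMS: Dict[str, str] = {"tumour": "tumor"}

def normalize_name(name: str, synonyms: Dict[str, str]) -> str:
    """Unify character width and case, then replace known synonyms."""
    # NFKC folds full-width letters, digits and the ideographic space
    # into their ordinary half-width counterparts.
    text = unicodedata.normalize("NFKC", name).lower()
    for variant, standard in synonyms.items():
        text = text.replace(variant, standard)
    return text

cleaned = normalize_name("Ｌｕｎｇ　Ｔｕｍｏｕｒ", SYNONYMS)
```

Normalizing before matching means that names differing only in character width or spelling variant reach the segmentation and matching steps in the same form.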
In one of the embodiments, the word segmentation module 1002 is further configured to:
Obtain standard disease language material;
Segment the standard disease language material and establish a segmentation library as the standard disease corpus;
The segmentation library includes at least one of a cause library, a position library, a pathology library, a clinical manifestation library and a disease core word library.
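Building per-category libraries from segmented standard material can be sketched in Python as follows. The sketch assumes the standard disease language material has already been segmented and labeled with categories; how that labeling is produced is not shown, and the function name and category keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, str]  # (segment text, category)

def build_segment_library(annotated_corpus: Iterable[List[Segment]]) -> Dict[str, List[str]]:
    """Collect the distinct segments of each category into a sorted list."""
    library = defaultdict(set)
    for segments in annotated_corpus:
        for text, category in segments:
            library[category].add(text)
    return {category: sorted(words) for category, words in library.items()}

annotated = [
    [("viral", "cause"), ("lung", "position"), ("inflammation", "core_word")],
    [("lung", "position"), ("cancer", "core_word")],
]
library = build_segment_library(annotated)
```

Storing each category separately keeps the cause, position and core-word entries available as independent lookup tables for later matching.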
In one embodiment of the invention, a computer device is also provided. The computer device includes a processor, a memory, and computer instructions stored on the memory. When executed by the processor, the computer instructions implement a word processing method for disease terms, the method including:
Segmenting the pending disease name to obtain multiple disease word segments;
Matching the multiple disease word segments in the standard disease corpus to obtain a candidate disease term set;
Obtaining the similarity between each candidate disease term and the disease name, and ranking the candidate disease terms in the candidate disease term set according to similarity;
Selecting, from the candidate disease term set, the top-ranked candidate disease terms as the disease term of the disease name.
In one of the embodiments, the step, performed by the processor, of matching the multiple disease word segments in the standard disease corpus to obtain the candidate disease term set includes:
Obtaining the initial-letter string formed by the initial letter of each character in the pending disease name;
Obtaining, in the standard disease corpus, the set of standard disease names that match the initial-letter string, as the preliminary disease term set.
In one of the embodiments, after the step, performed by the processor, of obtaining the set of standard disease names that match the initial-letter string as the preliminary disease term set, the method further includes:
Obtaining the position information in the multiple disease word segments;
Screening the preliminary disease term set according to the position information, and obtaining the preliminary disease terms that match the position, as the re-selected disease term set.
In one of the embodiments, after the step, performed by the processor, of obtaining the preliminary disease terms that match the position as the re-selected disease term set, the method further includes:
Obtaining the disease core word in the multiple disease word segments;
Screening the re-selected disease terms by the disease core word, and obtaining the re-selected disease terms that match the disease core word, as the candidate disease terms.
In one of the embodiments, the step, performed by the processor, of obtaining the similarity between the multiple candidate disease terms and the disease name and ranking the candidate disease terms in the candidate disease term set according to similarity includes:
Obtaining the vector space cosine similarity, the edit distance similarity and the character overlap degree between the disease name and each candidate disease term;
Computing a weighted combination of the vector space cosine similarity, the edit distance similarity and the character overlap degree to obtain the comprehensive similarity between the disease name and each candidate disease term;
Ranking the multiple candidate disease terms according to at least one of the vector space cosine similarity, the edit distance similarity, the character overlap degree and the comprehensive similarity.
In one of the embodiments, the step, performed by the processor, of segmenting the disease name to obtain multiple disease word segments includes:
Matching the disease name against each disease segment category in the disease segmentation database to obtain the multiple disease word segments;
Wherein the disease segment categories include at least one of cause, position, description, core word, supplementary information and disease attribute.
In one of the embodiments, after the step, performed by the processor, of acquiring the pending disease name, the method further includes:
Performing character conversion on the disease name so that the characters in the disease name have uniform attributes;
Wherein the character conversion includes at least one of language conversion, synonym replacement, and conversion between full-width and half-width characters.
In one of the embodiments, before the step, performed by the processor, of matching the multiple disease word segments in the standard disease corpus to obtain multiple candidate disease terms, the method further includes:
Obtaining standard disease language material;
Segmenting the standard disease language material and establishing a segmentation library as the standard disease corpus;
The segmentation library includes at least one of a cause library, a position library, a pathology library, a clinical manifestation library and a disease core word library.
The computer device provided by the above embodiment obtains disease word segments by segmentation and, based on the standard disease corpus, can standardize disease names automatically; the disease names obtained after word processing have high recognition accuracy.
Those skilled in the art should understand that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The technical features of the embodiments described above can be combined arbitrarily. To keep the description concise, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments described above express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.

Claims (10)

  1. A word processing method for disease terms, characterized in that the method comprises:
    Segmenting the pending disease name to obtain multiple disease word segments;
    Matching the multiple disease word segments in the standard disease corpus to obtain a candidate disease term set;
    Obtaining the similarity between each candidate disease term in the candidate disease term set and the pending disease name, and ranking the candidate disease terms in the candidate disease term set according to similarity;
    Selecting, from the candidate disease term set, a preset number of top-ranked candidate disease terms as the standardized disease term of the disease name.
  2. The method according to claim 1, characterized in that the step of matching the multiple disease word segments in the standard disease corpus to obtain the candidate disease term set comprises:
    Obtaining the initial-letter string formed by the initial letter of each character in the pending disease name;
    Obtaining, in the standard disease corpus, the set of standard disease names that match the initial-letter string, as the preliminary disease term set.
  3. The method according to claim 2, characterized in that the step of matching the multiple disease word segments in the standard disease corpus to obtain the candidate disease term set further comprises:
    Obtaining the position information in the multiple disease word segments;
    Screening the preliminary disease term set according to the position information, and obtaining the preliminary disease terms that match the position, as the re-selected disease term set.
  4. The method according to claim 3, characterized in that the step of matching the multiple disease word segments in the standard disease corpus to obtain the candidate disease term set further comprises:
    Obtaining the disease core word in the multiple disease word segments;
    Screening the re-selected disease term set by the disease core word, and obtaining the re-selected disease terms that match the disease core word, as the candidate disease terms.
  5. The method according to claim 1, characterized in that the step of obtaining the similarity between the multiple candidate disease terms and the disease name and ranking the candidate disease terms in the candidate disease term set according to similarity comprises:
    Obtaining the vector space cosine similarity, the edit distance similarity and the character overlap degree between the disease name and each candidate disease term;
    Computing a weighted combination of the vector space cosine similarity, the edit distance similarity and the character overlap degree to obtain the comprehensive similarity between the disease name and each candidate disease term;
    Ranking the multiple candidate disease terms according to at least one of the vector space cosine similarity, the edit distance similarity, the character overlap degree and the comprehensive similarity.
  6. The method according to claim 1, characterized in that the step of segmenting the disease name to obtain multiple disease word segments comprises:
    Matching the disease name against each disease segment category in the disease segmentation database to obtain the multiple disease word segments;
    Wherein the disease segment categories include at least one of cause, position, description, core word, supplementary information and disease attribute.
  7. The method according to claim 1, characterized in that, before the step of segmenting the pending disease name to obtain multiple disease word segments, the method further comprises:
    Performing character conversion on the disease name so that the characters in the disease name have uniform attributes;
    Wherein the character conversion includes at least one of language conversion, synonym replacement, and conversion between full-width and half-width characters.
  8. The method according to claim 1, characterized in that, before the step of matching the multiple disease word segments in the standard disease corpus to obtain multiple candidate disease terms, the method further comprises:
    Obtaining standard disease language material;
    Segmenting the standard disease language material and establishing a segmentation library as the standard disease corpus;
    The segmentation library includes at least one of a cause library, a position library, a pathology library, a clinical manifestation library and a disease core word library.
  9. A word processing device for disease terms, characterized in that the word processing device for disease terms comprises:
    A word segmentation module, configured to segment the pending disease name to obtain multiple disease word segments;
    A candidate terms screening module, configured to match the multiple disease word segments in the standard disease corpus to obtain multiple candidate disease terms;
    A candidate terms sorting module, configured to obtain the similarity between the multiple candidate disease terms and the disease name, and to rank the multiple candidate disease terms according to similarity;
    A candidate terms processing module, configured to select, from the multiple candidate disease terms, the top-ranked candidate disease terms as the disease term of the disease name.
  10. A computer device, comprising a processor, a memory, and computer instructions stored on the memory, characterized in that the computer instructions, when executed by the processor, implement the steps of the method according to any one of claims 1-8.
CN201711107945.3A 2017-09-30 2017-11-10 Word treatment method, device and the computer equipment of disease term Pending CN108021553A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710916227 2017-09-30
CN2017109162274 2017-09-30

Publications (1)

Publication Number Publication Date
CN108021553A true CN108021553A (en) 2018-05-11

Family

ID=62080472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711107945.3A Pending CN108021553A (en) 2017-09-30 2017-11-10 Word treatment method, device and the computer equipment of disease term

Country Status (1)

Country Link
CN (1) CN108021553A (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920453A (en) * 2018-06-08 2018-11-30 医渡云(北京)技术有限公司 Data processing method, device, electronic equipment and computer-readable medium
CN109582797A (en) * 2018-12-13 2019-04-05 泰康保险集团股份有限公司 Obtain method, apparatus, medium and electronic equipment that classification of diseases is recommended
CN109615533A (en) * 2018-10-24 2019-04-12 平安健康保险股份有限公司 Hospital efficiency analysis method and system
CN109994215A (en) * 2019-04-25 2019-07-09 清华大学 Disease automatic coding system, method, equipment and storage medium
CN110032728A (en) * 2019-02-01 2019-07-19 阿里巴巴集团控股有限公司 The standardized conversion method of disease name and device
CN110851595A (en) * 2019-10-08 2020-02-28 云知声智能科技股份有限公司 Identification method and device for disease term core vocabulary
CN110956043A (en) * 2019-12-17 2020-04-03 人和未来生物科技(长沙)有限公司 Domain professional vocabulary word embedding vector training method, system and medium based on alias standardization
CN111046660A (en) * 2019-11-21 2020-04-21 深圳无域科技技术有限公司 Method and device for recognizing text professional terms
CN111063446A (en) * 2019-12-17 2020-04-24 医渡云(北京)技术有限公司 Method, apparatus, device and storage medium for standardizing medical text data
CN111126055A (en) * 2019-10-28 2020-05-08 国电南瑞科技股份有限公司 Power grid equipment name matching method and system
CN111325032A (en) * 2020-02-21 2020-06-23 中国建设银行股份有限公司 5G + intelligent banking institution name standardization method and device
CN111563139A (en) * 2020-07-15 2020-08-21 平安国际智慧城市科技股份有限公司 Checking method and device for identifying invoice drug name through OCR (optical character recognition) and computer equipment
CN111581976A (en) * 2020-03-27 2020-08-25 平安医疗健康管理股份有限公司 Method and apparatus for standardizing medical terms, computer device and storage medium
CN111666754A (en) * 2020-05-28 2020-09-15 平安医疗健康管理股份有限公司 Entity identification method and system based on electronic disease text and computer equipment
CN111859942A (en) * 2020-07-02 2020-10-30 上海森亿医疗科技有限公司 Medical name normalization method and device, storage medium and terminal
CN111898376A (en) * 2020-07-01 2020-11-06 拉扎斯网络科技(上海)有限公司 Name data processing method and device, storage medium and computer equipment
CN112022140A (en) * 2020-07-03 2020-12-04 上海数创医疗科技有限公司 Automatic diagnosis method and system for diagnosis conclusion of electrocardiogram
CN112149006A (en) * 2019-11-20 2020-12-29 广州市疾病预防控制中心(广州市卫生检验中心) Data display method, device, equipment and storage medium of disease information
CN112163146A (en) * 2019-11-20 2021-01-01 广州市疾病预防控制中心(广州市卫生检验中心) Data processing method, device, equipment and storage medium of disease information
CN112307763A (en) * 2020-12-30 2021-02-02 望海康信(北京)科技股份公司 Term standardization method, system and corresponding equipment and storage medium
CN112507107A (en) * 2019-09-16 2021-03-16 深圳中兴网信科技有限公司 Term matching method, device, terminal and computer-readable storage medium
CN112580360A (en) * 2020-11-11 2021-03-30 上海数创医疗科技有限公司 Electrocardio term semantic matching device
CN112633005A (en) * 2020-11-11 2021-04-09 上海数创医疗科技有限公司 Electrocardio term semantic matching method
CN112687397A (en) * 2020-12-31 2021-04-20 四川大学华西医院 Rare disease knowledge base processing method and device and readable storage medium
CN112992376A (en) * 2021-03-04 2021-06-18 山东大学 Disease name matching method and system based on weight adjustment
CN113077912A (en) * 2021-04-01 2021-07-06 深圳鸿祥源科技有限公司 Medical Internet of things monitoring system and method based on 5G network
CN113128216A (en) * 2019-12-31 2021-07-16 中国移动通信集团贵州有限公司 Language identification method, system and device
CN113722418A (en) * 2021-08-30 2021-11-30 平安科技(深圳)有限公司 Clinical case standardization method, device, equipment and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050181350A1 (en) * 2004-02-18 2005-08-18 Anuthep Benja-Athon Pattern of medical words and terms
CN101615182A (en) * 2008-06-27 2009-12-30 西门子公司 Tcm symptom information storage system and tcm symptom information storage means
CN105045853A (en) * 2015-07-07 2015-11-11 浪潮通用软件有限公司 Industry data matching method and device
CN105095665A (en) * 2015-08-13 2015-11-25 易保互联医疗信息科技(北京)有限公司 Natural language processing method and system for Chinese disease diagnosis information
CN106383853A (en) * 2016-08-30 2017-02-08 刘勇 Realization method and system for electronic medical record post-structuring and auxiliary diagnosis
CN106649273A (en) * 2016-12-26 2017-05-10 东软集团股份有限公司 Text processing method and text processing device

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108920453B (en) * 2018-06-08 2023-03-24 国家食品药品监督管理总局药品评价中心 Data processing method and device, electronic equipment and computer readable medium
CN108920453A (en) * 2018-06-08 2018-11-30 医渡云(北京)技术有限公司 Data processing method, device, electronic equipment and computer-readable medium
CN109615533A (en) * 2018-10-24 2019-04-12 平安健康保险股份有限公司 Hospital efficiency analysis method and system
CN109582797A (en) * 2018-12-13 2019-04-05 泰康保险集团股份有限公司 Obtain method, apparatus, medium and electronic equipment that classification of diseases is recommended
CN110032728A (en) * 2019-02-01 2019-07-19 阿里巴巴集团控股有限公司 The standardized conversion method of disease name and device
CN109994215A (en) * 2019-04-25 2019-07-09 清华大学 Disease automatic coding system, method, equipment and storage medium
CN112507107A (en) * 2019-09-16 2021-03-16 深圳中兴网信科技有限公司 Term matching method, device, terminal and computer-readable storage medium
CN110851595A (en) * 2019-10-08 2020-02-28 云知声智能科技股份有限公司 Identification method and device for disease term core vocabulary
CN111126055A (en) * 2019-10-28 2020-05-08 国电南瑞科技股份有限公司 Power grid equipment name matching method and system
CN112163146A (en) * 2019-11-20 2021-01-01 广州市疾病预防控制中心(广州市卫生检验中心) Data processing method, device, equipment and storage medium of disease information
CN112149006A (en) * 2019-11-20 2020-12-29 广州市疾病预防控制中心(广州市卫生检验中心) Data display method, device, equipment and storage medium of disease information
CN111046660B (en) * 2019-11-21 2023-05-09 深圳无域科技技术有限公司 Method and device for identifying text professional terms
CN111046660A (en) * 2019-11-21 2020-04-21 深圳无域科技技术有限公司 Method and device for recognizing text professional terms
CN110956043A (en) * 2019-12-17 2020-04-03 人和未来生物科技(长沙)有限公司 Domain professional vocabulary word embedding vector training method, system and medium based on alias standardization
CN111063446A (en) * 2019-12-17 2020-04-24 医渡云(北京)技术有限公司 Method, apparatus, device and storage medium for standardizing medical text data
CN113128216A (en) * 2019-12-31 2021-07-16 中国移动通信集团贵州有限公司 Language identification method, system and device
CN111325032A (en) * 2020-02-21 2020-06-23 中国建设银行股份有限公司 5G + intelligent banking institution name standardization method and device
CN111325032B (en) * 2020-02-21 2023-06-16 中国建设银行股份有限公司 Standardization method and device for name of 5G+ intelligent banking institution
CN111581976B (en) * 2020-03-27 2023-07-21 深圳平安医疗健康科技服务有限公司 Medical term standardization method, device, computer equipment and storage medium
CN111581976A (en) * 2020-03-27 2020-08-25 平安医疗健康管理股份有限公司 Method and apparatus for standardizing medical terms, computer device and storage medium
CN111666754B (en) * 2020-05-28 2023-02-03 深圳平安医疗健康科技服务有限公司 Entity identification method and system based on electronic disease text and computer equipment
CN111666754A (en) * 2020-05-28 2020-09-15 平安医疗健康管理股份有限公司 Entity identification method and system based on electronic disease text and computer equipment
CN111898376A (en) * 2020-07-01 2020-11-06 拉扎斯网络科技(上海)有限公司 Name data processing method and device, storage medium and computer equipment
CN111898376B (en) * 2020-07-01 2024-04-26 拉扎斯网络科技(上海)有限公司 Name data processing method and device, storage medium and computer equipment
CN111859942A (en) * 2020-07-02 2020-10-30 上海森亿医疗科技有限公司 Medical name normalization method and device, storage medium and terminal
CN111859942B (en) * 2020-07-02 2021-07-13 上海森亿医疗科技有限公司 Medical name normalization method and device, storage medium and terminal
CN112022140A (en) * 2020-07-03 2020-12-04 上海数创医疗科技有限公司 Automatic diagnosis method and system for diagnosis conclusion of electrocardiogram
CN111563139A (en) * 2020-07-15 2020-08-21 平安国际智慧城市科技股份有限公司 Checking method and device for identifying invoice drug name through OCR (optical character recognition) and computer equipment
CN112633005A (en) * 2020-11-11 2021-04-09 上海数创医疗科技有限公司 Electrocardio term semantic matching method
CN112580360A (en) * 2020-11-11 2021-03-30 上海数创医疗科技有限公司 Electrocardio term semantic matching device
CN112307763A (en) * 2020-12-30 2021-02-02 望海康信(北京)科技股份公司 Term standardization method, system and corresponding equipment and storage medium
CN112687397B (en) * 2020-12-31 2023-05-09 四川大学华西医院 Rare disease knowledge base processing method and device and readable storage medium
CN112687397A (en) * 2020-12-31 2021-04-20 四川大学华西医院 Rare disease knowledge base processing method and device and readable storage medium
CN112992376A (en) * 2021-03-04 2021-06-18 山东大学 Disease name matching method and system based on weight adjustment
CN113077912B (en) * 2021-04-01 2021-12-14 深圳鸿祥源科技有限公司 Medical Internet of things monitoring system and method based on 5G network
CN113077912A (en) * 2021-04-01 2021-07-06 深圳鸿祥源科技有限公司 Medical Internet of things monitoring system and method based on 5G network
CN113722418A (en) * 2021-08-30 2021-11-30 平安科技(深圳)有限公司 Clinical case standardization method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN108021553A (en) Word treatment method, device and the computer equipment of disease term
US8380719B2 (en) Semantic content searching
US8595620B2 (en) Document creation and management systems and methods
US20190251471A1 (en) Machine learning device
JP2006260318A (en) Diagnostic reading report input support method and system
US20060047647A1 (en) Method and apparatus for retrieving data
US20140181056A1 (en) System and method of quality assessment of a search index
CA3032614C (en) Localization platform that leverages previously translated content
CN114996388A (en) Intelligent matching method and system for diagnosis name standardization
JP2019032704A (en) Table data structuring system and table data structuring method
GB2537965A (en) Recommending form fragments
JP2021523509A (en) Expert Report Editor
US20100010806A1 (en) Storage system for symptom information of Traditional Chinese Medicine (TCM) and method for storing TCM symptom information
US8805095B2 (en) Analysing character strings
JP2007140861A (en) Information processing system, information processing method, and program
Zweigenbaum et al. Multiple Methods for Multi-class, Multi-label ICD-10 Coding of Multi-granularity, Multilingual Death Certificates.
CN109284497B (en) Method and apparatus for identifying medical entities in medical text in natural language
JP4979637B2 (en) Compound word break estimation device, method, and program for estimating compound word break position
CN110060749B (en) Intelligent electronic medical record diagnosis method based on SEV-SDG-CNN
JP2016110256A (en) Information processing device and information processing program
Fort et al. Annotating football matches: Influence of the source medium on manual annotation
JP6210865B2 (en) Data search system and data search method
US20220147703A1 (en) Voice activated clinical reporting systems and methods thereof
CN111552780B (en) Medical scene search processing method and device, storage medium and electronic equipment
CN114446422A (en) Medical record marking method, system and corresponding equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180511