CN107145511A - Method and system for generating a structured medical database based on medical text information - Google Patents

Method and system for generating a structured medical database based on medical text information

Info

Publication number
CN107145511A
Authority
CN
China
Prior art keywords
medical science
variable
text message
medical
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710208112.XA
Other languages
Chinese (zh)
Inventor
马汉东
张少典
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sen Sen Medical Technology Co Ltd
Original Assignee
Shanghai Sen Sen Medical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sen Sen Medical Technology Co Ltd
Priority to CN201710208112.XA
Publication of CN107145511A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2452: Query translation
    • G06F16/24522: Translation of natural language queries to structured queries

Abstract

The invention discloses a method for generating a structured medical database based on medical text information, comprising: obtaining input medical text information; determining the natural semantic processing model corresponding to the medical text information, and using that model to perform deep natural semantic analysis on the medical text information to obtain a processing result; and determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data at the corresponding position of its medical variable to obtain a structured medical database. The method increases the degree of automation and intelligence in generating structured medical databases, greatly reduces labor cost, and improves generation efficiency. The invention also discloses a system for generating a structured medical database based on medical text information, which has the same beneficial effects.

Description

Method and system for generating a structured medical database based on medical text information
Technical field
The present invention relates to the technical field of medical data processing, and in particular to a method and system for generating a structured medical database based on medical text information.
Background art
Making use of unstructured medical text information has long been a major difficulty in this technical field. The prior art generally processes medical text information manually or semi-manually. Most doctors and industry practitioners who need the data handle unstructured historical medical data (retrospective data) by reading the medical text manually and entering it in a standardized form. The usual approach is roughly as follows: the personnel concerned, or a third-party technology provider, design and program an electronic structured form (eCRF); the personnel then scan the text data by eye, record by record, and enter any relevant information they find into the structured form by hand. A few technologies achieve semi-automatic information extraction based on keyword matching and hand-crafted criteria, that is, they match relevant words or expressions in the text and provide auxiliary tools that make it easier for a person to read the information.
In other words, existing solutions rely heavily on people with professional knowledge, and the process is time-consuming and very costly. Intelligent assistance is rare throughout the process, and manual data entry is inefficient because the work is labor-intensive, repetitive, and tedious. For example, suppose all cancer patients need to be mined from electronic medical records to build a database. The current practice is to search directly for a term such as "cancer". An improved practice is to compile the names of all cancers and find the corresponding patients by keyword matching against the medical records. However, some special cases cannot be handled this way, for example "lung cancer" in "small cell carcinoma of the right lower lung" (note that the keyword "lung cancer" finds no match here), or a non-standard expression of a cancer such as an abbreviation or a clerical error. These deficiencies keep both the accuracy and the recall of the overall information extraction below expectations. That is, the prior art suffers from insufficient automation and intelligence in processing, and from high labor cost.
Summary of the invention
The object of the present invention is to provide a method and system for generating a structured medical database based on medical text information, which increase the degree of automation and intelligence in generating structured medical databases, greatly reduce labor cost, and improve generation efficiency.
To solve the above technical problem, the present invention provides a method for generating a structured medical database based on medical text information, the method comprising:
Obtaining input medical text information;
Determining the natural semantic processing model corresponding to the medical text information, and using the natural semantic processing model to perform deep natural semantic analysis on the medical text information to obtain a processing result;
Determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data at the corresponding position of its medical variable to obtain a structured medical database.
Optionally, determining the natural semantic processing model corresponding to the medical text information includes:
Extracting the key information points of the medical text information;
Determining the medical text category corresponding to the medical text information according to the key information points;
Determining the natural semantic processing model corresponding to the medical text category.
Optionally, performing deep natural semantic analysis on the medical text information using the natural semantic processing model includes:
Obtaining an input granularity threshold;
Causing the natural semantic processing model to perform deep natural semantic analysis on the medical text information according to the granularity threshold.
Optionally, after the processing result is obtained, the method further includes:
Using a medical standard database to perform standardized mapping on the processing result corresponding to a specified medical variable contained in the processing result, to obtain a standardized processing result.
Optionally, determining the medical variable corresponding to each item of processed data in the processing result includes:
Determining the primary medical variable corresponding to each item of processed data in the processing result;
Processing the primary medical variables using manually defined rule integration and correction logic to obtain a primary medical variable processing result;
When an advanced medical variable exists in the primary medical variable processing result, generating advanced medical variable processed data according to the corresponding processed data and the logical relationship corresponding to the advanced medical variable.
Optionally, after the input medical text information is obtained, the method further includes:
Performing data desensitization on the medical text information.
The present invention also provides a system for generating a structured medical database based on medical text information, comprising:
An acquisition module, for obtaining input medical text information;
A natural semantic processing module, for determining the natural semantic processing model corresponding to the medical text information, and using the natural semantic processing model to perform deep natural semantic analysis on the medical text information to obtain a processing result;
A structured medical database generation module, for determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data at the corresponding position of its medical variable to obtain a structured medical database.
Optionally, the natural semantic processing module includes:
A granularity threshold acquiring unit, for obtaining an input granularity threshold;
A natural semantic processing unit, for causing the natural semantic processing model to perform deep natural semantic analysis on the medical text information according to the granularity threshold.
Optionally, the system further includes:
A standardization module, for using a medical standard database to perform standardized mapping on the processing result corresponding to a specified medical variable contained in the processing result, to obtain a standardized processing result.
Optionally, the structured medical database generation module includes:
A primary medical variable unit, for determining the primary medical variable corresponding to each item of processed data in the processing result;
A correction unit, for processing the primary medical variables using manually defined rule integration and correction logic to obtain a primary medical variable processing result;
An advanced medical variable unit, for generating, when an advanced medical variable exists in the primary medical variable processing result, advanced medical variable processed data according to the corresponding processed data and the logical relationship corresponding to the advanced medical variable.
The method for generating a structured medical database based on medical text information provided by the present invention comprises: obtaining input medical text information; determining the natural semantic processing model corresponding to the medical text information, and using the natural semantic processing model to perform deep natural semantic analysis on the medical text information to obtain a processing result; and determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data at the corresponding position of its medical variable to obtain a structured medical database.
It can be seen that the method uses a natural semantic processing model to capture the medical variables of medical text information automatically and form a structured medical database. It thereby increases the degree of automation and intelligence in generating structured medical databases, greatly reduces labor cost, and improves generation efficiency. The invention also discloses a system for generating a structured medical database based on medical text information, which has the same beneficial effects; these are not repeated here.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of the method for generating a structured medical database based on medical text information provided by an embodiment of the present invention;
Fig. 2 is an example diagram of the deep natural semantic analysis provided by an embodiment of the present invention;
Fig. 3 is a structural block diagram of the system for generating a structured medical database based on medical text information provided by an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a method and system for generating a structured medical database based on medical text information, which increase the degree of automation and intelligence in generating structured medical databases, greatly reduce labor cost, and improve generation efficiency.
To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Medical big data is currently a very popular and challenging field. Compared with many other fields, the medical industry has long faced the challenge of massive and unstructured data, and in recent years many countries have been actively promoting medical informatization, which has given many medical institutions the funds to carry out big data analysis. However, medical big data faces many challenges.
First, a large amount of clinical data exists as unstructured text records, which poses a severe test for big data analysis. For example, the largest and most valuable body of hospital clinical data is patients' electronic medical record data. An electronic medical record is the sum of the text, symbols, charts, images, slices and other data that medical staff produce in the course of medical activities, including outpatient (emergency) records and inpatient records. An electronic medical record covers not only static medical record information but also the related services provided. It is electronically managed information about an individual's lifelong health status and health care behavior, and involves all the process information of collecting, storing, transmitting, processing and using patient information. The content of the electronic medical record is therefore the most complete and detailed clinical information resource about a patient. Precisely because the information it contains is complex, it relies heavily on long passages of descriptive text to convey information. At present there is almost no automated scheme for processing and analyzing this kind of unstructured text. Researchers have to search electronic medical record data by keyword or by combined conditions. However, general-purpose natural language word segmenters and syntactic treebanks destroy the original professional meaning of medical terms when processing them, which seriously degrades search results. It is therefore particularly important to combine natural language with medical terminology and form scientific, reasonable word segmentation and parsing algorithms.
Second, medical data lacks standards. Hospitals today have many people entering data and no systematic data entry standard. At the same time there are many system modules, and their data interface standards are not unified. For these reasons, data cannot be effectively connected even within a single hospital and becomes an information island. From the viewpoint of a regional medical resource information platform, the value of a single hospital's data is still very limited; to optimize the allocation of regional medical resources, regional medical big data must be interconnected. This trend is likewise seriously hindered by the lack of standards between hospitals.
Therefore, recognizing, unifying and using unstructured, non-standard medical data as early as possible is an extremely important technical problem in the field of medical big data. The present embodiment is an automated, intelligent solution designed to solve this problem. That is, the present embodiment can use automated algorithms to read and understand medical text information from all departments (including electronic medical records, examination reports, and so on), analyze its semantics using a natural semantic processing model (that is, an intelligent algorithm), and further structure it into a structured medical database that a computer can recognize. Referring to Fig. 1, a method for generating a structured medical database based on medical text information may specifically include:
S100: Obtain input medical text information;
The present embodiment does not limit the specific kind of medical text information. It may be, for example, an electronic medical record (full medical record text), an operation record, an imaging report, or a laboratory test report. In typical application scenarios, medical text information such as electronic medical records can be exported from the hospital information system. After export, the vast majority of this medical text information is in unformatted txt form.
S110: Determine the natural semantic processing model corresponding to the medical text information, and use the natural semantic processing model to perform deep natural semantic analysis on the medical text information to obtain a processing result;
The present embodiment does not limit the number of kinds of natural semantic processing model. There may be only one natural semantic processing model, in which case all the obtained medical text information undergoes deep natural semantic analysis by that model. Alternatively, a natural semantic processing model corresponding to the kind of medical text information may be used for the deep natural semantic analysis. In the former case only one model is trained, so the training process is simple. In the latter case more models must be trained because each corresponds to a kind of medical text information, but recognition and extraction accuracy is correspondingly higher. Medical text falls into several major categories, for example: full medical record text, operation records, imaging reports, and laboratory test reports. The writing standard and content of each category differ. Thus, to achieve higher recognition and extraction accuracy, the medical texts can first be classified, and a dedicated natural semantic processing model trained for each kind of text. A natural semantic processing model may be obtained as follows: first, integrate all publicly available medical dictionaries so as to cover as much standard medical vocabulary as possible; then collect real case data from each department and have medical professionals segment the words and annotate diseases manually; then use a machine learning algorithm to train the NLP model on the manual annotation results; finally, map the results fully onto international standard knowledge bases such as UMLS or SNOMED CT.
Preferably, determining the natural semantic processing model corresponding to the medical text information may include:
Extracting the key information points of the medical text information;
Determining the medical text category corresponding to the medical text information according to the key information points;
Determining the natural semantic processing model corresponding to the medical text category.
Specifically, in this preferred embodiment a key information point is data that can identify each category of medical text information, such as various key medical concepts. For example, content such as "first progress note" or "first course of disease" typically appears in a medical record, while key content such as specific test data typically appears in a laboratory test report. This preferred embodiment does not limit the key information points for each type of medical text; users can set and modify them according to the actual characteristics of each type of medical text. In particular, the greatest difference between the present embodiment and almost all of the prior art is that the present embodiment uses machine learning combined with natural language processing. It therefore does not use keyword matching as the method for recognizing important information, because keyword recognition is not accurate enough: a misspelled word or a non-standard expression, for example, cannot be recognized. The present embodiment has some ability to recognize new words, so it can recognize a word even if it has never encountered it before. Determining the medical text category from key information points is thus more reliable and accurate in the present embodiment, which improves the accuracy of selecting the natural semantic processing model.
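The selection of a model by text category can be illustrated with a minimal sketch. Everything named below is an assumption made for illustration: the category names, the cue phrases in KEY_POINTS, and the model names in MODEL_REGISTRY do not come from the patent, and plain phrase counting stands in for whatever trained classifier an actual system would use.

```python
from collections import Counter

# Hypothetical, user-configurable key information points per text category.
KEY_POINTS = {
    "medical_record": ["first progress note", "chief complaint", "discharge summary"],
    "operation_record": ["anesthesia", "incision", "intraoperative"],
    "imaging_report": ["CT", "MRI", "ultrasound"],
    "laboratory_report": ["reference range", "specimen", "white blood cell"],
}

# Hypothetical registry mapping each category to the name of its model.
MODEL_REGISTRY = {category: f"nlp_model_{category}" for category in KEY_POINTS}


def classify_text(text: str) -> str:
    """Return the category whose key information points occur most often."""
    lowered = text.lower()
    scores = Counter()
    for category, cues in KEY_POINTS.items():
        # Naive substring counting; a real system would match whole terms.
        scores[category] = sum(lowered.count(cue.lower()) for cue in cues)
    category, best = scores.most_common(1)[0]
    return category if best > 0 else "unknown"


def select_model(text: str) -> str:
    """Pick the model registered for the detected category, else a general one."""
    return MODEL_REGISTRY.get(classify_text(text), "nlp_model_general")
```

Because the cue lists are ordinary data, they can be edited per deployment, which mirrors the statement that users may configure the key information points for each text type.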
According to the characteristics of each type of medical text, the medical variables that need to be extracted for that type are determined, and the recognition and extraction of specific information are implemented accordingly. A corresponding natural semantic processing model is therefore trained for the characteristics of each type of medical text, and the corresponding processing result is obtained by extraction with that model.
Further, the original fixed format of the input medical text information may not have been preserved, which makes the related medical data analysis very difficult. The different segments in a medical text can therefore be recognized. Taking a medical record as an example: a medical record may contain segments from different periods, such as the admission summary, progress notes, discharge summary, and operation records. To further improve extraction accuracy and the precision of medical record search, the natural semantic processing model corresponding to medical record text can recognize each segment in the medical record text and perform deep natural semantic analysis on each segment, obtaining processed data for the medical variables specific to that segment. For example, a method combining pattern matching and deep learning is used to recognize the following information points in the obtained medical record text: (a) the marker words of the corresponding segments that appear in the various medical records, such as "first course-of-disease record", "first course of disease", and "first progress note"; (b) the context in which a specific segment appears in the medical record text, including format, common grammar, and common words. The combined algorithm can accurately divide a whole medical record text into its segments by category, which supports accurate medical record screening later. A question such as "find the patients who had cancer at discharge" requires screening the discharge summaries of all patients to reach a conclusion. Deep natural semantic analysis is then performed on each segment separately to obtain the processing result for each segment.
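The pattern-matching half of the segment recognition described above can be sketched as follows. This is only an illustration under stated assumptions: the marker phrases and segment labels in SEGMENT_MARKERS are hypothetical, and the deep-learning component that uses context (format, grammar, common words) is not modeled at all.

```python
import re

# Hypothetical marker phrases; each label lists the spelling variants seen in records.
SEGMENT_MARKERS = {
    "admission_summary": ["Admission summary"],
    "first_progress_note": ["First progress note", "Initial progress note"],
    "operation_record": ["Operation record"],
    "discharge_summary": ["Discharge summary"],
}


def split_segments(record: str) -> list[tuple[str, str]]:
    """Split a record into (label, text) pairs at each marker phrase."""
    variant_to_label = {
        variant.lower(): label
        for label, variants in SEGMENT_MARKERS.items()
        for variant in variants
    }
    # Try longer variants first so overlapping phrases resolve to the longest.
    alternatives = sorted(variant_to_label, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in alternatives), re.IGNORECASE)

    matches = list(pattern.finditer(record))
    segments = []
    for i, match in enumerate(matches):
        # A segment runs from the end of its marker to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(record)
        body = record[match.end():end].strip(" :\n")
        segments.append((variant_to_label[match.group().lower()], body))
    return segments
```

With the record split this way, a question such as finding patients with cancer at discharge only needs to inspect the pairs labeled "discharge_summary".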
Further, to extract the medical variables of the medical text information in finer detail, differences in content between departments or diseases can also be taken into account: building a separate natural semantic processing model for each department or disease can further improve accuracy. Preferably, the present embodiment can use an intelligent text classification algorithm that assigns any input text to one of the above text categories and selects the natural semantic processing model suited to in-depth analysis of that category, so as to achieve the best processing result.
The deep natural semantic analysis may include performing, in sequence, word segmentation, part-of-speech analysis, entity recognition, syntactic analysis, and semantic analysis on the medical text information, ultimately extracting the medical variables.
Further, to protect the personal or sensitive information of doctors, patients, hospitals and others from disclosure, this preferred embodiment can also perform data desensitization on the medical text information (removing patient privacy information, names of institutions such as hospitals, information about staff such as doctors, and so on).
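A minimal sketch of rule-based desensitization is given below. The patent does not say how desensitization is implemented, so the field labels, the 11-digit phone format, and the placeholder tokens are all assumptions chosen for illustration; real records would need patterns tuned to local formats.

```python
import re

# Hypothetical patterns and placeholders; none of these formats come from the patent.
DESENSITIZE_PATTERNS = [
    (re.compile(r"(Patient name:\s*)\S+"), r"\1[NAME]"),
    (re.compile(r"(Attending physician:\s*)\S+"), r"\1[DOCTOR]"),
    (re.compile(r"\b\d{11}\b"), "[PHONE]"),
    (re.compile(r"\S+ Hospital\b"), "[HOSPITAL]"),
]


def desensitize(text: str) -> str:
    """Replace patient, staff, and institution identifiers with placeholders."""
    for pattern, replacement in DESENSITIZE_PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Running the masking before any semantic analysis keeps identifying details out of every later processing step and out of the resulting database.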
Fig. 2 shows an example of deep natural semantic analysis. It uses general natural semantic processing logic and adds steps of particular concern in medicine, such as desensitization (removing patient privacy information, names of institutions such as hospitals, information about staff such as doctors, and so on), to achieve semantic analysis deeply customized for medical text information. The analysis flow can separate out the specialized vocabulary in a medical record, perform the relevant part-of-speech analysis (for example, whether a term is a disease, symptom, or indicator), and normalize its expression according to a knowledge base (for example, "breast lump" becomes "breast mass", and "more than 1 year" becomes ">1 year"). Relevant association analysis is then carried out to find the relationships between the extracted important terms. For example, the medical record sentence "right breast lump found for more than 1 year" is first segmented into words such as "found", "right", "breast", "lump", "1 year", and "more than". The system then finds that "right" is a "position (orientation)", "breast mass" is a disease, and "more than 1 year" is a time point. It further finds that the "position" of the breast mass is the right side and that its duration is "more than 1 year". See Fig. 2 for details.
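The normalization and association steps of the Fig. 2 example (a right breast lump found more than a year ago) can be sketched on already-typed entities. The Entity class, the two-entry NORMALIZATION table, and the nearest-disease attachment rule are assumptions for illustration; in the described method the typing and linking come from the trained model and a knowledge base, not from a lookup table.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    text: str
    kind: str  # e.g. "disease", "laterality", "duration"
    attributes: dict = field(default_factory=dict)


# Hypothetical knowledge-base excerpt: surface form -> normalized form.
NORMALIZATION = {"breast lump": "breast mass", "more than 1 year": ">1 year"}


def normalize(entity: Entity) -> Entity:
    """Return a copy of the entity with its text mapped to the normalized form."""
    return Entity(NORMALIZATION.get(entity.text, entity.text), entity.kind,
                  dict(entity.attributes))


def associate(entities: list[Entity]) -> list[Entity]:
    """Attach each non-disease entity to the nearest disease entity."""
    diseases = [(i, e) for i, e in enumerate(entities) if e.kind == "disease"]
    for i, entity in enumerate(entities):
        if entity.kind == "disease" or not diseases:
            continue
        _, target = min(diseases, key=lambda pair: abs(pair[0] - i))
        target.attributes[entity.kind] = entity.text
    return [e for _, e in diseases]
```

For the three entities "right" (laterality), "breast lump" (disease) and "more than 1 year" (duration), the result is a single disease entity "breast mass" carrying the laterality and duration as attributes, which is the shape of record the later database step needs.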
S120: Determine the medical variable corresponding to each item of processed data in the processing result, and enter each item of processed data at the corresponding position of its medical variable to obtain a structured medical database.
Specifically, this step fills each item of processed data in the processing result into the corresponding medical variable to obtain the structured medical database. The input unstructured medical text information is thereby converted into a structured medical database expressed in variables. The structured medical database can record each input medical text through a table structure. The medical variables here can be preset by the user. When setting them, the user can consider the kind of medical text information that the natural semantic processing model corresponds to, or even its segments, to determine which medical variables need to be extracted. The medical variables may include time, disease, symptom, indicator, and so on.
Once the structured medical database has been formed, the user can query a medical text by entering its number, query the database by entering the specific value of the medical variable to be searched, query the corresponding medical text data by entering a text type, or query the corresponding data by entering a specific segment under a given type.
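One possible table layout for such a database, with two of the queries mentioned above, is sketched here using Python's standard sqlite3 module. The table name, the column names, and the one-row-per-variable layout are assumptions; the patent only says that a table structure records each input text.

```python
import sqlite3


def build_database(rows):
    """Create an in-memory table holding one row per extracted medical variable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE medical_variables ("
        "text_id TEXT, text_type TEXT, segment TEXT, variable TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO medical_variables VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def query_by_variable(conn, variable, value):
    """Return the ids of all texts in which the variable has the given value."""
    cursor = conn.execute(
        "SELECT DISTINCT text_id FROM medical_variables "
        "WHERE variable = ? AND value = ? ORDER BY text_id",
        (variable, value),
    )
    return [row[0] for row in cursor]


def query_by_text_id(conn, text_id):
    """Return every (variable, value) pair recorded for one medical text."""
    cursor = conn.execute(
        "SELECT variable, value FROM medical_variables "
        "WHERE text_id = ? ORDER BY rowid",
        (text_id,),
    )
    return list(cursor)
```

Queries by text type or by segment follow the same pattern with a different WHERE clause on the text_type or segment column.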
Based on the above technical solution, the method for generating a structured medical database based on medical text information provided by the embodiment of the present invention uses a natural semantic processing model to capture the medical variables of medical text information automatically and form a structured medical database. It thereby increases the degree of automation and intelligence in generating structured medical databases, greatly reduces labor cost, and improves generation efficiency.
Database by artificial and regular generation is difficult flexible change, the change of accumulation before after addition or modification medical science variable Amount can not auto-complete.The thus change of any database is intelligently directed to perspective data, retrospective data and in advance collects Database is often abandoned.This feature request just must be thorough perfect when design database extracts model, yet with not Different with the demand of personnel with project, the customer-oriented requirement of this model can be considerably complicated and frequent.Therefore, obtain in the prior art Database flexibility ratio it is not enough, the structuring form (database) of generation is difficult to be modified and change.Therefore, based on above-mentioned reality Example is applied, in order that the natural semantic analysis processing of depth is carried out to medical science text message using nature semantic processes model to wrap Include:
Obtain the granularity threshold value of input;
Make nature semantic processes model according to granularity threshold value, medical science text message is carried out at the natural semantic analysis of depth Reason.
Specifically, different kinds of users often have inconsistent requirements for the granularity of medical variables. For some structured-entry purposes, for example, a user may want the text of a medical record broken up completely, with every word that is not a professional term separated out so that it can be screened. Other clinical needs call for extracting the entire physical examination content from the medical record, or for extracting a single sentence that represents an MRI result. To meet such needs, this embodiment makes the extraction granularity (that is, the granularity threshold) adjustable. By adjusting the granularity threshold, different users can break the medical text up at different scales (into fine fragments or into a few large pieces). For example, the same term in medical text information can be interpreted in several ways. "Right lower abdominal pain" is first of all a kind of pain; its location is the abdomen, and more specifically the right lower abdomen. Different medical workers may therefore classify "right lower abdominal pain" as "pain", "abdominal pain", "lower abdominal pain", or "right lower abdominal pain". Such interpretation rules cannot be unified by any particular standard, and they may change with the research purpose.
Therefore, this embodiment provides a method that lets the user select the word segmentation granularity. The word segmentation algorithm used in the basic embodiment calculates, for each word in the medical text information, the probability that it is split out, so different granularity thresholds cut the text into fragments that differ in number and in length. For example, a user who needs the medical text information split as finely as possible can set the granularity threshold low, so that two characters are separated unless it is very clear that they must appear together (at the minimum threshold the text is cut entirely into single characters). The converse also holds. By controlling the segmentation threshold in this way, the user controls how finely the fragments are cut.
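The threshold behaviour described above can be illustrated with a minimal sketch. It assumes that some model has already supplied one split probability per character boundary; the function name `segment`, the probability values, and the rule "cut when the probability reaches the threshold" are illustrative assumptions, not the algorithm of the embodiment.

```python
from typing import List, Sequence


def segment(text: str, split_prob: Sequence[float], threshold: float) -> List[str]:
    """Cut `text` at every boundary whose split probability reaches `threshold`.

    split_prob[i] is the assumed, model-supplied probability that a cut
    belongs between text[i] and text[i + 1].
    """
    if not text:
        return []
    if len(split_prob) != len(text) - 1:
        raise ValueError("need exactly one probability per character boundary")
    pieces, start = [], 0
    for i, p in enumerate(split_prob):
        if p >= threshold:  # a lower threshold lets more boundaries become cuts
            pieces.append(text[start:i + 1])
            start = i + 1
    pieces.append(text[start:])
    return pieces


text = "右下腹痛"          # "right lower abdominal pain"
probs = [0.2, 0.6, 0.4]    # illustrative values only, not real model output

coarse = segment(text, probs, 1.0)   # ['右下腹痛']
medium = segment(text, probs, 0.5)   # ['右下', '腹痛']
finest = segment(text, probs, 0.0)   # ['右', '下', '腹', '痛']
```

With the threshold at its minimum every boundary becomes a cut and the text falls apart into single characters, which matches the behaviour the paragraph describes.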
Further, the user can also constrain segmentation through higher-level segmentation rules. That is, preferably, performing deep natural semantic analysis on the medical text information using the natural semantic processing model may include:
obtaining input segmentation rules;
causing the natural semantic processing model to perform deep natural semantic analysis on the medical text information according to the segmentation rules.
Specifically, in the "right lower abdominal pain" example above, "right lower" is identified as a locative word modifying the body part "abdomen", and "abdomen" in turn modifies the symptom "pain". The user can specify uniformly that body part and symptom be split (right lower, abdomen, pain) or merged (right lower, abdominal pain), or that all three be combined (right lower abdominal pain), and in this way control the segmentation granularity precisely.
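A segmentation rule of this kind can be sketched as a merge over role-tagged fragments. The tags `LOC`, `BODY`, and `SYMPTOM`, the function `apply_merge_rule`, and the tagged input are hypothetical; the source does not specify how rules are represented.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (surface text, role tag); the tag names are assumptions


def apply_merge_rule(tokens: List[Token], pattern: Sequence[str],
                     merged_tag: str) -> List[Token]:
    """Merge every run of adjacent tokens whose tags equal `pattern`."""
    out: List[Token] = []
    n = len(pattern)
    i = 0
    while i < len(tokens):
        window = tokens[i:i + n]
        if n > 0 and [tag for _, tag in window] == list(pattern):
            # Join the surface forms and label the result with the merged tag.
            out.append(("".join(text for text, _ in window), merged_tag))
            i += n
        else:
            out.append(tokens[i])
            i += 1
    return out


# "right lower" / "abdomen" / "pain", fully split
tokens = [("右下", "LOC"), ("腹", "BODY"), ("痛", "SYMPTOM")]

merged_pair = apply_merge_rule(tokens, ["BODY", "SYMPTOM"], "SYMPTOM")
# [('右下', 'LOC'), ('腹痛', 'SYMPTOM')]
merged_all = apply_merge_rule(tokens, ["LOC", "BODY", "SYMPTOM"], "SYMPTOM")
# [('右下腹痛', 'SYMPTOM')]
```

Leaving the token list untouched corresponds to the fully split reading; each pattern corresponds to one of the merged readings in the example.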
Because Chinese clinical unstructured text is poorly standardized and written expression varies widely, it is difficult to perform standardized information extraction without a knowledge base. At present there is also no universal domestic clinical data standard (knowledge base) to assist information extraction and standardization. Broadly speaking, each department has many databases, but how these databases exchange data and are inherited (across the time and space dimensions) becomes a problem. A database that took much time and effort to build may well be used for one article and then never reused. For example, across different hospitals, departments, and data-entry personnel, the way medical records are written varies enormously, and the resulting lack of standardization creates great difficulty for later-stage data analysis. This embodiment can solve the problems of insufficient data standardization and limited data exchange and inheritance in the database. That is, based on any of the above embodiments, this embodiment may further include:
using a standard medical database to perform standardization mapping on the processing results that correspond to specified medical variables in the processing result, to obtain a standardized processing result.
Specifically, to resolve ambiguity and non-standard usage in the input data, this embodiment can, while structuring the medical text, recognize non-standard medical expressions and unify them to international standard medical knowledge bases. This ensures that all medical data passing through the algorithm can be interconnected, regardless of whether it was entered by the same person, came from the same system, or came from the same hospital. Every term encountered is mapped to a standard medical database. The mapping unifies each non-standard expression that appears in the medical text information, or in the processing result, to a specific concept, so that similar expressions encountered later can be converted accurately to the standard form and their meaning understood. The standard medical database may include, but is not limited to, SNOMED-CT, ICD, HPO, and UMLS. Multiple usages of one thing can thus be mapped to a unified standard term.
For example, Chinese medical content is expressed in many different forms, and a given disease may have several expressions, such as "cerebral apoplexy" and "apoplexy". These expressions fall roughly into the following categories: abbreviations (Chinese or English), non-standard expressions, and writing errors. Different expressions in different texts may carry the same meaning. Terms with the same meaning then need to be standardized, and the result of the standardization must conform to medical knowledge standards recognized at home and abroad, such as SNOMED-CT for symptoms, ICD for disease classification, or RxNorm for drugs.
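As a rough illustration of the mapping step, the sketch below resolves several surface forms to one concept through a lookup table. The table contents and the identifier `CONCEPT:STROKE` are placeholders invented for this example; they are not real SNOMED-CT, ICD, or UMLS codes, and a real system would load its mappings from one of those standards.

```python
from typing import Dict, Optional

# Hypothetical surface forms mapped to a placeholder concept identifier.
SYNONYMS: Dict[str, str] = {
    "cerebral apoplexy": "CONCEPT:STROKE",
    "apoplexy": "CONCEPT:STROKE",
    "stroke": "CONCEPT:STROKE",
}


def normalize(term: str, table: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the standard concept for `term`, or None if it is unknown."""
    table = SYNONYMS if table is None else table
    # Lower-case and collapse whitespace so trivial spelling variants match.
    key = " ".join(term.lower().split())
    return table.get(key)


same_concept = normalize("Cerebral  Apoplexy") == normalize("apoplexy")  # True
unknown = normalize("headache")                                          # None
```

Returning `None` for unmapped terms keeps unknown expressions visible, so that they can be reviewed and added to the table rather than silently dropped.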
Based on any of the above embodiments, determining the medical variable corresponding to each item of processed data in the processing result may include:
determining the primary medical variable corresponding to each item of processed data in the processing result;
processing the primary medical variables using manual rule integration and correction logic to obtain a primary medical variable processing result;
when an advanced medical variable is present in the primary medical variable processing result, generating the processed data of the advanced medical variable according to the corresponding processed data and the logical relation associated with the advanced medical variable.
Specifically, in the medical field there are basic medical variables that are written directly in the medical record (referred to here as primary or low-level medical variables). There are also advanced medical variables, whose values can be concluded only by integrating low-level variables (for example, most medical scores are calculated from several primary medical variables). To meet users' need to compute such advanced variables, this embodiment provides a manual rule integration and correction function that is applied after natural language processing. With this function, the user can combine variables and add logical relations on the basis of the conclusions drawn by the earlier natural language processing, and finally generate the corresponding advanced medical variable.
For the primary medical variables obtained from the medical text information, the manual rule integration and correction logic determines whether an advanced medical variable can be derived. If so, the processed data of the advanced medical variable is determined from the specific data of the primary medical variables from which it is derived. Depending on the medical need, some medical information has to be obtained through specific logical judgment before it can be analyzed further. For example, medical staff may want to know whether a surgical patient showed certain respiratory complications during the operation, such as apnea or atelectasis. This variable cannot be read directly from the medical record text; instead, it must be obtained by specific logical judgment after the key information points in the record have been identified. To obtain such a variable, the system first uses the natural language processing engine to extract the patient's surgical information and the occurrences of "apnea" and "atelectasis" in the operation record. Logical judgment then finds the cases in which "apnea" and/or "atelectasis" occurred during the operation, and a new variable, "intraoperative complication present or not", is generated from them.
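The apnea/atelectasis example can be written as a small rule table that derives an advanced variable from primary ones. The variable names, the `RULES` dictionary, and the `derive` function are assumptions made for illustration; the source does not prescribe how user-defined rules are stored or evaluated.

```python
from typing import Callable, Dict

Record = Dict[str, bool]  # primary variables extracted from one operation record

# Hypothetical user-defined rule: the advanced variable is true when
# either primary finding (or both) was identified in the record.
RULES: Dict[str, Callable[[Record], bool]] = {
    "intraoperative_respiratory_complication":
        lambda r: bool(r.get("apnea", False) or r.get("atelectasis", False)),
}


def derive(record: Record) -> Record:
    """Return a copy of `record` extended with every rule-derived variable."""
    derived = dict(record)
    for name, rule in RULES.items():
        derived[name] = rule(record)
    return derived


row = derive({"apnea": False, "atelectasis": True})
# row["intraoperative_respiratory_complication"] is True
```

Keeping the rules in a table separate from the extraction step means a rule can be added or changed and then re-applied to records that were already processed.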
Further, to improve the adaptability of the structured medical database and better meet the actual needs of different users, the user can fill in a structured form and thereby generate a structured medical database that better matches those needs. The structured form is mainly where the user specifies the rules for generating the structured medical database, for example: the source and quantity of the medical text information that serves as the data source; the kinds of medical variables in the output; the extraction rules for advanced medical variables; whether the data needs to be standardized; and other requirements on the final structured medical database. This embodiment does not limit the specific content of the structured form.
Further, the user base can be very large. If the user is a hospital, for example, it corresponds to one large structured medical database, and when individual departments use it every data search may run over the whole hospital, which enlarges the search scope. To further improve the efficiency with which users work with the structured medical database, project groups can be set up within it, and the data source, number of variables, and so on can be specified for each project group, so that each project group has its own corresponding structured medical database. The user can also manage project groups at any time, for example by adding, deleting, or modifying them. This improves the efficiency of using the structured medical database.
Based on the above technical solution, the method for generating a structured medical database from medical text information provided by the embodiments of the present invention can raise the degree of automation and intelligence of structured medical database generation, greatly reduce labor cost, and improve generation efficiency and flexibility. The generated structured form (database) is easy to revise and change, and the variable data in the medical text information is standardized, which improves data exchange and inheritance.
The system for generating a structured medical database from medical text information provided by the embodiments of the present invention is described below. The system described below and the method described above may be read in correspondence with each other.
Referring to Fig. 3, Fig. 3 is a structural block diagram of the system for generating a structured medical database from medical text information provided by an embodiment of the present invention. The system may include:
an acquisition module 100, configured to obtain input medical text information;
a natural semantic processing module 200, configured to determine the natural semantic processing model corresponding to the medical text information, and to perform deep natural semantic analysis on the medical text information using the natural semantic processing model to obtain a processing result;
a structured medical database generation module 300, configured to determine the medical variable corresponding to each item of processed data in the processing result, and to enter each item of processed data at the corresponding position of its medical variable to obtain a structured medical database.
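The three modules can be pictured as a short pipeline. The sketch below is only a schematic: the function names mirror the module names, `keyword_model` is a trivial stand-in for a real natural semantic processing model, and the variable names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Extraction = Tuple[str, str]  # (medical variable name, extracted value)
Model = Callable[[str], List[Extraction]]


def acquisition_module(raw: str) -> str:
    """Module 100: accept the input medical text (here, just trim it)."""
    return raw.strip()


def natural_semantic_processing_module(text: str, model: Model) -> List[Extraction]:
    """Module 200: run the selected model over the text."""
    return model(text)


def database_generation_module(extractions: List[Extraction],
                               variables: Sequence[str]) -> Dict[str, Optional[str]]:
    """Module 300: place each extracted value under its medical variable."""
    row: Dict[str, Optional[str]] = {v: None for v in variables}
    for variable, value in extractions:
        if variable in row:
            row[variable] = value
    return row


def keyword_model(text: str) -> List[Extraction]:
    """Stand-in model: a keyword check, not real semantic analysis."""
    found: List[Extraction] = []
    if "atelectasis" in text.lower():
        found.append(("respiratory_finding", "atelectasis"))
    return found


text = acquisition_module("  Postoperative atelectasis noted.  ")
row = database_generation_module(
    natural_semantic_processing_module(text, keyword_model),
    ["respiratory_finding", "imaging_result"],
)
# row == {'respiratory_finding': 'atelectasis', 'imaging_result': None}
```

Variables for which nothing was extracted stay as `None`, so every row has the same columns regardless of what a given record mentions.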
Based on the above embodiment, the natural semantic processing module 200 may include:
a granularity threshold acquisition unit, configured to obtain an input granularity threshold;
a natural semantic processing unit, configured to cause the natural semantic processing model to perform deep natural semantic analysis on the medical text information according to the granularity threshold.
Based on any of the above embodiments, the system further includes:
a standardization module, configured to use a standard medical database to perform standardization mapping on the processing results that correspond to specified medical variables in the processing result, to obtain a standardized processing result.
Based on any of the above embodiments, the structured medical database generation module 300 may include:
a primary medical variable unit, configured to determine the primary medical variable corresponding to each item of processed data in the processing result;
a correction unit, configured to process the primary medical variables using manual rule integration and correction logic to obtain a primary medical variable processing result;
an advanced medical variable unit, configured to generate, when an advanced medical variable is present in the primary medical variable processing result, the processed data of the advanced medical variable according to the corresponding processed data and the logical relation associated with the advanced medical variable.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be referred to one another. Because the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is relatively brief; for the relevant details, see the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be considered to go beyond the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method and system for generating a structured medical database from medical text information provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art can make improvements and modifications to the present invention without departing from its principles, and that such improvements and modifications also fall within the scope of protection of the claims of the present invention.

Claims (10)

1. A method for generating a structured medical database based on medical text information, characterized in that the method comprises:
obtaining input medical text information;
determining the natural semantic processing model corresponding to the medical text information, and performing deep natural semantic analysis on the medical text information using the natural semantic processing model to obtain a processing result;
determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data at the corresponding position of its medical variable to obtain a structured medical database.
2. The method according to claim 1, characterized in that determining the natural semantic processing model corresponding to the medical text information comprises:
extracting key information points of the medical text information;
determining the medical text category corresponding to the medical text information according to the key information points;
determining the natural semantic processing model corresponding to the medical text category.
3. The method according to claim 1 or 2, characterized in that performing deep natural semantic analysis on the medical text information using the natural semantic processing model comprises:
obtaining an input granularity threshold;
causing the natural semantic processing model to perform deep natural semantic analysis on the medical text information according to the granularity threshold.
4. The method according to claim 3, characterized in that, after the processing result is obtained, the method further comprises:
using a standard medical database to perform standardization mapping on the processing results that correspond to specified medical variables in the processing result, to obtain a standardized processing result.
5. The method according to claim 4, characterized in that determining the medical variable corresponding to each item of processed data in the processing result comprises:
determining the primary medical variable corresponding to each item of processed data in the processing result;
processing the primary medical variables using manual rule integration and correction logic to obtain a primary medical variable processing result;
when an advanced medical variable is present in the primary medical variable processing result, generating the processed data of the advanced medical variable according to the corresponding processed data and the logical relation associated with the advanced medical variable.
6. The method according to claim 5, characterized in that, after the input medical text information is obtained, the method further comprises:
performing data desensitization on the medical text information.
7. A system for generating a structured medical database based on medical text information, characterized in that the system comprises:
an acquisition module, configured to obtain input medical text information;
a natural semantic processing module, configured to determine the natural semantic processing model corresponding to the medical text information, and to perform deep natural semantic analysis on the medical text information using the natural semantic processing model to obtain a processing result;
a structured medical database generation module, configured to determine the medical variable corresponding to each item of processed data in the processing result, and to enter each item of processed data at the corresponding position of its medical variable to obtain a structured medical database.
8. The system according to claim 7, characterized in that the natural semantic processing module comprises:
a granularity threshold acquisition unit, configured to obtain an input granularity threshold;
a natural semantic processing unit, configured to cause the natural semantic processing model to perform deep natural semantic analysis on the medical text information according to the granularity threshold.
9. The system according to claim 8, characterized in that the system further comprises:
a standardization module, configured to use a standard medical database to perform standardization mapping on the processing results that correspond to specified medical variables in the processing result, to obtain a standardized processing result.
10. The system according to claim 9, characterized in that the structured medical database generation module comprises:
a primary medical variable unit, configured to determine the primary medical variable corresponding to each item of processed data in the processing result;
a correction unit, configured to process the primary medical variables using manual rule integration and correction logic to obtain a primary medical variable processing result;
an advanced medical variable unit, configured to generate, when an advanced medical variable is present in the primary medical variable processing result, the processed data of the advanced medical variable according to the corresponding processed data and the logical relation associated with the advanced medical variable.
CN201710208112.XA 2017-03-31 2017-03-31 Structured medical data library generating method and system based on medical science text message Pending CN107145511A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710208112.XA CN107145511A (en) 2017-03-31 2017-03-31 Structured medical data library generating method and system based on medical science text message

Publications (1)

Publication Number Publication Date
CN107145511A true CN107145511A (en) 2017-09-08

Family

ID=59783900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710208112.XA Pending CN107145511A (en) 2017-03-31 2017-03-31 Structured medical data library generating method and system based on medical science text message

Country Status (1)

Country Link
CN (1) CN107145511A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902831A (en) * 2011-07-25 2013-01-30 上海宝信软件股份有限公司 Analytical method of assay statistical data
US20150120733A1 (en) * 2013-10-29 2015-04-30 Google Inc. Systems and methods for improved coverage of input media in content summarization
CN104978587A (en) * 2015-07-13 2015-10-14 北京工业大学 Entity-identification cooperative learning algorithm based on document type
CN106095913A (en) * 2016-06-08 2016-11-09 广州同构医疗科技有限公司 A kind of electronic health record text structure method
CN106484674A (en) * 2016-09-20 2017-03-08 北京工业大学 A kind of Chinese electronic health record concept extraction method based on deep learning

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170908)