CN107145511A - Method and system for generating a structured medical database from medical text information - Google Patents
- Publication number
- CN107145511A (application number CN201710208112.XA)
- Authority
- CN
- China
- Prior art keywords
- medical science
- variable
- text message
- medical
- result
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2452—Query translation
- G06F16/24522—Translation of natural language queries to structured queries
Abstract
The invention discloses a method for generating a structured medical database from medical text information, comprising: obtaining input medical text information; determining the natural semantic processing model corresponding to the medical text information, and using that model to perform deep natural semantic analysis on the medical text information to obtain a processing result; and determining the medical variable corresponding to each item of processed data in the processing result, and entering each item into the corresponding position of its medical variable to obtain a structured medical database. The method raises the degree of automation and intelligence of structured medical database generation, greatly reduces labor cost, and improves generation efficiency. The invention also discloses a system for generating a structured medical database from medical text information, which has the same beneficial effects.
Description
Technical field
The present invention relates to the field of medical data processing, and in particular to a method and system for generating a structured medical database from medical text information.
Background
Making use of unstructured medical text information has long been a major difficulty in this field. The prior art generally processes medical text manually or semi-manually. Most physicians and related practitioners who need the data handle unstructured historical medical data (retrospective data) by reading the medical text and entering it in standardized form by hand. The usual approach is for the personnel themselves, or a third-party technology provider, to design and program an electronic structured form (eCRF); the personnel then scan the text record by record by eye and, on finding relevant information, enter it into the structured form manually. A few technologies offer semi-automatic information extraction based on keyword matching and predefined standards, that is, they match relevant words or expressions in the text and provide an auxiliary tool that makes manual reading easier.
Existing solutions therefore rely largely on people with professional knowledge, and the process is time-consuming and costly. Intelligent aids are rare throughout the process, and manual entry is inefficient because the work is repetitive and tedious. Suppose, for example, that all cancer patients must be mined from electronic medical records to build a database. The current practice is to search directly for "cancer" or an equivalent term. An improved practice is to compile the names of all cancers and find the matching patients by keyword matching in the records. In some special cases, however, the existing approach fails: "lung cancer" expressed as "small cell carcinoma of the right lower lung" (which the keyword "lung cancer" does not match), or a non-standard expression of a cancer such as an abbreviation or a clerical error. These deficiencies keep the precision and recall of the whole information extraction from reaching the expected level. In short, prior-art processing lacks automation and intelligence, and its labor cost is high.
Summary of the invention
The object of the present invention is to provide a method and system for generating a structured medical database from medical text information that raise the degree of automation and intelligence of structured medical database generation, greatly reduce labor cost, and improve generation efficiency.
To solve the above technical problem, the present invention provides a method for generating a structured medical database from medical text information, the method comprising:
Obtaining input medical text information;
Determining the natural semantic processing model corresponding to the medical text information, and using the natural semantic processing model to perform deep natural semantic analysis on the medical text information to obtain a processing result;
Determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data into the corresponding position of its medical variable to obtain a structured medical database.
Optionally, determining the natural semantic processing model corresponding to the medical text information comprises:
Extracting the key information points of the medical text information;
Determining the medical text category of the medical text information according to the key information points;
Determining the natural semantic processing model corresponding to the medical text category.
Optionally, using the natural semantic processing model to perform deep natural semantic analysis on the medical text information comprises:
Obtaining an input granularity threshold;
Having the natural semantic processing model perform deep natural semantic analysis on the medical text information according to the granularity threshold.
Optionally, after the processing result is obtained, the method further comprises:
Using a medical standard database to perform standardized mapping on the results for specified medical variables contained in the processing result, to obtain a standardized processing result.
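The standardized mapping step can be pictured as a lookup from surface forms to standard terms. The following is a minimal sketch only: the table `STANDARD_TERMS`, its entries, and the function names are hypothetical stand-ins for a medical standard database, which the source does not describe in detail.

```python
from typing import Dict, Optional, Set

# Hypothetical stand-in for a medical standard database:
# surface form (lower-cased) -> standard term.
STANDARD_TERMS: Dict[str, str] = {
    "breast lump": "breast mass",
    "lump in breast": "breast mass",
}


def standardize(value: str) -> Optional[str]:
    """Return the standard term for a value, or None if it is unknown."""
    key = " ".join(value.lower().split())  # normalize case and whitespace
    return STANDARD_TERMS.get(key)


def standardize_result(result: Dict[str, str], variables: Set[str]) -> Dict[str, str]:
    """Map only the specified medical variables; leave the others untouched."""
    out = dict(result)
    for name in variables:
        if name in result:
            mapped = standardize(result[name])
            if mapped is not None:
                out[name] = mapped
    return out
```

Values without a table entry are kept as they are, so an incomplete standard table does not discard extracted data.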
Optionally, determining the medical variable corresponding to each item of processed data in the processing result comprises:
Determining the primary medical variable corresponding to each item of processed data in the processing result;
Processing the primary medical variables with manually defined rule-integration and correction logic to obtain a primary medical variable processing result;
When an advanced medical variable exists in the primary medical variable processing result, generating advanced medical variable processed data according to the corresponding processed data and the logical relation associated with the advanced medical variable.
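One way to read the relation between primary and advanced medical variables is that an advanced variable is derived from primary ones by a stated rule. The sketch below assumes that reading. The rule table `ADVANCED_RULES`, the variable names, and the body mass index example are all hypothetical and do not come from the source.

```python
from typing import Callable, Dict, Tuple

# Hypothetical rule table: advanced variable -> (required primary variables, rule).
ADVANCED_RULES: Dict[str, Tuple[Tuple[str, ...], Callable[..., float]]] = {
    # Body mass index from height in centimetres and weight in kilograms.
    "bmi": (("height_cm", "weight_kg"),
            lambda height_cm, weight_kg: round(weight_kg / (height_cm / 100) ** 2, 1)),
}


def derive_advanced(primary: Dict[str, float]) -> Dict[str, float]:
    """Generate every advanced variable whose primary inputs are all present."""
    derived = {}
    for name, (inputs, rule) in ADVANCED_RULES.items():
        if all(key in primary for key in inputs):
            derived[name] = rule(*(primary[key] for key in inputs))
    return derived
```

An advanced variable is simply skipped when one of its inputs is missing, which keeps partial records usable.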
Optionally, after the input medical text information is obtained, the method further comprises:
Performing data desensitization on the medical text information.
The present invention also provides a system for generating a structured medical database from medical text information, comprising:
An acquisition module, for obtaining input medical text information;
A natural semantic processing module, for determining the natural semantic processing model corresponding to the medical text information, and using the natural semantic processing model to perform deep natural semantic analysis on the medical text information to obtain a processing result;
A structured medical database generation module, for determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data into the corresponding position of its medical variable to obtain a structured medical database.
Optionally, the natural semantic processing module comprises:
A granularity threshold acquiring unit, for obtaining an input granularity threshold;
A natural semantic processing unit, for having the natural semantic processing model perform deep natural semantic analysis on the medical text information according to the granularity threshold.
Optionally, the system further comprises:
A standardization module, for using a medical standard database to perform standardized mapping on the results for specified medical variables contained in the processing result, to obtain a standardized processing result.
Optionally, the structured medical database generation module comprises:
A primary medical variable unit, for determining the primary medical variable corresponding to each item of processed data in the processing result;
A correction unit, for processing the primary medical variables with manually defined rule-integration and correction logic to obtain a primary medical variable processing result;
An advanced medical variable unit, for generating, when an advanced medical variable exists in the primary medical variable processing result, advanced medical variable processed data according to the corresponding processed data and the logical relation associated with the advanced medical variable.
The method for generating a structured medical database from medical text information provided by the present invention comprises: obtaining input medical text information; determining the natural semantic processing model corresponding to the medical text information, and using the natural semantic processing model to perform deep natural semantic analysis on the medical text information to obtain a processing result; and determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data into the corresponding position of its medical variable to obtain a structured medical database.
The method thus uses a natural semantic processing model to capture the medical variables of medical text information automatically and form a structured medical database. It raises the degree of automation and intelligence of structured medical database generation, greatly reduces labor cost, and improves generation efficiency. The invention also discloses a system for generating a structured medical database from medical text information, which has the same beneficial effects; these are not repeated here.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the method for generating a structured medical database from medical text information provided by an embodiment of the present invention;
Fig. 2 is an example diagram of the deep natural semantic analysis provided by an embodiment of the present invention;
Fig. 3 is a structural block diagram of the system for generating a structured medical database from medical text information provided by an embodiment of the present invention.
Detailed description of the embodiments
The core of the present invention is to provide a method and system for generating a structured medical database from medical text information that raise the degree of automation and intelligence of structured medical database generation, greatly reduce labor cost, and improve generation efficiency.
To make the purpose, technical solutions and advantages of the embodiments clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the invention. All other embodiments that a person of ordinary skill in the art obtains from them without creative effort fall within the scope of protection of the invention.
Medical big data is currently a very popular and challenging field. Compared with many other fields, the medical industry has long faced the challenge of massive and unstructured data, and in recent years many countries have been actively promoting medical informatization, which has given many medical institutions the funds for big data analysis. Nevertheless, medical big data faces many challenges.
First, a large amount of clinical data exists as unstructured text records, which is a severe test for big data analysis. For example, the most abundant and most valuable clinical data in a hospital are patients' electronic medical records. An electronic medical record is the sum of the text, symbols, charts, images, slices and other data that medical staff produce in the course of medical activity, including outpatient (emergency) records and inpatient records. It covers not only static medical record information but also the related services provided. It is electronically managed information about a person's lifelong health status and health care behavior, and it involves all the process information of collecting, storing, transmitting, processing and using patient information. The electronic medical record is thus the most complete and detailed clinical information resource about a patient. Precisely because this information is so complex, it relies largely on long passages of descriptive text. At present there is almost no automated scheme for processing and analyzing such unstructured text. Researchers need to search electronic medical record data by keyword or combined conditions. However, common natural-language word segmentation and syntactic treebanks break the original professional meaning of medical terms when processing them, which seriously degrades search results. How to combine natural language with medical terminology to form sound word segmentation and parsing algorithms is therefore particularly important.
Second, medical data lacks standards. Hospitals currently have many people entering data and no systematic data-entry standard. System modules are also numerous, and data interface standards are not unified. As a result, data cannot be effectively linked even within a hospital and becomes information islands. Even with regional medical resource information platforms, the data of a single hospital remains of limited value; optimal allocation of regional medical resources requires interconnecting regional medical big data. This trend is likewise seriously hindered by the lack of standards between hospitals.
Therefore, recognizing, unifying and using unstructured, non-standard medical data as soon as possible is an extremely important technical problem in medical big data. This embodiment is an automated, intelligent solution designed for that problem. Through automated algorithms it can read and understand medical text information from all departments (including electronic medical records, examination reports and so on), analyze the semantics using a natural semantic processing model (i.e. an artificial intelligence algorithm), and further structure the text into a structured medical database that a computer can recognize. Referring to Fig. 1, a method for generating a structured medical database from medical text information may specifically include:
S100: obtain the input medical text information.
This embodiment does not limit the kind of medical text information. It may be, for example, an electronic medical record (full medical record text), a surgical record, an imaging report or a laboratory test report. In typical application scenarios, medical text information such as electronic medical records can be exported from the hospital information system. After export, the great majority of it is in unformatted txt form.
S110: determine the natural semantic processing model corresponding to the medical text information, and use the natural semantic processing model to perform deep natural semantic analysis on the medical text information, obtaining a processing result.
This embodiment does not limit the kind of natural semantic processing model. There may be a single model, in which case all acquired medical text information undergoes deep natural semantic analysis through that model. Alternatively, the model may be chosen according to the kind of medical text information, so that each kind is analyzed by its own model. In the former case only one model is trained, so training is simple. In the latter case more models must be trained, because each corresponds to one kind of medical text, but recognition and extraction accuracy is correspondingly higher. Medical text falls into several broad categories, such as full medical record text, surgical records, imaging reports and laboratory test reports, and each category has its own writing conventions and content. To reach higher recognition and extraction accuracy, the medical texts can therefore be classified first and a dedicated natural semantic processing model trained for each category. A natural semantic processing model may be obtained as follows: first, integrate all publicly available medical dictionaries so as to include as many standard medical terms as possible; next, collect real case data from each department and have medical professionals segment the text and annotate diseases manually; then train an NLP model on the manual annotations using a machine learning algorithm; finally, map the results fully to international standard knowledge bases such as UMLS or SNOMED CT.
Preferably, determining the natural semantic processing model corresponding to the medical text information may include:
Extracting the key information points of the medical text information;
Determining the medical text category of the medical text information according to the key information points;
Determining the natural semantic processing model corresponding to the medical text category.
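The three steps above can be sketched as a category scorer followed by a model lookup. This is a deliberately simple stand-in: the source describes a learned classifier, whereas the sketch below just counts configured key information points. The category names, key points, and model identifiers are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical, user-configurable key information points per text category.
KEY_POINTS: Dict[str, List[str]] = {
    "medical_record": ["first course record", "chief complaint", "discharge summary"],
    "surgical_record": ["anesthesia", "incision"],
    "lab_report": ["reference range", "specimen"],
}

# Hypothetical registry: text category -> identifier of the model trained for it.
MODELS: Dict[str, str] = {
    "medical_record": "nlp-model-record",
    "surgical_record": "nlp-model-surgery",
    "lab_report": "nlp-model-lab",
}
DEFAULT_MODEL = "nlp-model-general"


def classify(text: str) -> Optional[str]:
    """Pick the category whose key information points occur most often."""
    lowered = text.lower()
    scores = {category: sum(lowered.count(point) for point in points)
              for category, points in KEY_POINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def select_model(text: str) -> str:
    """Return the model for the detected category, or a general fallback."""
    category = classify(text)
    return MODELS.get(category, DEFAULT_MODEL)
```

Falling back to a general model when no key point is found mirrors the single-model option that the text also allows.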
Specifically, in this preferred embodiment a key information point is data by which each category of medical text information can be recognized, such as various key medical concepts. For example, a medical record generally contains content such as a first course record, the medical history and the first course of disease, while key information such as particular test data generally appears in a test report. This preferred embodiment does not limit the key information points of each type of medical text; the user can set and modify them according to the actual characteristics of each type. The biggest difference between this embodiment and nearly all prior art is that it uses machine learning combined with natural language processing. It therefore does not use keyword matching to identify important information, because keyword recognition is not accurate enough: misspelled words and some non-standard expressions, for instance, cannot be recognized. This embodiment has some ability to recognize new words, so a word can be identified even if it has not been encountered before. Determining the medical text category through key information points is thus more reliable and accurate, which improves the accuracy of model selection.
The medical variables to be extracted are determined according to the characteristics of each type of medical text, and specific information is identified and extracted accordingly. A natural semantic processing model is therefore trained for each type of medical text, and the corresponding processing result is extracted with that model.
Further, the original fixed format of the input medical text information may not be preserved, which makes the related medical data analysis very difficult. The different fragments in the medical text can therefore be recognized. Take a medical record as an example: it may contain fragments from different periods, such as the admission summary, course records, discharge summary and surgical record. To further improve extraction accuracy and record search precision, the natural semantic processing model for medical record text can identify each fragment in the text and perform deep natural semantic analysis on each fragment to obtain processed data for the medical variables specific to that fragment. For example, a method combining pattern matching and deep learning can recognize the following information points in the acquired medical record text: (a) the marker words of the corresponding fragments that appear in medical records, such as "first course record", "first course of disease" and "first progress note"; (b) the context in which a particular fragment appears in the record text, including format, common grammar and common wording. The combined algorithm can accurately separate a whole block of medical record text by category, which supports accurate record screening later. A question such as "find the patients who had cancer at discharge" is answered by screening the discharge summaries of all patients. Deep natural semantic analysis is then performed on each fragment separately to obtain the result for each fragment.
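The marker-word part of fragment recognition can be sketched with regular expressions. This covers only the pattern-matching half of the combined method. The deep learning component and the context features in point (b) are not modelled, and the marker list and fragment names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical marker words for each fragment type of a medical record.
SECTION_MARKERS: Dict[str, List[str]] = {
    "admission_summary": ["admission summary"],
    "course_record": ["first course record", "course record"],
    "discharge_summary": ["discharge summary"],
    "surgical_record": ["surgical record"],
}


def split_fragments(text: str) -> List[Tuple[str, str]]:
    """Split a record into (fragment_type, body) pairs at each marker word."""
    lookup = {marker: name
              for name, markers in SECTION_MARKERS.items()
              for marker in markers}
    # Longest markers first, so "first course record" wins over "course record".
    alternation = "|".join(re.escape(m) for m in sorted(lookup, key=len, reverse=True))
    matches = list(re.finditer(alternation, text, flags=re.IGNORECASE))
    fragments = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip(" :\n")
        fragments.append((lookup[match.group().lower()], body))
    return fragments
```

With the fragments separated, a question such as "patients with cancer at discharge" only needs to inspect the `discharge_summary` bodies.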
Further, to extract the medical variables of the medical text information more finely, differences in content between departments or diseases can also be taken into account: building a separate natural semantic processing model for each department or disease can further improve accuracy. Preferably, this embodiment can use an intelligent text classification algorithm that assigns any input text to one of the above text categories and selects the natural semantic processing model suited to deep analysis of that category, to achieve the best processing effect.
The deep natural semantic analysis may include performing, in sequence, word segmentation, part-of-speech analysis, entity recognition, syntactic analysis and semantic analysis on the medical text information, finally achieving extraction of the medical variables.
Further, to keep the personal or sensitive information of doctors, patients, hospitals and others from being disclosed, this preferred embodiment can also perform data desensitization on the medical text information (removing patient private information, institution names such as hospitals, staff information such as doctors, and so on).
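A rule-based pass is one simple way to picture data desensitization. The sketch below is an assumption-heavy illustration: the patterns, the field labels, and the placeholder tokens are hypothetical, and the source does not say how desensitization is actually implemented.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical redaction rules: (pattern, replacement).
RULES: List[Tuple[Pattern[str], str]] = [
    # A labelled name field, up to the next comma, semicolon or line break.
    (re.compile(r"\b(patient name|name)\s*:\s*[^,;\n]+", re.IGNORECASE), r"\1: [NAME]"),
    # A doctor referred to by title and surname.
    (re.compile(r"\bDr\.\s+[A-Z][a-z]+"), "Dr. [DOCTOR]"),
    # An 11-digit number, treated here as a phone number.
    (re.compile(r"\b\d{11}\b"), "[PHONE]"),
]


def desensitize(text: str) -> str:
    """Apply every redaction rule in order and return the masked text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Running this before semantic analysis keeps identifying details out of every later stage, including the stored database.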
Fig. 2 shows an example of deep natural semantic analysis. It uses general natural semantic processing logic and incorporates steps of particular concern in medicine, such as desensitization (removing patient private information, institution names such as hospitals, staff information such as doctors, and so on), to achieve deeply customized semantic analysis of medical text information. The analysis separates out the specialized vocabulary in the medical record, performs part-of-speech analysis (for example, whether a term is a disease, symptom or index), and normalizes expressions according to a knowledge base (for example, a shortened form of "breast lump" is changed to the full standard term, and "more than 1 year" is changed to ">1 year"). Association analysis is then performed to find the relations between the extracted key terms. Take a sentence from a medical record: "Found a lump in the right breast for more than 1 year". After this step the sentence is segmented into "found", "right", "breast", "lump", "1 year" and "more than". The system then determines that "right" is a position (orientation), "breast lump" is a class of disease, and "more than 1 year" is a point in time; the position of the breast lump is the right side, and its duration is more than 1 year. See Fig. 2 for details.
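The breast lump example can be traced with a toy tagger and an association step. A small dictionary stands in for the trained model and knowledge base here, so this shows only the shape of the output, not the method itself. The lexicon entries and normalized terms are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lexicon: surface form -> (entity type, normalized expression).
LEXICON: Dict[str, Tuple[str, str]] = {
    "right": ("position", "right side"),
    "left": ("position", "left side"),
    "breast lump": ("disease", "breast mass"),
    "more than 1 year": ("time", ">1 year"),
}


def tag_entities(sentence: str) -> List[Tuple[str, str]]:
    """Return (entity type, normalized term) pairs in order of appearance."""
    lowered = sentence.lower()
    found = []
    for surface, (entity_type, normalized) in LEXICON.items():
        index = lowered.find(surface)
        if index >= 0:
            found.append((index, entity_type, normalized))
    return [(entity_type, normalized) for _, entity_type, normalized in sorted(found)]


def associate(tags: List[Tuple[str, str]]) -> Dict[str, Optional[str]]:
    """Attach the first position and time found to the first disease found."""
    record: Dict[str, Optional[str]] = {"disease": None, "position": None, "time": None}
    for entity_type, normalized in tags:
        if record.get(entity_type) is None:
            record[entity_type] = normalized
    return record
```

Applied to "Found a right breast lump for more than 1 year", the two functions yield one record linking the disease to its position and duration, which is the association described for Fig. 2.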
S120: determine the medical variable corresponding to each item of processed data in the processing result, and enter each item of processed data into the corresponding position of its medical variable to obtain a structured medical database.
Specifically, this step fills each item of processed data in the processing result into its medical variable, converting the input unstructured medical text information into a structured medical database expressed as variables. The database can record each input medical text in a table structure. The medical variables can be preset by the user, taking into account the kind of medical text that the natural semantic processing model corresponds to, and even its fragments, to decide which variables to extract. The medical variables may include time, disease, symptom, index and so on.
Once the structured medical database is formed, the user can query a medical text by entering its number, query the database by entering the specific value of a medical variable of interest, query medical text data by entering a text type, or query the data for a particular fragment under a given type.
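The four query paths described above can be sketched against an in-memory SQLite table. The table layout (one row per extracted variable), the column names, and the helper functions are assumptions made for illustration. The source only states that the database records each text in a table structure.

```python
import sqlite3
from typing import Iterable, List, Tuple

COLUMNS = ("text_id", "text_type", "fragment", "variable", "value")


def build_database(rows: Iterable[Tuple[str, str, str, str, str]]) -> sqlite3.Connection:
    """Create an in-memory table with one row per extracted medical variable."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE medical_variables "
        "(text_id TEXT, text_type TEXT, fragment TEXT, variable TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO medical_variables VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def query(conn: sqlite3.Connection, **filters: str) -> List[Tuple[str, str, str]]:
    """Filter by any combination of text number, type, fragment, variable or value."""
    unknown = set(filters) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown filter columns: {sorted(unknown)}")
    # Column names come from the fixed whitelist above; values are bound parameters.
    where = " AND ".join(f"{column} = ?" for column in filters) or "1 = 1"
    sql = (f"SELECT text_id, variable, value FROM medical_variables "
           f"WHERE {where} ORDER BY text_id, variable")
    return conn.execute(sql, tuple(filters.values())).fetchall()
```

For example, `query(conn, text_type="medical_record", fragment="discharge_summary")` returns the variables extracted from discharge summaries only.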
Based on the above technical solution, the method for generating a structured medical database from medical text information provided by this embodiment uses a natural semantic processing model to capture the medical variables of medical text information automatically and form a structured medical database. It raises the degree of automation and intelligence of structured medical database generation, greatly reduces labor cost, and improves generation efficiency.
A database generated manually and by rules is hard to change flexibly: after a medical variable is added or modified, the variables accumulated earlier cannot be completed automatically. Any change to the database can therefore apply only to prospective data, and the retrospective data and the previously collected database are often abandoned. This means the database extraction model must be thorough when first designed, yet because needs differ between projects and personnel, customer-driven changes to the model are complex and frequent. Databases obtained in the prior art are therefore not flexible enough, and the generated structured forms (databases) are difficult to modify. Accordingly, on the basis of the above embodiment, using the natural semantic processing model to perform deep natural semantic analysis on the medical text information may include:
Obtaining an input granularity threshold;
Having the natural semantic processing model perform deep natural semantic analysis on the medical text information according to the granularity threshold.
Specifically, different users often have inconsistent needs for the granularity of medical variables. Some structured-entry needs call for breaking the text record up completely, separating out every word that is not a technical term so that the terms can be screened. Other clinical needs call for extracting all physical examination content from the record, or a whole sentence that states an MRI result. To meet such needs, this embodiment makes the extraction granularity (the granularity threshold) adjustable. By adjusting it, different users can break up the medical text at different scales (into fine pieces or into a few blocks). For example, the same term in a medical text can be read in several ways. "Right lower abdominal pain" can first be classed as a kind of pain, located in the abdomen and more specifically in the right lower abdomen. Different medical staff may thus classify it as "pain", "abdominal pain", "lower abdominal pain" or "right lower abdominal pain". No single standard can unify such readings, which may change with the purpose of the research.
Therefore, the present embodiment provides the method that user can be allowed to select participle granularity.Point used in basic embodiment
Word algorithm calculates the probability that each word is split out in medical science text message, thus the different granularity threshold value of correspondence can be by text
Originally it is cut into that block number is different, the different fragment of word length.For example, user such as need to split medical science text message as far as possible, it can
With by granularity threshold value set it is relatively low, so, once two words be not it is very clear and definite necessarily appear together, can all be separated
(granularity threshold value set it is minimum and text is cut into completely man combination).Conversely, as the same.With this, by participle threshold
The control of value, user can control the cutting degree of fragment.
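The effect of the granularity threshold can be illustrated with a minimal sketch. It assumes that the segmentation algorithm has already produced one split probability per gap between adjacent units; the function name `segment_by_threshold` and the probability values are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence


def segment_by_threshold(units: Sequence[str],
                         split_probs: Sequence[float],
                         threshold: float) -> List[List[str]]:
    """Cut `units` wherever the split probability reaches `threshold`.

    split_probs[i] is an assumed probability that a boundary lies
    between units[i] and units[i + 1].
    """
    if not units:
        return []
    if len(split_probs) != len(units) - 1:
        raise ValueError("need exactly one split probability per gap")
    fragments: List[List[str]] = []
    current = [units[0]]
    for unit, prob in zip(units[1:], split_probs):
        if prob >= threshold:
            # Boundary is likely enough: close the current fragment.
            fragments.append(current)
            current = [unit]
        else:
            # Units tend to occur together: keep them in one fragment.
            current.append(unit)
    fragments.append(current)
    return fragments


units = ["right", "lower", "abdominal", "pain"]
split_probs = [0.2, 0.5, 0.8]  # illustrative values only

fine = segment_by_threshold(units, split_probs, 0.0)    # every gap is cut
medium = segment_by_threshold(units, split_probs, 0.6)  # only the last gap is cut
coarse = segment_by_threshold(units, split_probs, 0.9)  # nothing is cut
```

With the lowest threshold the phrase falls apart into single units, and with a high threshold it stays whole, which matches the behavior described for low and high granularity thresholds.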
Further, the user can also constrain segmentation at a higher level. That is, preferably, performing deep natural semantic analysis processing on the medical text information using the natural semantic processing model may include:
Obtaining an input segmentation rule;
Causing the natural semantic processing model to perform deep natural semantic analysis processing on the medical text information according to the segmentation rule.
Specifically, in the "right lower abdominal pain" example above, "right lower" is identified as a locative word modifying the body part "abdomen", and "abdomen" in turn modifies the symptom "pain". The user can specify uniformly whether body part and symptom are split (right lower, abdomen, pain) or merged (right lower, abdominal pain), or whether all three are combined (right lower abdominal pain), thereby controlling the segmentation granularity precisely.
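The three segmentation choices above can be sketched as rules over role-tagged words. This is an illustration under assumptions: the role labels, the rule names in `RULES`, and the function `apply_rule` are hypothetical, and the roles are taken as already assigned by an earlier analysis step.

```python
from typing import List, Tuple

# Hypothetical role labels assigned by an earlier analysis step.
LOCATION, BODY_PART, SYMPTOM = "location", "body_part", "symptom"

# Each rule lists the adjacent role pairs that should be merged.
RULES = {
    "split_all": [],
    "merge_part_symptom": [(BODY_PART, SYMPTOM)],
    "merge_all": [(LOCATION, BODY_PART), (BODY_PART, SYMPTOM)],
}


def apply_rule(tagged: List[Tuple[str, str]], rule_name: str) -> List[str]:
    """Merge adjacent words whose role pair is listed in the chosen rule."""
    merge_pairs = set(RULES[rule_name])
    groups: List[List[str]] = []
    prev_role = None
    for word, role in tagged:
        if groups and (prev_role, role) in merge_pairs:
            groups[-1].append(word)   # extend the current fragment
        else:
            groups.append([word])     # start a new fragment
        prev_role = role
    return [" ".join(group) for group in groups]


tagged = [("right lower", LOCATION), ("abdominal", BODY_PART), ("pain", SYMPTOM)]

split_result = apply_rule(tagged, "split_all")
merged_result = apply_rule(tagged, "merge_part_symptom")
combined_result = apply_rule(tagged, "merge_all")
```

Expressing a rule as a set of mergeable role pairs keeps the user's choice declarative, so a new rule can be added without changing the merging loop.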
Chinese clinical unstructured text data is poorly standardized and written in widely varying styles, so standardized information extraction is difficult without a knowledge base. At present, there is also no universal domestic clinical data standard (knowledge base) to assist information extraction and standardization. Broadly speaking, each department has many databases, but exchanging data between these databases and carrying it forward (across time and space) is a problem. A database built at great cost in time and effort may well be used for a single article and never reused. For example, across different hospitals and departments, and with data-entry staff involved, the way medical records are written varies endlessly, and the resulting lack of standardization makes later analysis of the data very difficult. This embodiment can address insufficient data standardization in the database and limited data exchange and inheritance. That is, based on any of the above embodiments, this embodiment may further include:
Performing standardization mapping, using a medical standard database, on the processing results corresponding to specified medical variables contained in the processing result, to obtain a standardized processing result.
Specifically, to resolve ambiguity and non-standard usage in the input data, during medical text structuring this embodiment can recognize all kinds of non-standard medical expressions and unify them to international standard medical knowledge bases. This ensures that all medical data flowing through the algorithm can be interconnected, regardless of whether it was entered by the same person, came from the same system, or came from the same hospital. Every term encountered is mapped to the medical standard database. This mapping unifies each non-standard expression appearing in the medical text information, or in the processing result, to a specific concept, so that similar expressions encountered in the future can be accurately converted to the standard format and their meaning understood. The medical standard database may include, but is not limited to, SNOMED-CT, ICD, HPO, and UMLS. Multiple usages of one thing can thus be mapped to a unified normative term.
For example, Chinese medical content is expressed in many forms, and a given disease may have several expressions, such as "cerebral apoplexy" and "apoplexy". These variants fall roughly into the following categories: abbreviations (Chinese or English), non-standard expressions, and writing errors. Different expressions in different texts may denote the same thing. Words with the same meaning therefore need to be standardized, and the standardized result must conform to medical knowledge standards recognized at home and abroad, such as SNOMED-CT for symptoms, ICD for disease classification, or RxNorm for drugs.
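The standardization mapping can be pictured as a lookup from surface expressions to one concept. The sketch below is an assumption-laden illustration: the variant table, the placeholder identifier `CONCEPT-STROKE`, and the choice of "stroke" as the normative term are hypothetical and are not real SNOMED-CT, ICD, or RxNorm entries.

```python
from typing import Dict, Optional

# Hypothetical variant table; the concept ID is a placeholder,
# not a real code from any medical standard database.
VARIANT_TO_CONCEPT: Dict[str, str] = {
    "cerebral apoplexy": "CONCEPT-STROKE",
    "apoplexy": "CONCEPT-STROKE",
    "stroke": "CONCEPT-STROKE",
}
CONCEPT_TO_TERM: Dict[str, str] = {"CONCEPT-STROKE": "stroke"}


def normalize(expression: str) -> Optional[Dict[str, str]]:
    """Map a free-text expression to its assumed standard concept, if known."""
    # Lowercase and collapse whitespace so trivial variants match.
    key = " ".join(expression.lower().split())
    concept = VARIANT_TO_CONCEPT.get(key)
    if concept is None:
        return None  # unknown expression: leave it for manual review
    return {"concept_id": concept, "standard_term": CONCEPT_TO_TERM[concept]}


first = normalize("Cerebral  Apoplexy")
second = normalize("apoplexy")
unknown = normalize("headache")
```

Returning `None` for an unmapped expression, rather than guessing, keeps non-standard terms visible so that the variant table can be extended later.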
Based on any of the above embodiments, determining the medical variable corresponding to each item of processed data in the processing result may include:
Determining the primary medical variable corresponding to each item of processed data in the processing result;
Processing the primary medical variables using manual rule integration and correction logic, to obtain a primary medical variable processing result;
When an advanced medical variable exists in the primary medical variable processing result, generating advanced medical variable processing data according to the corresponding processed data and the logical relationship corresponding to the advanced medical variable.
Specifically, in the medical field there are basic medical variables that are written directly into the medical record (referred to herein as primary or low-level medical variables). There are also advanced medical variables whose conclusions can be drawn only by integrating low-level variables (for example, most medical scores are calculated by integrating several primary medical variables). To meet users' need to calculate such advanced medical variables, this embodiment provides a manual rule integration and correction function applied after natural language processing. With this function, the user can combine variables based on the conclusions of the earlier natural language processing, add logical relationships, and finally generate the corresponding advanced medical variables.
For the primary medical variables obtained from the medical text information, the manual rule integration and correction logic determines whether an advanced medical variable can be derived. If so, the processed data of the advanced medical variable is determined from the specific data of the primary medical variables that correspond to it. Depending on the medical need, some medical information must be obtained through specific logical judgment and further analysis. For example, medical staff may wish to know whether a surgical patient developed specific respiratory complications during the operation, such as apnea or atelectasis. Such a variable cannot be read directly from the medical record text; it must be derived by specific logical judgment after the key information points in the record have been identified. To obtain it, the system first uses the natural language processing engine to extract the patient's surgery information, together with "apnea" and "atelectasis" in the operation record. Logical judgment then finds cases in which "apnea" AND/OR "atelectasis" occurred during the operation, and a new variable, "whether an intraoperative complication occurred", is generated accordingly.
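The derivation of the intraoperative complication variable can be sketched as a user-defined rule over primary variables. The variable names and the helper `derive_advanced` are hypothetical; the OR logic follows the "apnea" AND/OR "atelectasis" example in the text.

```python
from typing import Callable, Dict

PrimaryVars = Dict[str, bool]

# Hypothetical rule table: each advanced variable is computed from
# primary variables that were extracted from the operation record.
ADVANCED_RULES: Dict[str, Callable[[PrimaryVars], bool]] = {
    "intraoperative_complication": lambda p: (
        p.get("intraoperative_apnea", False)
        or p.get("intraoperative_atelectasis", False)
    ),
}


def derive_advanced(primary: PrimaryVars) -> Dict[str, bool]:
    """Evaluate every rule against one patient's primary variables."""
    return {name: rule(primary) for name, rule in ADVANCED_RULES.items()}


patient_a = {"intraoperative_apnea": True, "intraoperative_atelectasis": False}
patient_b = {"intraoperative_apnea": False, "intraoperative_atelectasis": False}

result_a = derive_advanced(patient_a)
result_b = derive_advanced(patient_b)
```

Keeping rules in a table separates the user's logic from the extraction step, so a score or another composite variable could be added as one more entry.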
Further, to improve the adaptability of the structured medical database and better meet the actual needs of different users, the user can fill in a structured form to generate a structured medical database that better matches those needs. The structured form is mainly used by the user to specify rules for generating the structured medical database, for example the source and quantity of the medical text information used as the data source, the types of medical variables to output, the extraction rules for advanced medical variables, whether the data needs to be standardized, and other requirements relevant to forming the final structured medical database. This embodiment does not limit the specific content of the structured form.
Further, a user may be very large. If the user is a hospital, for example, it corresponds to one large structured medical database, and when individual departments use it, every data search runs hospital-wide, which enlarges the search scope. To further improve the efficiency with which users use the structured medical database, project groups can be set up within it, and the data sources, number of variables, and so on of each project group can be obtained, so that each project group has its own corresponding structured medical database. The user can also manage project groups at any time, for example by adding, deleting, or modifying them, which improves the efficiency of using the structured medical database.
Based on the above technical solution, the method for generating a structured medical database based on medical text information provided by the embodiments of the present invention can raise the degree of automation and intelligence of structured medical database generation, greatly reduce labor cost, and improve generation efficiency. It also makes the solution more flexible, makes the generated structured form (database) easy to correct and modify, and standardizes the variable data in the medical text information, improving data exchange and inheritance.
The system for generating a structured medical database based on medical text information provided by the embodiments of the present invention is introduced below. The system described below and the method described above may be referred to in correspondence with each other.
Referring to Fig. 3, Fig. 3 is a structural block diagram of the system for generating a structured medical database based on medical text information provided by an embodiment of the present invention. The system may include:
An acquisition module 100, configured to obtain input medical text information;
A natural semantic processing module 200, configured to determine the natural semantic processing model corresponding to the medical text information, and to perform deep natural semantic analysis processing on the medical text information using the natural semantic processing model, to obtain a processing result;
A structured medical database generation module 300, configured to determine the medical variable corresponding to each item of processed data in the processing result, and to enter each item of processed data at the corresponding position of the corresponding medical variable, to obtain the structured medical database.
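The way the three modules hand data to one another can be sketched as follows. This is only an illustration of the data flow: the keyword lookup stands in for the natural semantic processing model and is not the deep analysis the patent describes, and all function and variable names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Extraction = Tuple[str, str]  # (medical variable name, extracted value)


def acquire(raw: str) -> str:
    """Acquisition step: accept input text and trim surrounding whitespace."""
    return raw.strip()


def make_keyword_model(keywords: Dict[str, str]) -> Callable[[str], List[Extraction]]:
    """Build a stand-in processing model from a phrase-to-variable table."""
    def model(text: str) -> List[Extraction]:
        lowered = text.lower()
        return [(variable, phrase)
                for phrase, variable in keywords.items()
                if phrase in lowered]
    return model


def build_record(extractions: List[Extraction],
                 variables: List[str]) -> Dict[str, Optional[str]]:
    """Generation step: place each value in the slot of its medical variable."""
    record: Dict[str, Optional[str]] = {name: None for name in variables}
    for variable, value in extractions:
        if variable in record:
            record[variable] = value
    return record


model = make_keyword_model({
    "right lower abdominal pain": "symptom",
    "mri": "examination",
})
text = acquire("  Patient reports right lower abdominal pain; MRI ordered.  ")
record = build_record(model(text), ["symptom", "examination", "diagnosis"])
```

Variables with no extracted value stay empty (`None`) in the record, so the resulting row always has the same columns regardless of what the text contained.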
Based on the above embodiment, the natural semantic processing module 200 may include:
A granularity threshold acquisition unit, configured to obtain an input granularity threshold;
A natural semantic processing unit, configured to cause the natural semantic processing model to perform deep natural semantic analysis processing on the medical text information according to the granularity threshold.
Based on any of the above embodiments, the system may further include:
A standardization module, configured to perform standardization mapping, using a medical standard database, on the processing results corresponding to specified medical variables contained in the processing result, to obtain a standardized processing result.
Based on any of the above embodiments, the structured medical database generation module 300 may include:
A primary medical variable unit, configured to determine the primary medical variable corresponding to each item of processed data in the processing result;
A correction unit, configured to process the primary medical variables using manual rule integration and correction logic, to obtain a primary medical variable processing result;
An advanced medical variable unit, configured to, when an advanced medical variable exists in the primary medical variable processing result, generate advanced medical variable processing data according to the corresponding processed data and the logical relationship corresponding to the advanced medical variable.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method and system for generating a structured medical database based on medical text information provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the present invention.
Claims (10)
1. A method for generating a structured medical database based on medical text information, characterized in that the method comprises:
Obtaining input medical text information;
Determining the natural semantic processing model corresponding to the medical text information, and performing deep natural semantic analysis processing on the medical text information using the natural semantic processing model, to obtain a processing result;
Determining the medical variable corresponding to each item of processed data in the processing result, and entering each item of processed data at the corresponding position of the corresponding medical variable, to obtain the structured medical database.
2. The method according to claim 1, characterized in that determining the natural semantic processing model corresponding to the medical text information comprises:
Extracting key information points of the medical text information;
Determining the medical text category corresponding to the medical text information according to the key information points;
Determining the natural semantic processing model corresponding to the medical text category.
3. The method according to claim 1 or 2, characterized in that performing deep natural semantic analysis processing on the medical text information using the natural semantic processing model comprises:
Obtaining an input granularity threshold;
Causing the natural semantic processing model to perform deep natural semantic analysis processing on the medical text information according to the granularity threshold.
4. The method according to claim 3, characterized in that, after the processing result is obtained, the method further comprises:
Performing standardization mapping, using a medical standard database, on the processing results corresponding to specified medical variables contained in the processing result, to obtain a standardized processing result.
5. The method according to claim 4, characterized in that determining the medical variable corresponding to each item of processed data in the processing result comprises:
Determining the primary medical variable corresponding to each item of processed data in the processing result;
Processing the primary medical variables using manual rule integration and correction logic, to obtain a primary medical variable processing result;
When an advanced medical variable exists in the primary medical variable processing result, generating advanced medical variable processing data according to the corresponding processed data and the logical relationship corresponding to the advanced medical variable.
6. The method according to claim 5, characterized in that, after the input medical text information is obtained, the method further comprises:
Performing data desensitization on the medical text information.
7. A system for generating a structured medical database based on medical text information, characterized in that the system comprises:
An acquisition module, configured to obtain input medical text information;
A natural semantic processing module, configured to determine the natural semantic processing model corresponding to the medical text information, and to perform deep natural semantic analysis processing on the medical text information using the natural semantic processing model, to obtain a processing result;
A structured medical database generation module, configured to determine the medical variable corresponding to each item of processed data in the processing result, and to enter each item of processed data at the corresponding position of the corresponding medical variable, to obtain the structured medical database.
8. The system according to claim 7, characterized in that the natural semantic processing module comprises:
A granularity threshold acquisition unit, configured to obtain an input granularity threshold;
A natural semantic processing unit, configured to cause the natural semantic processing model to perform deep natural semantic analysis processing on the medical text information according to the granularity threshold.
9. The system according to claim 8, characterized in that the system further comprises:
A standardization module, configured to perform standardization mapping, using a medical standard database, on the processing results corresponding to specified medical variables contained in the processing result, to obtain a standardized processing result.
10. The system according to claim 9, characterized in that the structured medical database generation module comprises:
A primary medical variable unit, configured to determine the primary medical variable corresponding to each item of processed data in the processing result;
A correction unit, configured to process the primary medical variables using manual rule integration and correction logic, to obtain a primary medical variable processing result;
An advanced medical variable unit, configured to, when an advanced medical variable exists in the primary medical variable processing result, generate advanced medical variable processing data according to the corresponding processed data and the logical relationship corresponding to the advanced medical variable.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710208112.XA CN107145511A (en) | 2017-03-31 | 2017-03-31 | Structured medical data library generating method and system based on medical science text message |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107145511A true CN107145511A (en) | 2017-09-08 |
Family
ID=59783900
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710208112.XA Pending CN107145511A (en) | 2017-03-31 | 2017-03-31 | Structured medical data library generating method and system based on medical science text message |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107145511A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102902831A (en) * | 2011-07-25 | 2013-01-30 | 上海宝信软件股份有限公司 | Analytical method of assay statistical data |
US20150120733A1 (en) * | 2013-10-29 | 2015-04-30 | Google Inc. | Systems and methods for improved coverage of input media in content summarization |
CN104978587A (en) * | 2015-07-13 | 2015-10-14 | 北京工业大学 | Entity-identification cooperative learning algorithm based on document type |
CN106095913A (en) * | 2016-06-08 | 2016-11-09 | 广州同构医疗科技有限公司 | A kind of electronic health record text structure method |
CN106484674A (en) * | 2016-09-20 | 2017-03-08 | 北京工业大学 | A kind of Chinese electronic health record concept extraction method based on deep learning |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578798A (en) * | 2017-10-26 | 2018-01-12 | 北京康夫子科技有限公司 | The processing method and system of electronic health record |
CN108320808A (en) * | 2018-01-24 | 2018-07-24 | 龙马智芯(珠海横琴)科技有限公司 | Analysis of medical record method and apparatus, equipment, computer readable storage medium |
CN110289057A (en) * | 2018-03-19 | 2019-09-27 | 北京医联蓝卡在线科技有限公司 | A kind of voice consultation system and method |
CN108831562A (en) * | 2018-06-22 | 2018-11-16 | 北京海德康健信息科技有限公司 | A kind of disease name standard convention database and its method for building up |
CN108922633A (en) * | 2018-06-22 | 2018-11-30 | 北京海德康健信息科技有限公司 | A kind of disease name standard convention method and canonical system |
CN108711454A (en) * | 2018-06-29 | 2018-10-26 | 北京大学口腔医学院 | Removable partial denture design scheme generation method, equipment and medium |
CN110827945B (en) * | 2018-08-14 | 2022-05-27 | 上海明品医学数据科技有限公司 | Control method for generating key factors based on medical data |
CN110827989B (en) * | 2018-08-14 | 2022-07-12 | 上海明品医学数据科技有限公司 | Control method for processing medical data based on key factors |
CN110827988A (en) * | 2018-08-14 | 2020-02-21 | 上海明品医学数据科技有限公司 | Control method for medical data research based on mobile terminal |
CN110827989A (en) * | 2018-08-14 | 2020-02-21 | 上海明品医学数据科技有限公司 | Control method for processing medical data based on key factors |
CN110827945A (en) * | 2018-08-14 | 2020-02-21 | 上海明品医学数据科技有限公司 | Control method for generating key factors based on medical data |
CN109448841A (en) * | 2018-11-09 | 2019-03-08 | 天津开心生活科技有限公司 | Establish data model method and device, clinical aid decision-making method and device |
CN109522413A (en) * | 2018-11-21 | 2019-03-26 | 上海依智医疗技术有限公司 | The construction method and device in a kind of hospital guide's medical terminology library |
CN110263176A (en) * | 2019-05-14 | 2019-09-20 | 武汉维特鲁威生物科技有限公司 | A kind of Medical data integration method and system based on ontology |
CN110223783A (en) * | 2019-06-13 | 2019-09-10 | 上海明品医学数据科技有限公司 | A kind of control method in multiple terminal interaction medical datas |
CN110223783B (en) * | 2019-06-13 | 2023-08-18 | 上海明品医学数据科技有限公司 | Control method for interaction of medical data at multiple terminals |
CN110674244A (en) * | 2019-08-20 | 2020-01-10 | 南京医渡云医学技术有限公司 | Structured processing method and device for medical text |
CN110888926B (en) * | 2019-10-22 | 2022-10-28 | 北京百度网讯科技有限公司 | Method and device for structuring medical text |
CN110888926A (en) * | 2019-10-22 | 2020-03-17 | 北京百度网讯科技有限公司 | Method and device for structuring medical text |
CN111858643A (en) * | 2020-06-29 | 2020-10-30 | 上海森亿医疗科技有限公司 | Database variable production method, system, computer device and storage medium |
CN111858643B (en) * | 2020-06-29 | 2021-11-16 | 上海森亿医疗科技有限公司 | Database variable production method, system, computer device and storage medium |
CN111951946A (en) * | 2020-07-17 | 2020-11-17 | 合肥森亿智能科技有限公司 | Operation scheduling system, method, storage medium and terminal based on deep learning |
CN111951946B (en) * | 2020-07-17 | 2023-11-07 | 合肥森亿智能科技有限公司 | Deep learning-based operation scheduling system, method, storage medium and terminal |
CN112669918A (en) * | 2020-12-24 | 2021-04-16 | 上海市第一人民医院 | Ophthalmic VEGF-related multidimensional clinical trial data processing method and system |
CN112560494A (en) * | 2020-12-24 | 2021-03-26 | 宝创瑞海(北京)科技发展有限公司 | Method and system for integrating clinical diagnosis and treatment data |
CN112712863A (en) * | 2021-01-05 | 2021-04-27 | 中国人民解放军海军军医大学第一附属医院 | Method and system for calculating clinical data of accurate drug administration for liver metastasis of colon cancer |
CN114912887A (en) * | 2022-04-20 | 2022-08-16 | 深圳市医未医疗科技有限公司 | Clinical data entry method and device based on electronic medical record |
CN115034204A (en) * | 2022-05-12 | 2022-09-09 | 浙江大学 | Method for generating structured medical text, computer device, storage medium and program product |
CN116796718A (en) * | 2023-06-13 | 2023-09-22 | 普瑞纯证医疗科技(广州)有限公司 | Product specification generation method and system based on artificial intelligence generated content |
CN116796718B (en) * | 2023-06-13 | 2023-12-19 | 普瑞纯证医疗科技(广州)有限公司 | Product specification generation method and system based on artificial intelligence generated content |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107145511A (en) | Structured medical data library generating method and system based on medical science text message | |
CN109766445B (en) | Knowledge graph construction method and data processing device | |
CN107247881A (en) | A kind of multi-modal intelligent analysis method and system | |
CN107833595A (en) | Medical big data multicenter integration platform and method | |
CN108538395A (en) | Construction method for a general-purpose disease-specific medical data system | |
CN100449531C (en) | Patient data mining | |
CN108628824A (en) | Entity recognition method based on Chinese electronic health records | |
US20060136259A1 (en) | Multi-dimensional analysis of medical data | |
CN111048167B (en) | Hierarchical case structuring method and system | |
WO2022267678A1 (en) | Video consultation method and apparatus, device and storage medium | |
CN110459320A (en) | Knowledge graph-based assisted diagnosis and treatment system | |
CN109920540A (en) | Construction method, device, and computer equipment for an assisted diagnosis and treatment decision system | |
CN108877921A (en) | Medical intelligent diagnosis method and medical intelligent diagnosis system | |
WO2015079353A1 (en) | System and method for correlation of pathology reports and radiology reports | |
CN113688255A (en) | Knowledge graph construction method based on Chinese electronic medical record | |
CN111191456B (en) | Method for identifying text segments by using sequence labels | |
CN109615012A (en) | Machine learning-based medical data anomaly recognition method, equipment, and storage medium | |
CN106845058A (en) | Method for standardizing disease data, and modular station | |
CN105190628A (en) | Methods and apparatus for determining a clinician's intent to order an item | |
CN111191415A (en) | Operation classification coding method based on original operation data | |
Pecoraro et al. | Designing ETL tools to feed a data warehouse based on electronic healthcare record infrastructure | |
CN114330267A (en) | Structural report template design method based on semantic association | |
CN114996388A (en) | Intelligent matching method and system for diagnosis name standardization | |
CN114121295A (en) | Construction method of knowledge graph driven liver cancer diagnosis and treatment scheme recommendation system | |
CN113886716B (en) | Emergency disposal recommendation method and system for food safety emergencies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170908 |