CN109785927A - Clinical document structuring processing method based on internet integration medical platform - Google Patents

Clinical document structuring processing method based on internet integration medical platform

Info

Publication number
CN109785927A
Authority
CN
China
Prior art keywords
clinical
clinical document
sample
data
structuring processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910101984.5A
Other languages
Chinese (zh)
Inventor
高建强
赵戈
徐龙章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Triman Information & Technology Co Ltd
Original Assignee
Shanghai Triman Information & Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Triman Information & Technology Co Ltd
Priority to CN201910101984.5A
Publication of CN109785927A
Legal status: Pending

Landscapes

  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention discloses a clinical document structuring processing method based on an integrated internet medical platform, in the technical field of internet medical platforms. Unstructured clinical documents are input to a clinical document structuring processing engine, which processes them using a clinical medicine corpus, rules, full-text retrieval, and machine learning. The resulting structured data are output to a distributed storage engine and processed by artificial intelligence algorithms for analysis and display on the platform. The method structures the unstructured text data in clinical data and stores the result in a distributed Hadoop cluster, achieving distributed data storage and distributed computing, and the programming implementation of the software application is modified and adapted to these distributed characteristics.

Description

Clinical document structuring processing method based on internet integration medical platform
Technical field
The present invention relates to the technical field of internet medical platforms, and more specifically to a clinical document structuring processing method based on an integrated internet medical platform.
Background art
Big data, as an important resource, has penetrated every industry and sector to varying degrees. Its in-depth application not only helps the business activities of individual organizations but also helps drive national economic development. "Internet+" is the achievement and hallmark of the deep integration of industrialization and informatization, and an important lever for further promoting information consumption. "Internet+" means "internet + traditional industries", but it is not simple addition: it uses information and communication technology and internet platforms to integrate the internet deeply with traditional industries and create a new development ecosystem. In the future the internet, like electricity, can serve as a productivity tool that greatly improves efficiency in every industry. The aim is to combine mobile internet, cloud computing, big data, and the Internet of Things with modern manufacturing, to foster the healthy development of e-commerce, the industrial internet, and internet finance, and to guide internet enterprises to open up international markets. "Traditional marketplace + internet" has produced numerous e-commerce businesses, and traditional department stores, banks, and transportation have likewise given rise to "department store + internet", "bank + internet", and "transportation + internet". "Internet+" is being applied comprehensively to the tertiary sector, forming new business forms such as internet healthcare, internet finance, internet transportation, and internet education.
The medical industry is an important component of national economic and social development. Under the new circumstances, the rapid development of medical informatization has benefited from emerging IT such as big data, cloud computing, and the Internet of Things, which has caused an explosion of medical data and promoted the formation of medical big data. However, hospital clinical data contain a large amount of unstructured text, such as patient examination and test reports (ultrasound, X-ray, CT, etc.) and pathology reports, which hinders analysis and processing on an integrated internet platform.
Summary of the invention
(1) Technical problem to be solved
The purpose of the invention is to structure the unstructured text data in examination and test reports by providing a clinical document structuring processing method based on an integrated internet medical platform.
(2) Technical solution
A clinical document structuring processing method based on an integrated internet medical platform includes the following steps:
S1: the clinical document structuring processing engine receives unstructured clinical documents as input and, using a clinical medicine corpus, rules, full-text retrieval, and machine learning, converts the unstructured text data into structured sample and index data;
S2: after the clinical document has been processed by the structuring processing engine, the resulting structured data, namely key-value pairs of samples and indices, are stored in the distributed storage engine for analysis and display on the platform.
According to an embodiment of the present invention, the clinical document structuring processing engine includes a Chinese natural language processing module, a clinical medicine corpus construction module, and a sample index extraction module; the Chinese natural language processing module is connected to the clinical medicine corpus construction module and to the sample index extraction module.
According to an embodiment of the present invention, the Chinese natural language processing module uses Chinese natural language processing techniques to process the input clinical document at the word, sentence, and paragraph levels. The processing steps are as follows:
(1) Short-sentence splitting: based on the narrative characteristics of clinical document text and the display rules of sentences, the text content of the clinical document is split into short sentences, each describing one sample;
(2) Chinese word segmentation: using a Chinese word segmentation tool together with a general medical dictionary and a clinical medicine dictionary, the sample short sentences are segmented into meaningful words or phrases;
(3) Part-of-speech analysis: the part of speech of each word is analyzed;
(4) Syntactic analysis: a specific sample short sentence is compared with the other short sentences in the clinical documents that describe the same sample, and the short-sentence syntax of each kind of sample description is summarized.
According to an embodiment of the present invention, the clinical medicine corpus construction module builds a specialized clinical medicine corpus obtained by learning from and training on clinical documents. The construction steps of the module are as follows:
(1) New word discovery: using word frequency statistics and clustering, segmented tokens are combined to discover new words;
(2) Synonym discovery: for synonyms present in clinical documents, fuzzy matching and statistical analysis are applied to the text content of the clinical documents to obtain synonyms and build a synonym table;
(3) Sample extraction: sample names are extracted from the split short sentences according to rules;
(4) Template extraction: for a specific sample, the description template of that sample is extracted.
According to an embodiment of the present invention, for each sample the sample index extraction module uses full-text retrieval and fuzzy matching, combined with the clinical medicine corpus and the sample description templates, to determine from the clinical document the index value corresponding to each index name; the sample and its indices form key-value pairs, which are output as the processing result of the structuring processing engine.
According to an embodiment of the present invention, the clinical documents include electronic medical records, pathology reports, and examination and test reports.
According to an embodiment of the present invention, the examination and test reports include ultrasound, X-ray, and CT reports.
According to an embodiment of the present invention, the distributed storage engine uses multiple standard server nodes to form a distributed storage cluster. Each Hadoop cluster includes one master node and multiple slave nodes. The master node runs the NameNode and JobTracker functions and coordinates the slave nodes to ensure that the tasks submitted to the cluster are completed. The slave nodes run TaskTracker and HDFS to store data and execute the map and reduce functions of data computation.
(3) Beneficial effects
With the technical solution of the present invention, the clinical document structuring processing method based on an integrated internet medical platform inputs unstructured clinical documents to the clinical document structuring processing engine, processes them using a clinical medicine corpus, rules, full-text retrieval, and machine learning, and outputs the resulting structured data to the distributed storage engine, where they are processed by artificial intelligence algorithms for analysis and display on the platform. The invention structures the unstructured text data in clinical data and stores the result in a distributed Hadoop cluster, achieving distributed data storage and distributed computing, and the programming implementation of the software application is modified and adapted to these distributed characteristics.
Brief description of the drawings
In the present invention, identical reference numerals always denote identical features, in which:
Fig. 1 is the overall framework diagram of the internet-based integrated medical platform.
Fig. 2 is the flow chart of the present invention.
Fig. 3 is the overall architecture diagram of the clinical document structuring processing engine.
Fig. 4 is the internal structure diagram of the clinical document structuring processing engine.
Fig. 5 is the structure diagram of the Chinese natural language processing module.
Fig. 6 is the structure diagram of the clinical medicine corpus construction module.
Fig. 7 is the Hadoop cluster network topology diagram.
Fig. 8 is the architecture diagram of the distributed storage engine.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to the accompanying drawings and embodiments.
The internet-based integrated medical platform combines medical big data with artificial intelligence technology to realize an integrated big data medical services platform based on "internet + healthcare". It provides all individuals and institutions participating in health care activities with a new model of online medical services, including data sharing, business operation, and service collaboration. It also optimizes information communication, promotes the exchange of information between doctors and patients, and helps hospitals improve their own service and management. The overall framework of the platform is shown in Fig. 1. From bottom to top, the platform comprises five layers: the platform data foundation layer, the data analysis layer, the medical information resource layer, the data deep application layer, and the client layer. The internet-based integrated medical services platform consists of three parts: the back-end management side, the doctor side, and the patient side.
Back-end management side of the integrated medical services platform:
Back-end management provides functions such as data exchange and integration with the hospital HIS, PACS, LIS, and RIS systems and backup of medical information system data. It mainly consists of data extraction and integration, medical data backup storage, the special population database, and queries against the anonymous public medical record database, among others.
(1) Data extraction and integration: performs the extraction, transformation, and loading of data from systems such as HIS, RIS, LIS, and PACS. The clinical data generated by the different clinical information systems are integrated and consolidated, unifying patient identifiers and patient clinical information across systems so that clinical data can be stored uniformly.
A) The HIS data extraction module incrementally extracts clinical data such as registrations, visits, diagnoses, medical orders, admissions, and expenses from the HIS outpatient and emergency system and the HIS inpatient system;
B) The RIS data extraction module incrementally synchronizes examination data such as examination reports and body-part details from the RIS system;
C) The LIS data extraction module incrementally synchronizes laboratory data such as test reports, test indices, and bacteria and drug susceptibility results from the LIS system;
D) The PACS module receives data conforming to the DICOM 3.0 standard from imaging equipment such as DR and CT.
E) The ETL subsystem performs operations such as desensitization, cleaning, and conversion of clinical data.
Data desensitization: patients' sensitive personal data, such as identity card number, medical card number, and name, receive special treatment to remove the sensitive content.
Data cleaning: incomplete data are discarded; data in the wrong format, such as a date of birth, are repaired using other related data, and if they cannot be repaired, the data are flagged;
Data conversion: enumerated values stored as numbers or characters in the source system are converted to text with the corresponding meaning.
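The three ETL operations described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the field names (patient_id, id_card, medical_card, name, birth_date, gender), the salted-hash desensitization, and the gender code table are hypothetical and do not come from the patent, and the repair of malformed values from other related data is omitted, so a malformed date is only flagged.

```python
import hashlib
from datetime import datetime

# Hypothetical code table: enumerated source values -> readable text.
GENDER_CODES = {"1": "male", "2": "female"}

def desensitize(record, salt="demo-salt"):
    """Replace direct identifiers with a salted hash and drop the name."""
    out = dict(record)
    for field in ("id_card", "medical_card"):
        if out.get(field):
            digest = hashlib.sha256((salt + out[field]).encode("utf-8")).hexdigest()
            out[field] = digest[:16]
    out.pop("name", None)
    return out

def clean(record):
    """Discard incomplete records; flag a birth date with an invalid format."""
    if not record.get("patient_id") or not record.get("visit_date"):
        return None
    out = dict(record)
    try:
        datetime.strptime(out.get("birth_date") or "", "%Y-%m-%d")
    except ValueError:
        out["birth_date_invalid"] = True  # could not be parsed, so mark it
    return out

def convert(record):
    """Turn enumerated codes into text with the corresponding meaning."""
    out = dict(record)
    out["gender"] = GENDER_CODES.get(str(out.get("gender")), "unknown")
    return out

def etl(records):
    """Run cleaning, desensitization, and conversion over a batch of records."""
    result = []
    for record in records:
        cleaned = clean(record)
        if cleaned is not None:
            result.append(convert(desensitize(cleaned)))
    return result
```

In a real deployment the salt would be kept secret and the code tables would come from the source systems' own dictionaries.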
(2) Medical data backup storage: the medical data backup center is the foundation of clinical big data storage and provides the raw data source for clinical big data processing and analysis. The system uses a distributed Hadoop cloud storage architecture to give different medical institutions linearly scalable distributed storage capacity, supporting the archiving, management, and sharing of stored data as well as information exchange and sharing among medical institutions of all types, thereby achieving diversified storage and access on the cloud computing platform. The medical data backup center consists of modules for bulk migration, incremental import, and viewing of medical data.
A) Medical data bulk migration: using the Hadoop distributed architecture, performs a full backup of the medical data in the medical information systems by time period.
B) Medical data incremental import: imports the incremental medical data from the medical information systems in time-series order.
C) Medical data viewing: queries the imported medical data by disease type, gender, age group, department, examination report type, examination time, and so on.
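One common way to realize a time-series incremental import is a high-water mark: each run pulls only rows changed since the last recorded timestamp. The sketch below is an assumption about how such an import might look, not the patent's implementation; the updated_at field and the function name are hypothetical.

```python
from datetime import datetime

def incremental_import(source_rows, last_watermark):
    """Return rows newer than the watermark, plus the advanced watermark."""
    new_rows = [row for row in source_rows if row["updated_at"] > last_watermark]
    new_watermark = max((row["updated_at"] for row in new_rows), default=last_watermark)
    return new_rows, new_watermark

# Usage: only the row changed after 2019-01-02 is imported.
rows = [
    {"id": 1, "updated_at": datetime(2019, 1, 1)},
    {"id": 2, "updated_at": datetime(2019, 1, 3)},
]
imported, watermark = incremental_import(rows, datetime(2019, 1, 2))
```

Storing the returned watermark between runs keeps repeated imports from copying the same rows twice.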
(3) Special population database: based on the patient clinical data in the medical information systems, special population databases are built for diseases such as hyperthyroidism, diabetes, thyroid nodules, breast tumors, and thyroid tumors. They can be queried by disease type, patient gender, age, examining doctor, examination index, examination time, and so on.
(4) Anonymous public medical record database: the cases of patients with diabetes, thyroid disease, and so on, together with doctors' diagnosis and treatment advice, can be viewed, providing a reference for patients with similar conditions who seek treatment. For privacy, the database is built in anonymous form. Records can be queried by disease type, symptom description, the doctor's detailed advice, examining doctor, and question and reply times.
(5) Model library: so that models built with artificial intelligence algorithms can be used for classification and prediction in disease diagnosis, the model library mainly manages artificial intelligence models, including functions such as model import, model training, and model viewing.
(6) System management: the integrated platform mainly serves different roles such as information center staff, medical institution administrators, doctors, and patients. These users must be managed systematically: through user management and role permission management, the various operations and data access rights are strictly authorized and controlled.
Doctor side of the integrated medical services platform:
The doctor side mainly provides the medical staff and researchers of medical institutions with a platform for medical research and diagnostic decision support, establishes a doctor-patient communication channel, and lets doctors view patients' consultations and their evaluations of diagnoses. It mainly consists of special population analysis, remote decision support, patient consultation viewing, and patient evaluation viewing.
(1) Special population analysis: analyzes and mines the clinical data of special populations in the hospital information system, such as patients with hyperthyroidism and diabetes, to discover patterns of occurrence and underlying mechanisms, providing system support for the disease research of clinicians and researchers.
A) Hyperthyroidism clinical data analysis and mining: the hyperthyroidism clinical data comprise clinical data tables such as the patient basic information table, visit record table, medication table, index test table, and diagnosis table, with about 2,000,000 records in total. Data mining analysis of the hyperthyroidism clinical data is carried out mainly around topics such as patients' basic information, test index data, prescribed medication, complications, and recurrence.
B) Diabetes clinical data analysis and mining: the diabetes clinical data comprise clinical data tables such as the patient basic information table, visit record table, medication table, index test table, and diagnosis table, with about 1,000,000 records in total. Data mining analysis of the diabetes clinical data is carried out mainly around topics such as patients' basic information, test index data, prescribed medication, and diagnoses.
(2) Remote decision support: several diseases from the endocrinology, cardiovascular, and oncology disciplines, namely thyroid disease, coronary heart disease, and tumors, are selected as the objects of data collection and analysis. Relying on the integrated platform to collect and integrate clinical medical data, a diagnostic decision support system for thyroid nodules, coronary heart disease, breast tumors, and so on is implemented, providing system support for clinicians' clinical diagnosis and researchers' disease research. It mainly consists of four parts: the index-parameter-based prediction module, the examination-report-based prediction module, the model training module, and the text structuring module.
A) Index-parameter-based prediction module: using information such as the patient's outpatient serial number or medical insurance card number, the patient's test indices and examination report text for the relevant disease can be queried. Structured index data can be input directly; unstructured examination reports are structured by the structuring submodule to obtain a data format that the model can recognize.
B) Examination-report-text-based prediction module: using information such as the patient's outpatient serial number or medical insurance card number, the patient's examination report text can be queried. Unstructured examination report text is fed directly to a deep learning algorithm for prediction.
C) Model training module: a basic module of the system that is invisible to users. Data such as clinical examination reports and test indices related to thyroid nodules, coronary heart disease, and breast disease, drawn from the medical information systems across multiple databases, are merged and integrated into a unified data table for model training.
D) Structuring module: structures ultrasound report text data by extracting the ultrasound features of the various samples, including indices such as tumor size, boundary, echo distribution, and echo intensity together with the value of each index, and by forming a description template for each sample. Based on these templates, the ultrasound text content is structured.
(3) Patient consultation viewing: lets doctors view the content of patients' diagnostic consultations, filtered by disease type, symptom description, examination time, and so on.
(4) Patient evaluation viewing: lets doctors view the content of patient evaluations, filtered by doctor name, patient name, evaluation content, evaluation time, and so on.
Patient side of the integrated medical services platform:
The patient side is the interface through which patients log in to the platform. It mainly provides patients with remote online medical services, including online consultation, post-visit evaluation, and access to the patient electronic health record. Through online consultation, patients obtain a preliminary diagnosis produced by artificial intelligence technology; through online evaluation, they rate doctors' medical skill, service attitude, and so on; through the patient electronic health record, they view data such as diagnoses, laboratory tests, examinations and images, medical orders, medical history, pathology, and expenses. It mainly comprises three modules: patient consultation service, post-visit evaluation service, and patient electronic health record.
(1) Patient consultation service: provides an online consultation service for patients. The patient supplies data such as the original symptom description, imaging examination text reports, and test index values. The system uses OCR recognition tools and structuring techniques to obtain a data format that the model can recognize and, through analysis of test index features, uses models built with artificial intelligence algorithms to classify and predict the diagnosis of the unknown sample. The predicted result is then presented to the patient, including thyroid nodule type, thyroid surgery type, whether a thyroid tumor is benign or malignant, and breast tumor findings, for the purpose of guiding the patient's treatment and health care. The system integrates artificial intelligence algorithms including convolutional neural networks (CNN), recurrent neural networks (RNN), long short-term memory networks (LSTM), random forests, support vector machines, neural networks, decision trees, and K-means, and has built auxiliary diagnosis prediction models for thyroid nodule and breast tumor patients.
(2) Post-visit evaluation service: lets patients evaluate the doctor's diagnosis and treatment after the visit. Online evaluation of doctors by patients is an effective doctor-patient communication channel that greatly helps medical institutions and doctors improve service quality and gradually ease doctor-patient conflict. Doctors can improve according to patients' evaluations and needs, and medical institutions can give appropriate rewards and penalties according to patients' overall evaluation of each doctor. In practice, however, a single doctor may receive hundreds or thousands of reviews, and a medical institution may have hundreds or even thousands of doctors, so a massive volume of patient evaluation text is generated, and processing and analyzing it manually takes a great deal of effort. The system implements a medical care evaluation system based on artificial intelligence: sentiment analysis of patient evaluations is performed by machine, positive and negative evaluations are identified automatically, and their proportions are counted. Doctors can quickly filter out the negative evaluations and make improvements according to their content; medical institutions can statistically analyze the overall evaluation situation in the mass of evaluation information by department, doctor, and so on.
(3) Patient electronic health record:
Relying on the data integration module, the clinical data generated by the different clinical information systems such as HIS, PACS, LIS, and RIS are integrated and consolidated to establish a unified personal electronic health record view that includes the patient's basic information, diagnoses, laboratory tests, examinations and images, medical orders, medical history, pathology, and expenses. Patients can consult it conveniently at any time, and it provides application support for the use of clinical big data in diagnosis, treatment, and research.
A) Patient basic information dimension: mainly displays basic information such as the patient's name, gender, date of birth, identity document number, and contact details;
B) Diagnosis dimension: displays the patient's past diagnosis records;
C) Laboratory test dimension: displays the patient's past laboratory test records in tabular form;
D) Examination and imaging dimension: displays the patient's ultrasound examination records and images;
E) Medical order dimension: records the doctor's medical orders of all kinds for the patient;
F) Medical history dimension: the patient's electronic medical records;
G) Pathology dimension: the patient's pathology records;
H) Nursing dimension: displays the patient's nursing records, such as pulse, body temperature, blood pressure, and respiration, in graphical form;
I) Physical examination dimension: displays the patient's physical examination records;
J) Expense dimension: displays statistics and details of the patient's expenses of all kinds.
As the volume of hospital data grows, hospitals have also entered the big data era, and fully mining the data that hospitals generate will undoubtedly yield much valuable information. In the early stage of hospital informatization, each clinical information system stored its data separately and locally, so data security was low and data reliability could not be effectively guaranteed. Backing up the data of hospital medical information systems on the Hadoop architecture lays the foundation for further mining the knowledge behind medical data, supporting hospital decision-making, and improving hospital efficiency.
With reference to the flow chart in Fig. 2, the clinical document structuring processing method based on an integrated internet medical platform includes the following steps:
S1: the clinical document structuring processing engine receives unstructured clinical documents as input and, using a clinical medicine corpus, rules, full-text retrieval, and machine learning, converts the unstructured text data into structured sample and index data;
S2: after the clinical document has been processed by the structuring processing engine, the resulting structured data, namely key-value pairs of samples and indices, are stored in the distributed Hadoop cluster for analysis and display on the platform.
The overall architecture of the clinical document structuring processing engine is shown in Fig. 3. After receiving the body text of a clinical document as input, the engine performs two tasks: first, it generates the clinical medicine corpus, which includes the clinical medicine dictionary, the synonym table, and the description templates of clinical documents; second, it extracts sample index key-value pairs from the clinical document as output.
As shown in Fig. 4, the clinical document structuring processing engine mainly consists of three modules: the Chinese natural language processing module, the clinical medicine corpus construction module, and the sample index extraction module. The function of each module is as follows:
Chinese natural language processing module
The clinical document structuring processing engine uses Chinese natural language processing techniques to process the input clinical document at the word, sentence, and paragraph levels. The structure of the module is shown in Fig. 5. The processing steps of the module are as follows:
(1) Short-sentence splitting: based on the narrative characteristics of clinical document text and the display rules of sentences, the text content of the clinical document is split into short sentences, each describing one sample. Each resulting short sentence essentially corresponds to a particular sample; in other words, each short sentence can essentially be regarded as the description of a particular sample.
(2) Chinese word segmentation: using a Chinese word segmentation tool together with a general medical dictionary and a clinical medicine dictionary, the sample short sentences are segmented into meaningful words or phrases.
(3) Part-of-speech analysis: the part of speech of each word is analyzed to support subsequent syntactic analysis and semantic understanding.
(4) Syntactic analysis: a specific sample short sentence is compared with the other short sentences in the clinical documents that describe the same sample, and the short-sentence syntax of each kind of sample description is summarized.
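Steps (1) and (2) above can be sketched in Python as follows. This is a minimal illustration, not the segmentation tool used in the patent, which is not named: splitting on punctuation is an assumed reading of the "display rules", forward maximum matching is a swapped-in textbook segmentation technique, and the tiny dictionary is hypothetical.

```python
import re

# Hypothetical miniature clinical dictionary (thyroid, right lobe,
# hypoechoic, nodule, boundary, clear, size).
DICTIONARY = {"甲状腺", "右叶", "低回声", "结节", "边界", "清晰", "大小"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)

def split_short_sentences(text):
    """Split report text into short sentences on common punctuation marks."""
    parts = re.split(r"[，。；,;\n]+", text)
    return [part.strip() for part in parts if part.strip()]

def segment(sentence):
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(MAX_WORD_LEN, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in DICTIONARY:
                tokens.append(piece)
                i += size
                break
    return tokens

report = "甲状腺右叶低回声结节，边界清晰。"
short_sentences = split_short_sentences(report)
tokens = [segment(s) for s in short_sentences]
```

A production system would rely on a mature segmenter loaded with the general and clinical medical dictionaries; the point here is only how dictionary entries drive where the word boundaries fall.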
Clinical medicine building of corpus module
Clinical medicine corpus is the clinical medicine special term material library obtained by clinical document learning training.This corpus More professional, accuracy is higher on Chinese word segmentation, is conducive to the Chinese natural language processing of clinical document.Clinical medicine corpus It is as shown in Figure 6 to construct modular structure.It can be seen that the construction step of the module is as follows:
(1) new word discovery: using the methods of word frequency statistics, cluster, participle is combined, finds neologisms.
(2) Synonym discovery: clinical documents contain synonyms. For example, when describing the circumferential diameter of a lump, some texts use "Zhou Jing" and others use "diameter". Based on the text content of the clinical documents, synonyms are obtained by means such as fuzzy matching and statistical analysis, and a synonym table is established.
(3) Sample extraction: from the segmented short sentences, sample names are extracted according to rules.
(4) Template extraction: for each specific sample, the description template of that sample is extracted.
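As a rough illustration of steps (1) and (2), the sketch below counts adjacent token pairs to propose candidate new words and applies a synonym table to normalize tokens. It assumes frequency statistics only (no clustering); the threshold `min_count`, the function names, and the synonym entry are hypothetical and are not taken from the patent.

```python
from collections import Counter


def discover_new_words(segmented_sentences, min_count=2):
    """Combine adjacent tokens and keep the combinations seen at least min_count times."""
    pair_counts = Counter()
    for tokens in segmented_sentences:
        for left, right in zip(tokens, tokens[1:]):
            pair_counts[left + right] += 1
    return sorted(word for word, count in pair_counts.items() if count >= min_count)


def normalize(tokens, synonym_table):
    """Replace each token with its canonical form from the synonym table, if any."""
    return [synonym_table.get(token, token) for token in tokens]


if __name__ == "__main__":
    # Hypothetical segmented short sentences (low / echo / nodule, and so on).
    sentences = [["低", "回声", "结节"], ["低", "回声", "区"], ["回声", "均匀"]]
    print(discover_new_words(sentences))

    # Hypothetical synonym table: two terms that both denote a lump.
    synonyms = {"包块": "肿块"}
    print(normalize(["左叶", "包块"], synonyms))
```

In this example only the combination 低回声 occurs twice, so it is the single candidate new word; a real corpus would need a larger threshold and further filtering.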
Sample index extraction module
For each sample, full-text search and fuzzy matching are combined with the clinical medicine corpus and the sample description template to determine, from the clinical document, the index value corresponding to each index name. Samples and indices form key-value pairs, which are output as the processing result of the structuring processing engine.
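The extraction described above might look like the following sketch. Everything specific here is an assumption made for illustration: the known index names, the value pattern standing in for a description template, the fuzzy-matching cutoff, and the choice of `(sample, index name)` tuples as keys. Fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib
import re

# Hypothetical index names taken from the corpus (size, boundary, internal echo).
KNOWN_INDEX_NAMES = ["大小", "边界", "内部回声"]

# Hypothetical description template: an index name, an optional filler
# character, then a measurement or one of a few descriptive values.
CLAUSE_PATTERN = re.compile(
    r"(?P<name>[^\d约为]+?)[约为]?(?P<value>[\d.]+\s*(?:cm|mm)|不均匀|均匀|清晰|不清)"
)


def resolve_index_name(mention, known_names, cutoff=0.6):
    """Fuzzy-match the name written in the text to a known index name."""
    matches = difflib.get_close_matches(mention, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def extract_indices(sample, short_sentences):
    """Return {(sample, index_name): value} for the short sentences of one sample."""
    pairs = {}
    for clause in short_sentences:
        match = CLAUSE_PATTERN.match(clause)
        if match is None:
            continue  # the clause carries no recognizable index value
        index_name = resolve_index_name(match.group("name"), KNOWN_INDEX_NAMES)
        if index_name is not None:
            pairs[(sample, index_name)] = match.group("value")
    return pairs


if __name__ == "__main__":
    clauses = ["左叶肿块", "大小约2.3cm", "内回声不均匀"]
    print(extract_indices("肿块", clauses))
```

Here the variant spelling 内回声 is resolved to the known index name 内部回声, so the output contains the pairs (肿块, 大小) → 2.3cm and (肿块, 内部回声) → 不均匀.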
With a backup center based on the Hadoop framework, users do not need to understand the underlying distributed details and can use the cluster for high-speed data import, storage, and query. Text documents are stored on multiple nodes of the cluster as multiple replica files, which increases the reliability of data storage. For medical data, the Hadoop Distributed File System (HDFS) is used as the underlying storage facility. The HDFS underlying storage implemented in the present invention adopts the master/slave architecture of HDFS 2.0. The Hadoop system runs on a Linux cluster. In the cluster, one computer acts as the master node and manages the other computers, while the other computers act as slave nodes and are responsible for data storage. The Hadoop deployment of this system uses fully distributed mode; the Hadoop cluster network topology is shown in Figure 7.
After the clinical document is processed by the structuring processing engine, the resulting structured data, namely the key-value pairs of samples and indices, are stored in the distributed Hadoop cluster for analysis and display by the platform. To suit the distributed nature of big data, the storage engine uses HDFS and MapReduce as its core to implement distributed data storage and distributed computing, respectively, and the programming implementation in software applications is modified and adapted for distributed characteristics so that the advantages of the distributed computing architecture are actually realized.
The principle of the Hadoop framework is to "move computing resources closer to the data". Figure 8 shows the architecture diagram of the distributed storage engine.
The distributed storage engine based on the Hadoop framework uses multiple standard server nodes to form a distributed storage cluster. Each Hadoop cluster includes one master node and multiple slave nodes. The master node runs the NameNode and JobTracker functions and is responsible for coordinating the slave nodes to ensure that the tasks submitted to the cluster are completed. The slave nodes run TaskTracker and the HDFS used to store data, and perform the map and reduce functions of data computation. The actual data are stored on the data nodes, and computation takes place on the nodes where the data reside, which helps Hadoop deliver higher performance than storing data over the network. The combination of standard server platforms and Hadoop infrastructure provides a cost-effective, high-performance platform for data-parallel applications.
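The map and reduce functions mentioned above can be illustrated with a single-process sketch. It does not use any Hadoop API and only mimics the map, shuffle, and reduce stages in plain Python; the record layout (one dictionary of sample-index key-value pairs per document) and all function names are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit (key, 1) for every structured sample-index value in one document."""
    for (sample, index_name), value in record.items():
        yield f"{sample}.{index_name}={value}", 1


def shuffle(mapped_pairs):
    """Shuffle: group the emitted values by key, as the framework would between stages."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over all records in a single process."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    # Hypothetical structured output of three documents.
    records = [
        {("lump", "boundary"): "clear"},
        {("lump", "boundary"): "clear"},
        {("lump", "boundary"): "unclear"},
    ]
    print(run_job(records))
```

On a real cluster the map calls would run on the nodes holding the data blocks and only the grouped intermediate pairs would cross the network, which is the point of moving computation to the data.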
In conclusion using technical solution of the present invention, the clinical document knot based on internet integration medical platform Structure processing method, unstructured clinical document be input to clinical document structuring processing engine, by clinical medicine corpus, The processing of the means such as rule, full-text search and machine learning, obtains structural data and is output to distributed storage engine, by artificial Intelligent algorithm is handled, and for Platform Analysis, is shown;The present invention ties text data non-structured in clinical data Structureization processing, stores into distributed Hadoop cluster, realizes Distributed Storage mode and distributed computing processing, and will Programming in software application, which is realized, to be transformed and is adapted to for distributed nature.

Claims (8)

1. A clinical document structuring processing method based on an internet integration medical platform, characterized by comprising the following steps:
S1: a clinical document structuring processing engine receives unstructured clinical documents as input and, by means of a clinical medicine corpus, rules, full-text search, and machine learning, converts the unstructured text data into structured sample and index data;
S2: after the clinical document is processed by the structuring processing engine, the resulting structured data, namely the key-value pairs of samples and indices, are stored in a distributed storage engine for analysis and display by the platform.
2. The clinical document structuring processing method based on an internet integration medical platform as claimed in claim 1, characterized in that the clinical document structuring processing engine comprises a Chinese natural language processing module, a clinical medicine corpus construction module, and a sample index extraction module, and the Chinese natural language processing module is connected to the clinical medicine corpus construction module and to the sample index extraction module, respectively.
3. The clinical document structuring processing method based on an internet integration medical platform as claimed in claim 2, characterized in that the Chinese natural language processing module uses Chinese natural language processing techniques to process the input clinical document at the word, sentence, and paragraph levels, with the following processing steps:
(1) short-sentence segmentation: based on the narrative characteristics of clinical documents and the explicit rules of sentence structure, the text of the clinical document is split into short sentences, each describing one sample;
(2) Chinese word segmentation: using a Chinese word segmentation tool with a general medical dictionary and a clinical medical dictionary, the sample short sentences are segmented into meaningful words or phrases;
(3) part-of-speech analysis: the part of speech of each word is analyzed;
(4) syntactic analysis: each specific sample short sentence is compared with the other short sentences in the clinical documents that describe the same sample, and the short-sentence syntax of each kind of sample description is summarized and induced.
4. The clinical document structuring processing method based on an internet integration medical platform as claimed in claim 2, characterized in that the clinical medicine corpus construction module builds a corpus of specialized clinical medicine terms obtained by learning and training on clinical documents, and the construction steps of the module are as follows:
(1) new word discovery: using word frequency statistics and clustering methods, segmented tokens are combined to discover new words;
(2) synonym discovery: for synonyms present in clinical documents, synonyms are obtained from the text content of the clinical documents by fuzzy matching and statistical analysis, and a synonym table is established;
(3) sample extraction: from the segmented short sentences, sample names are extracted according to rules;
(4) template extraction: for each specific sample, the description template of that sample is extracted.
5. The clinical document structuring processing method based on an internet integration medical platform as claimed in claim 2, characterized in that, for each sample, the sample index extraction module combines full-text search and fuzzy matching with the clinical medicine corpus and the sample description template to determine, from the clinical document, the index value corresponding to each index name; the samples and indices form key-value pairs, which are output as the processing result of the structuring processing engine.
6. The clinical document structuring processing method based on an internet integration medical platform as claimed in claim 1, characterized in that the clinical documents include electronic medical records, pathology reports, and examination and test reports.
7. The clinical document structuring processing method based on an internet integration medical platform as claimed in claim 6, characterized in that the examination and test reports include ultrasound, X-ray, and CT reports.
8. The clinical document structuring processing method based on an internet integration medical platform as claimed in claim 1, characterized in that the distributed storage engine uses multiple standard server nodes to form a distributed storage cluster, and each Hadoop cluster includes one master node and multiple slave nodes; the master node runs the NameNode and JobTracker functions and is responsible for coordinating the slave nodes to ensure that the tasks submitted to the cluster are completed; the slave nodes run TaskTracker and the HDFS used to store data, and perform the map and reduce functions of data computation.
CN201910101984.5A 2019-02-01 2019-02-01 Clinical document structuring processing method based on internet integration medical platform Pending CN109785927A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910101984.5A CN109785927A (en) 2019-02-01 2019-02-01 Clinical document structuring processing method based on internet integration medical platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910101984.5A CN109785927A (en) 2019-02-01 2019-02-01 Clinical document structuring processing method based on internet integration medical platform

Publications (1)

Publication Number Publication Date
CN109785927A true CN109785927A (en) 2019-05-21

Family

ID=66504122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910101984.5A Pending CN109785927A (en) 2019-02-01 2019-02-01 Clinical document structuring processing method based on internet integration medical platform

Country Status (1)

Country Link
CN (1) CN109785927A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150066537A1 (en) * 2013-09-05 2015-03-05 A-Life Medical, LLC. Automated clinical indicator recognition with natural language processing
CN108538395A (en) * 2018-04-02 2018-09-14 上海市儿童医院 A kind of construction method of general medical disease that calls for specialized treatment data system
CN109243616A (en) * 2018-06-29 2019-01-18 东华大学 Breast electronic medical record combined relation extraction and structuring system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
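Shi Zhiwei: "Design and Implementation of a Cloud-Service-Based Clinical Document Structuring System", China Master's Theses Full-text Database, Information Science and Technology *
Ruan Tong et al.: "Process and Methods of Clinical Medical Big Data Mining Based on Electronic Medical Records", Big Data *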

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110399450A (en) * 2019-06-20 2019-11-01 东华大学 A kind of Thyroid ultrasound report structure scan method based on semantic tree
CN110399450B (en) * 2019-06-20 2023-06-23 东华大学 Thyroid ultrasound report structured scanning method based on semantic tree
CN110413963A (en) * 2019-07-03 2019-11-05 东华大学 Breast ultrasonography report structure method based on domain body
CN110413963B (en) * 2019-07-03 2022-11-25 东华大学 Breast ultrasonic examination report structuring method based on domain ontology
CN110853745A (en) * 2019-09-23 2020-02-28 陈翔 Skin disease patient standardization system
CN112948471A (en) * 2019-11-26 2021-06-11 广州知汇云科技有限公司 Clinical medical text post-structured processing platform and method
CN111476030B (en) * 2020-05-08 2022-03-15 中国科学院计算机网络信息中心 Prospective factor screening method based on deep learning
CN111476030A (en) * 2020-05-08 2020-07-31 中国科学院计算机网络信息中心 Prospective factor screening method based on deep learning
CN112687364A (en) * 2020-12-24 2021-04-20 宁波金唐软件有限公司 Hbase-based medical data management method and system
CN112687364B (en) * 2020-12-24 2023-08-01 宁波金唐软件有限公司 Medical data management method and system based on Hbase
CN113380414A (en) * 2021-05-20 2021-09-10 心医国际数字医疗系统(大连)有限公司 Data acquisition method and system based on big data
CN113380414B (en) * 2021-05-20 2023-11-10 心医国际数字医疗系统(大连)有限公司 Data acquisition method and system based on big data
CN113380380A (en) * 2021-06-23 2021-09-10 上海电子信息职业技术学院 Intelligent reading device for medical reports
CN113539414A (en) * 2021-07-30 2021-10-22 中电药明数据科技(成都)有限公司 Method and system for predicting rationality of antibiotic medication
CN114678132A (en) * 2022-02-22 2022-06-28 北京颐圣智能科技有限公司 Self-learning medical wind control system and method based on clinical behavior feedback
CN115757430A (en) * 2022-12-01 2023-03-07 武汉博科国泰信息技术有限公司 Data structured processing method and system for medical data
CN115617840B (en) * 2022-12-19 2023-03-10 江西曼荼罗软件有限公司 Medical data retrieval platform construction method, system, computer and storage medium
CN115617840A (en) * 2022-12-19 2023-01-17 江西曼荼罗软件有限公司 Medical data retrieval platform construction method, system, computer and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190521