CN107729392A - Text structuring method, apparatus, system and non-volatile storage medium - Google Patents
- Publication number: CN107729392A
- Application number: CN201710852183.3A
- Authority: CN (China)
- Prior art keywords: text, structured, subordinate sentence, result, entry
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06F: ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
          - G06F16/25: Integrating or interfacing systems involving database management systems
            - G06F16/258: Data format conversion from or to a database
      - G06F40/00: Handling natural language data
        - G06F40/10: Text processing
          - G06F40/103: Formatting, i.e. changing of presentation of documents
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention relates to a text structuring method, apparatus, system and non-volatile storage medium. The method includes: obtaining unstructured text, preprocessing it, and splitting the preprocessed unstructured text into multiple clauses; obtaining the structured entries of the structured text and a question-and-answer (Q&A) database corresponding to those entries; posing the questions in the Q&A database and matching the content of each clause to the corresponding structured entries, to obtain clause structuring results; and obtaining the structured text from the clause structuring results. By drawing on the Q&A database, the text structuring method, apparatus, system and non-volatile storage medium provided by the invention can fully convert unstructured text information into structured information with good conversion quality and high accuracy; and because clause structuring is performed by two LSTM networks, the many different forms of expression found in free text can be handled with good robustness.
Description
Technical field
The present invention relates to the field of natural language processing, and in particular to a text structuring method, apparatus, system and non-volatile storage medium.
Background art
Structuring means that, after analysis, the information contained in a text is decomposed into multiple interrelated parts with a clear hierarchy among them. Text structuring means converting unstructured text into structured text, so that structured forms of expression (such as itemized lists, tables, structure charts and flowcharts) make the information more objective and vivid.
In the big-data era, large amounts of free text are produced, especially in medicine, and the growing volume of medical text data poses new challenges for the whole medical industry: when doctors diagnose and treat patients, the process generates a large amount of medical text, and most of this medical text data is semi-structured or unstructured. Converting semi-structured or unstructured medical text data into structured data that computers can analyze and process could enable breakthroughs in scientific research, clinical diagnosis and treatment, and data sharing and dissemination.
Traditional medical text structuring essentially consists of doctors processing the medical text data manually based on their medical experience. This approach is not only time-consuming and labor-intensive, but its structuring accuracy also falls short of expectations.
To address the conversion of unstructured text, Chinese invention patent document CN03124897 discloses a method and apparatus for structuring text. The method comprises: inputting structuring rules; obtaining unstructured text information; performing syntactic analysis on the unstructured text information to produce small text fragments; finding, in the text units of the unstructured text information, the text fragments defined in the structuring rules; and structuring the text fragments of the unstructured text information according to the conditions specified in the structuring rules. The apparatus comprises an input device for the unstructured text information; an input device and a storage device for the structuring rules; an extraction device for extracting small text units from the unstructured text information; a structuring device for producing structured text information according to the structuring rules; and a processing device for the text units in the structured text information.
Although the method and apparatus provided by that patent document convert unstructured text into structured text, their conversion efficiency is evidently poor and their conversion accuracy is far from satisfactory.
As another example, Chinese invention patent application CN201610405133 discloses an electronic medical record text structuring method comprising the following steps: S1, loading a medical knowledge base; S2, reading in the electronic medical record text; S3, segmenting short sentences with the forward maximum matching algorithm to obtain the words, their parts of speech and their relative positions in the sentence; S4, determining whether each description of disease information in the short sentence is affirmed or negated; S5, extracting disease information elements; S6, repeating steps S2 to S5 until all content of interest in the electronic medical record has been obtained; S7, merging the different expressions of the disease information elements, i.e. merging identical disease information according to a medical synonym dictionary and removing redundancy; S8, storing the elements of the disease description information as structures/classes, completing the structuring process.
Although the structuring method provided by that patent document can effectively extract disease-related information from descriptive medical record text and form a structured representation of it, enabling in-depth study of disease occurrence patterns, diagnostic approaches, treatment outcomes and so on, its conversion efficiency is likewise poor and its accuracy low.
In summary, improving the conversion efficiency and accuracy of text structuring is one of the problems that those skilled in the art urgently need to solve.
Summary of the invention
To solve the above problems, an object of the present invention is to provide a text structuring method, apparatus, system and non-volatile storage medium with good conversion efficiency and high accuracy.
To achieve the above object, one aspect of the present invention provides a text structuring method, comprising:
Obtaining unstructured text, preprocessing the unstructured text, and splitting the preprocessed unstructured text into multiple clauses;
Obtaining the structured entries of the structured text and a Q&A database corresponding to the structured entries;
Posing the questions in the Q&A database and matching the content of each clause to the corresponding structured entries, to obtain clause structuring results;
Obtaining the structured text from the clause structuring results.
Further, the preprocessing includes replacing digits and special characters in the unstructured text with unified symbols.
Preferably, obtaining the structured entries of the structured text and the corresponding Q&A database includes:
Classifying the structured entries to obtain classification results;
Setting a question template for each classification result, and forming the Q&A database corresponding to the structured entries according to the question templates.
Further, posing the questions in the Q&A database and matching the content of each clause to the corresponding structured entries to obtain clause structuring results includes:
Performing word segmentation on the clause and inputting the resulting clause segmentation result into a first LSTM network, which performs first decoding on the clause segmentation result to obtain a clause decoding result;
Generating a question corresponding to the clause based on the Q&A database, performing word segmentation on the question, and inputting the resulting question segmentation result into a second LSTM network, which performs second decoding on the question segmentation result to obtain a question decoding result;
Combining, pairing and modeling the first LSTM network and the second LSTM network according to the clause decoding result and the question decoding result, to obtain the clause structuring result.
Preferably, obtaining the structured text from the clause structuring results includes:
Merging multiple clause structuring results to obtain a paragraph structuring result;
Post-processing the paragraph structuring result to obtain the structured text.
Further, after the structured text is obtained from the clause structuring results, the method also includes:
Converting the structured text into a vector and storing it in a result database, and comparing the vector of the structured text with the other vectors stored in the result database for similarity, to obtain texts similar to the structured text;
Calculating the degree of similarity between the structured text and the similar texts.
Another aspect of the present invention provides a text structuring apparatus, comprising:
A preprocessing module, configured to obtain unstructured text, preprocess it, and split the preprocessed unstructured text into multiple clauses;
An entry acquisition module, configured to obtain the structured entries of the structured text and the Q&A database corresponding to the structured entries;
A clause structuring module, configured to pose the questions in the Q&A database and match the content of each clause to the corresponding structured entries, to obtain clause structuring results;
A text forming module, configured to obtain the structured text from the clause structuring results.
Further, the preprocessing module is also configured to replace digits and special characters in the unstructured text with unified symbols.
Preferably, the entry acquisition module includes:
A classification module, configured to classify the structured entries to obtain classification results;
A forming module, configured to set a question template for each classification result and form the Q&A database corresponding to the structured entries according to the question templates.
Further, the clause structuring module includes:
A clause processing module, configured to perform word segmentation on the clause and input the resulting clause segmentation result into the first LSTM network, which performs first decoding on it to obtain a clause decoding result;
A question processing module, configured to generate a question corresponding to the clause based on the Q&A database, perform word segmentation on the question, and input the resulting question segmentation result into the second LSTM network, which performs second decoding on it to obtain a question decoding result;
A combination matching module, configured to combine, pair and model the first LSTM network and the second LSTM network according to the clause decoding result and the question decoding result, to obtain the clause structuring result.
Preferably, the text forming module includes:
A merging module, configured to merge multiple clause structuring results to obtain a paragraph structuring result;
A post-processing module, configured to post-process the paragraph structuring result to obtain the structured text.
Further, the text structuring apparatus also includes:
A similarity judgment module, configured to convert the structured text into a vector and store it in a result database, and to compare the vector of the structured text with the other vectors stored in the result database for similarity, to obtain texts similar to the structured text;
A similarity calculation module, configured to calculate the degree of similarity between the structured text and the similar texts.
A further aspect of the present invention provides a text structuring system, including the text structuring apparatus described above.
Another aspect of the invention provides a non-volatile storage medium storing a text structuring program which, when executed by a computer, implements a text structuring method comprising:
Instruction a: obtaining unstructured text, preprocessing it, and splitting the preprocessed unstructured text into multiple clauses;
Instruction b: obtaining the structured entries of the structured text and the Q&A database corresponding to the structured entries;
Instruction c: posing the questions in the Q&A database and matching the content of each clause to the corresponding structured entries, to obtain clause structuring results;
Instruction d: obtaining the structured text from the clause structuring results.
As described above, by drawing on a Q&A database, the text structuring method, apparatus, system and non-volatile storage medium provided by the present invention can fully convert unstructured text information into structured information with good conversion quality and high accuracy; and because clause structuring is performed by two LSTM networks, the many different forms of expression found in free text can be handled with good robustness.
To make the above content of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
The embodiment of the present invention is described in further detail below in conjunction with accompanying drawing.
Fig. 1 is a flowchart of the text structuring method provided by the first preferred embodiment of the invention;
Fig. 2 is a schematic diagram of the medical record text provided by the first preferred embodiment of the invention;
Fig. 3 is a schematic diagram of the word segmentation result of the medical record text provided by the first preferred embodiment of the invention;
Fig. 4 is a schematic diagram of the structuring result of the medical record text provided by the first preferred embodiment of the invention;
Fig. 5 is a flowchart of the text structuring method provided by the second preferred embodiment of the invention;
Fig. 6 is a module connection diagram of the text structuring apparatus provided by the third preferred embodiment of the invention;
Fig. 7 is a module connection diagram of the text structuring apparatus provided by the fourth preferred embodiment of the invention.
Embodiments
The embodiments of the present invention are illustrated below by specific examples; those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification. Although the invention is described in connection with preferred embodiments, this does not mean that its features are limited to those embodiments. On the contrary, the purpose of describing the invention in connection with embodiments is to cover other options or modifications that may extend from the claims of the present invention. Many specific details are included in the following description to provide a thorough understanding of the present invention; the present invention may also be practiced without these details. In addition, some details are omitted from the description to avoid confusing or obscuring the focus of the present invention.
Fig. 1 shows a flowchart of a text structuring method according to the first preferred embodiment of the invention. The method includes steps S10, S20, S30 and S40. Specifically, in step S10, the text structuring apparatus 1 obtains unstructured text, preprocesses it, and splits the preprocessed unstructured text into multiple clauses; in step S20, the text structuring apparatus 1 obtains the structured entries of the structured text and the Q&A database corresponding to the structured entries; in step S30, the text structuring apparatus 1 poses the questions in the Q&A database and matches the content of each clause to the corresponding structured entries, to obtain clause structuring results; in step S40, the text structuring apparatus 1 obtains the structured text from the clause structuring results.
Here, the text structuring apparatus 1 includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, client devices such as computers, smartphones and PDAs. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers; here, the cloud consists of a large number of computers or network servers based on cloud computing, a form of distributed computing in which a group of loosely coupled computers forms one virtual supercomputer. The network includes, but is not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs and wireless ad hoc networks.
Of course, those skilled in the art will understand that the above text structuring apparatus 1 is only an example; other existing or future text structuring apparatuses, if applicable to the present application, also fall within the scope of protection of the present application and are incorporated herein by reference.
Specifically, in step S10, the text structuring apparatus 1 obtains unstructured text, preprocesses it, and splits the preprocessed unstructured text into multiple clauses.
The unstructured text is, for example, descriptive text in a medical record containing information such as disease symptoms, medical history and a summary of the patient's condition, or a football or basketball news text containing information such as players' goals and assists; it may of course also be another kind of free text.
Here, the text structuring apparatus 1 obtains the unstructured text input by a user, for example the original medical record text entered by a doctor, and preprocesses it, converting non-standard unstructured text into standardized unstructured text to facilitate the subsequent conversion. Preferably, the preprocessing includes: (1) converting all characters in the unstructured text into double-byte (full-width) characters, so that uniform character handling simplifies processing and improves conversion performance; (2) replacing digits and special characters in the unstructured text with unified symbols to simplify subsequent conversion, where a unified symbol is a symbol that does not otherwise occur in the unstructured text, such as {Number} or {special}; for example, numbers such as 10, 1073, 1.763 and -0.74 and special characters such as # and @ appearing in the unstructured text are replaced with {Number} and {special}, respectively; (3) removing non-visible characters from the unstructured text to further simplify conversion and improve conversion performance.
After preprocessing, the text structuring apparatus 1 segments the preprocessed unstructured text at its full stops, splitting it into a series of clauses. For example, if a preprocessed unstructured text reads "Scattered patchy areas of increased density are visible in the lower lobe of the right lung. No definite density abnormality is seen in the remaining lung parenchyma. The hila of both lungs are not enlarged, the trachea and bronchi are patent, and no enlarged lymph nodes are seen in the mediastinum. No effusion is seen in either thoracic cavity, and no pleural thickening is seen.", then splitting it at the full stops yields the clauses "Scattered patchy areas of increased density are visible in the lower lobe of the right lung", "No definite density abnormality is seen in the remaining lung parenchyma", "The hila of both lungs are not enlarged, the trachea and bronchi are patent, and no enlarged lymph nodes are seen in the mediastinum" and "No effusion is seen in either thoracic cavity, and no pleural thickening is seen".
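The preprocessing and clause-splitting steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the number pattern (optionally signed integers and decimals), the small set of special characters, and the helper names (`to_full_width`, `remove_invisible`, `preprocess`, `split_clauses`) are hypothetical; only the `{Number}`/`{special}` placeholders and the split at the Chinese full stop "。" come from the description. The original values are returned in order so they can be restored during post-processing.

```python
import re
import unicodedata

# Placeholders named in the description; assumed not to occur in the raw text.
NUMBER_TOKEN = "{Number}"
SPECIAL_TOKEN = "{special}"

# Assumption: numbers are optionally signed integers or decimals; the
# special-character set is illustrative only.
TOKEN_RE = re.compile(r"(?P<num>-?\d+(?:\.\d+)?)|(?P<special>[#@$%&*~^])")


def to_full_width(text: str) -> str:
    """Map printable ASCII (U+0021..U+007E) to its full-width form (U+FF01..U+FF5E)."""
    return "".join(chr(ord(c) + 0xFEE0) if 0x21 <= ord(c) <= 0x7E else c for c in text)


def remove_invisible(text: str) -> str:
    """Drop whitespace and control/format characters (Unicode category C*)."""
    return "".join(
        c for c in text if not c.isspace() and not unicodedata.category(c).startswith("C")
    )


def preprocess(text: str) -> tuple[str, list[str]]:
    """Return (normalized text, original numbers/special characters in order)."""
    text = remove_invisible(text)
    pieces, originals, pos = [], [], 0
    for m in TOKEN_RE.finditer(text):
        # Text between matches is widened; matched tokens become placeholders,
        # so the placeholders themselves stay in ASCII.
        pieces.append(to_full_width(text[pos:m.start()]))
        pieces.append(NUMBER_TOKEN if m.group("num") is not None else SPECIAL_TOKEN)
        originals.append(m.group(0))
        pos = m.end()
    pieces.append(to_full_width(text[pos:]))
    return "".join(pieces), originals


def split_clauses(text: str) -> list[str]:
    """Split at the Chinese full stop '。' and drop empty pieces."""
    return [s for s in text.split("。") if s]
```

For example, `preprocess("体温 38.5#")` returns `("体温{Number}{special}", ["38.5", "#"])`. Widening only the text between matches is a design choice that keeps the ASCII placeholders intact while still applying step (1) to everything else.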
In step S20, the text structuring apparatus 1 obtains the structured entries of the structured text and the Q&A database corresponding to the structured entries.
Here, the text structuring apparatus 1 obtains, according to the structured text to be generated, the structured entries in that structured text and the Q&A database corresponding to those entries. That is, the text structuring apparatus 1 first obtains the structured text format the user wants, extracts each entry from that format, and then sets questions for each entry to build the Q&A database; alternatively, if the structured text format obtained by the text structuring apparatus 1 already has associated Q&A data, the apparatus directly obtains the entries in that structured text and uses the corresponding Q&A database for the subsequent structuring.
Preferably, in this embodiment, after the structured entries of the structured text are obtained, they are first classified into different types, such as numerical value or position/location, to obtain classification results. Question templates are then set for each classification result, i.e. a question template is formulated for each type of structured entry, and the Q&A database corresponding to the structured entries is formed from the question templates. The Q&A database can be trained on a large amount of real text so that question templates are continually added, ensuring that all information in the unstructured text is fully covered.
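Building the Q&A database from classified entries and per-type question templates can be sketched as below. The description names numerical values and positions/locations as example entry types but gives no template wording, so the template strings, the type labels, and the names `Entry`, `classify`, `add_template` and `build_qa_database` are hypothetical; the sketch also assumes each entry's type is already known rather than inferred.

```python
from dataclasses import dataclass

# Hypothetical question templates per entry type (wording is illustrative).
DEFAULT_TEMPLATES: dict[str, list[str]] = {
    "numeric": ["What is the value of {entry}?"],
    "location": ["Where is the {entry} located?"],
    "presence": ["Is there a {entry}?"],
}


@dataclass(frozen=True)
class Entry:
    """A structured entry taken from the target structured-text format."""
    name: str
    entry_type: str  # assumption: the type is known for each entry


def classify(entries: list[Entry]) -> dict[str, list[Entry]]:
    """Group structured entries by type (the 'classification results')."""
    groups: dict[str, list[Entry]] = {}
    for entry in entries:
        groups.setdefault(entry.entry_type, []).append(entry)
    return groups


def add_template(templates: dict[str, list[str]], entry_type: str, template: str) -> None:
    """Extend the template set, e.g. when new phrasings are found in real text."""
    bucket = templates.setdefault(entry_type, [])
    if template not in bucket:
        bucket.append(template)


def build_qa_database(
    entries: list[Entry], templates: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Map each entry name to the questions generated from its type's templates."""
    qa_db: dict[str, list[str]] = {}
    for entry_type, group in classify(entries).items():
        for entry in group:
            qa_db[entry.name] = [t.format(entry=entry.name) for t in templates.get(entry_type, [])]
    return qa_db
```

Keeping templates per type, rather than per entry, means a newly added template immediately applies to every entry of that type, which matches the idea of continually growing the template set.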
In step S30, the text structuring apparatus 1 poses the questions in the Q&A database and matches the content of each clause to the corresponding structured entries, to obtain clause structuring results. That is, for each clause, questions from the Q&A database are posed against it, and its content is matched to the corresponding structured entries according to the answers, yielding multiple clause structuring results.
Specifically, step S30 includes steps S31, S32 and S33. In step S31, the text structuring apparatus 1 performs word segmentation on the clause and inputs the resulting clause segmentation result into the first LSTM network, which performs first decoding on it to obtain a clause decoding result; in step S32, the text structuring apparatus 1 generates a question corresponding to the clause based on the Q&A database, performs word segmentation on the question, and inputs the resulting question segmentation result into the second LSTM network, which performs second decoding on it to obtain a question decoding result; in step S33, the text structuring apparatus 1 combines, pairs and models the first LSTM network and the second LSTM network according to the clause decoding result and the question decoding result, to obtain the clause structuring result.
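Steps S31 to S33 do not specify the network sizes, how the two LSTMs are "combined and paired", or how they are trained. The sketch below assumes one common reading, a reading-comprehension-style span extractor: the first LSTM encodes the clause tokens, a second LSTM with separate weights encodes the question tokens, and bilinear start/end scores between each clause position and the final question state select the answer span. Everything here (the seeded pseudo-random `embed`, the sizes, `ClauseQuestionMatcher`, the span head) is hypothetical and untrained, so the chosen span is arbitrary until the weights are learned; it only shows the data flow in dependency-free Python.

```python
import math
import random


def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def _matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def _dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def _random_matrix(rng: random.Random, rows: int, cols: int, scale: float) -> list[list[float]]:
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def embed(token: str, dim: int) -> list[float]:
    """Hypothetical embedding: a pseudo-random vector seeded by the token string."""
    rng = random.Random(token)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


class LSTM:
    """Single-layer LSTM with random (untrained) weights, written out for clarity."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int) -> None:
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Rows are stacked as [input gate; forget gate; candidate; output gate].
        self.w = _random_matrix(rng, 4 * hidden_dim, input_dim + hidden_dim, 0.1)
        self.b = [0.0] * (4 * hidden_dim)

    def encode(self, inputs: list[list[float]]) -> list[list[float]]:
        """Run the sequence through the cell and return the hidden state at every step."""
        n = self.hidden_dim
        h, c, states = [0.0] * n, [0.0] * n, []
        for x in inputs:
            z = [s + b for s, b in zip(_matvec(self.w, x + h), self.b)]
            i = [_sigmoid(v) for v in z[:n]]
            f = [_sigmoid(v) for v in z[n:2 * n]]
            g = [math.tanh(v) for v in z[2 * n:3 * n]]
            o = [_sigmoid(v) for v in z[3 * n:]]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            states.append(h)
        return states


class ClauseQuestionMatcher:
    """Pairs a clause encoder (first LSTM) with a question encoder (second LSTM)."""

    def __init__(self, emb_dim: int = 8, hidden_dim: int = 8, max_answer_len: int = 4, seed: int = 0) -> None:
        self.emb_dim = emb_dim
        self.max_answer_len = max_answer_len
        self.clause_lstm = LSTM(emb_dim, hidden_dim, seed)
        self.question_lstm = LSTM(emb_dim, hidden_dim, seed + 1)
        rng = random.Random(seed + 2)
        # Separate bilinear projections score answer-start and answer-end positions.
        self.w_start = _random_matrix(rng, hidden_dim, hidden_dim, 1.0)
        self.w_end = _random_matrix(rng, hidden_dim, hidden_dim, 1.0)

    def answer(self, clause_tokens: list[str], question_tokens: list[str]) -> tuple[tuple[int, int], list[str]]:
        """Return the best (start, end) span and its tokens; both inputs must be non-empty."""
        states = self.clause_lstm.encode([embed(t, self.emb_dim) for t in clause_tokens])
        q = self.question_lstm.encode([embed(t, self.emb_dim) for t in question_tokens])[-1]
        q_start, q_end = _matvec(self.w_start, q), _matvec(self.w_end, q)
        start = [_dot(h, q_start) for h in states]
        end = [_dot(h, q_end) for h in states]
        spans = (
            (s, e)
            for s in range(len(states))
            for e in range(s, min(s + self.max_answer_len, len(states)))
        )
        best = max(spans, key=lambda span: start[span[0]] + end[span[1]])
        return best, clause_tokens[best[0]:best[1] + 1]
```

In a real system the embeddings, both LSTMs and the start/end projections would be trained jointly on annotated (clause, question, answer span) examples; a framework LSTM would replace the hand-written cell.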
Here, a segmentation algorithm, such as the forward maximum matching algorithm, is used to segment the unstructured text. For example, the text structuring apparatus 1 obtains the unstructured medical record text shown in Fig. 2 and segments each clause with the segmentation algorithm, obtaining the clause segmentation results shown in Fig. 3. Similarly, the question formed for each clause based on the Q&A database is segmented with the same algorithm to obtain the question segmentation result. The clause segmentation result is then input into the first LSTM (long short-term memory artificial neural network) network and the question segmentation result into the second LSTM network; the first LSTM network decodes the clause segmentation result and, correspondingly, the second LSTM network decodes the question segmentation result. Based on the clause decoding result and the question decoding result, the text structuring apparatus 1 combines, pairs and models the two LSTM networks (the first LSTM network and the second LSTM network) to obtain the structuring result of each clause.
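Forward maximum matching, named above as an example segmentation algorithm, is a greedy dictionary-based method: at each position it takes the longest dictionary word starting there. A minimal version is sketched below; the function name and dictionary contents are illustrative, and in practice the placeholders introduced during preprocessing (such as `{Number}`) would be added to the dictionary so they survive segmentation as single tokens.

```python
from typing import Optional


def forward_max_match(text: str, dictionary: set[str], max_len: Optional[int] = None) -> list[str]:
    """Greedy forward maximum matching word segmentation.

    At each position, take the longest dictionary word that starts there
    (up to max_len characters); if none matches, emit a single character.
    """
    if max_len is None:
        max_len = max((len(w) for w in dictionary), default=1)
    max_len = max(1, max_len)  # guard against a non-positive window
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens
```

Because matching is greedy from the left, the longer word "右肺下叶" wins over "右肺" when both are in the dictionary.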
For each clause, a new question is generated from the clause's current structuring result and the Q&A database; the new question is segmented and processed again to obtain a new clause structuring result, which in turn serves as the basis for the next question and another round of clause structuring, and so on until all related questions in the Q&A database have been asked. In other words, each clause decides, based on its current structuring result, whether to continue with follow-up questions: if the initial structuring result of a clause is "Is there an XX lesion?", the follow-up questions are "Where is the XX lesion located?" and "What is the form of the XX lesion?". Specifically, if the initial structuring result of a clause is "Is a shadow or lesion visible in the lower lobe of the right lung?", the follow-up question is "What is the form of the shadow or lesion visible in the lower lobe of the right lung?", and so on, until all related questions in the Q&A database have been asked, yielding the structuring result shown in Fig. 4.
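The iterative follow-up questioning can be sketched as a breadth-first loop over a question graph. In this minimal sketch, the follow-up table `follow_ups`, the callback `answer_fn` (standing in for the two-LSTM matcher) and the rule that a question answered with `None` ends its branch are all assumptions; the description only says that each new question depends on the current clause structuring result and that questioning stops once the related questions in the Q&A database are exhausted.

```python
from collections import deque
from typing import Callable, Optional

# answer_fn(clause, question) returns the extracted answer, or None if the
# clause does not answer the question (a stand-in for the two-LSTM matcher).
AnswerFn = Callable[[str, str], Optional[str]]


def structure_clause(
    clause: str,
    root_questions: list[str],
    follow_ups: dict[str, list[str]],
    answer_fn: AnswerFn,
) -> dict[str, str]:
    """Ask questions breadth-first, queueing follow-ups only for answered questions."""
    result: dict[str, str] = {}
    pending = deque(root_questions)
    asked: set[str] = set()
    while pending:
        question = pending.popleft()
        if question in asked:
            continue  # each related question is asked at most once
        asked.add(question)
        answer = answer_fn(clause, question)
        if answer is None:
            continue  # unanswered: do not question further along this branch
        result[question] = answer
        pending.extend(follow_ups.get(question, []))
    return result
```

The `asked` set guarantees termination even if the follow-up table contains cycles.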
In step S40, the text structuring apparatus 1 obtains the structured text from the clause structuring results. Preferably, the text structuring apparatus 1 first merges multiple clause structuring results to obtain a paragraph structuring result, and then post-processes the paragraph structuring result to obtain the structured text. Here, merging means combining all questions that have answers in each clause into the final result, which gives the paragraph structuring result. Post-processing includes: (1) standardizing the related descriptions in the structured text; for example, the tonsil size descriptions "I°", "one degree" and "I degree" are all standardized to "1 degree", and in the classification of rales, "large bubbling rales" is standardized to "coarse moist rales", "medium bubbling rales" to "medium moist rales" and "small bubbling rales" to "fine moist rales"; (2) restoring the digits and special characters that were replaced by unified symbols during preprocessing, for example restoring the values that were replaced by unified symbols in the foregoing embodiment to the original 10, 1073, 1.763 and -0.74, so that the content of the structured text stays consistent with that of the original unstructured text.
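Merging and post-processing can be sketched as below. Assumptions: each clause result maps questions to an `(answer_text, originals)` pair or `None`; the caller has recorded which original numbers and special characters belong to each answer (for example by tracking offsets during preprocessing); and the synonym table `SYNONYMS` holds only the examples from the description. All function names are hypothetical. Longer variants are replaced first so that a short variant cannot overwrite part of a longer one.

```python
import re
from typing import Optional

# (answer text with placeholders, original values for those placeholders, in order)
Answer = tuple[str, list[str]]

PLACEHOLDER_RE = re.compile(r"\{Number\}|\{special\}")

# Illustrative normalization table built only from the examples in the description.
SYNONYMS: dict[str, str] = {
    "I°": "1 degree",
    "one degree": "1 degree",
    "I degree": "1 degree",
    "large bubbling rales": "coarse moist rales",
    "medium bubbling rales": "medium moist rales",
    "small bubbling rales": "fine moist rales",
}


def merge_clause_results(clause_results: list[dict[str, Optional[Answer]]]) -> dict[str, list[Answer]]:
    """Paragraph result: every answered question, with answers in clause order."""
    merged: dict[str, list[Answer]] = {}
    for result in clause_results:
        for question, answer in result.items():
            if answer is not None:  # keep only questions that received an answer
                merged.setdefault(question, []).append(answer)
    return merged


def normalize(text: str, synonyms: dict[str, str] = SYNONYMS) -> str:
    """Replace variant wordings with canonical ones, longest variants first."""
    for variant in sorted(synonyms, key=len, reverse=True):
        text = text.replace(variant, synonyms[variant])
    return text


def restore_placeholders(text: str, originals: list[str]) -> str:
    """Put the original numbers/special characters back, left to right."""
    if len(PLACEHOLDER_RE.findall(text)) != len(originals):
        raise ValueError("placeholder count does not match the recorded originals")
    values = iter(originals)
    return PLACEHOLDER_RE.sub(lambda _match: next(values), text)


def post_process(merged: dict[str, list[Answer]], synonyms: dict[str, str] = SYNONYMS) -> dict[str, list[str]]:
    """Normalize wording, then restore the replaced values, for every merged answer."""
    return {
        question: [restore_placeholders(normalize(text, synonyms), originals) for text, originals in answers]
        for question, answers in merged.items()
    }
```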
As a variation of the above embodiment, as shown in Fig. 5, the second preferred embodiment of the invention provides a text structuring method comprising steps S10', S20', S30', S40', S50 and S60. Specifically, in step S10', the text structuring apparatus 1 obtains unstructured text, preprocesses it, and splits the preprocessed unstructured text into multiple clauses; in step S20', the text structuring apparatus 1 obtains the structured entries of the structured text and the Q&A database corresponding to the structured entries; in step S30', the text structuring apparatus 1 poses the questions in the Q&A database and matches the content of each clause to the corresponding structured entries, to obtain clause structuring results; in step S40', the text structuring apparatus 1 obtains the structured text from the clause structuring results; in step S50, the text structuring apparatus 1 converts the structured text into a vector, stores it in a result database, and compares the vector of the structured text with the other vectors stored in the result database for similarity, to obtain texts similar to the structured text; in step S60, the text structuring apparatus 1 calculates the degree of similarity between the structured text and the similar texts. Steps S10', S20', S30' and S40' are identical or essentially identical to steps S10, S20, S30 and S40 described with reference to Fig. 1, so they are not repeated here and are incorporated herein by reference.
In step S50, the text structuring device 1 converts the structured text into a vector, stores it in the result database, and compares it with the other vectors stored in the result database to obtain texts similar to the structured text.
Here, the Euclidean distance between the structured-text vector and each of the other text vectors stored in the database is computed, and similarity is judged by that distance (the smaller the distance, the more similar the texts), so that texts similar to the structured text can be found in the database. For example, for a given medical record, similar case-history texts can be retrieved from a medical-record library, helping the doctor diagnose and treat the disease.
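A minimal Python sketch of step S50 follows. The patent does not say how the structured text is turned into a vector; here, as an assumption, each dimension stands for one (entry, value) pair. The helper names `vectorize` and `most_similar` and the sample records are hypothetical; the ranking itself is plain Euclidean distance, smallest first, as described above.

```python
import math
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]  # structured text: entry -> value


def vectorize(record: Record, features: Sequence[Tuple[str, str]]) -> List[float]:
    """One dimension per (entry, value) feature: 1.0 if the record has it, else 0.0."""
    return [1.0 if record.get(entry) == value else 0.0 for entry, value in features]


def most_similar(query: Sequence[float],
                 stored: Dict[str, List[float]],
                 k: int = 3) -> List[Tuple[str, float]]:
    """Return the k stored texts closest to `query` by Euclidean distance."""
    scored = [(doc_id, math.dist(query, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


features = [("lesion", "yes"), ("lesion site", "right lower lobe"),
            ("pleural effusion", "yes")]
result_db = {
    "case-A": vectorize({"lesion": "yes", "lesion site": "right lower lobe"}, features),
    "case-B": vectorize({"pleural effusion": "yes"}, features),
}
query = vectorize({"lesion": "yes", "lesion site": "right lower lobe",
                   "pleural effusion": "no"}, features)
print(most_similar(query, result_db, k=1))  # [('case-A', 0.0)]
```

After ranking, the new vector would simply be added to `result_db` under its own identifier so that later queries can retrieve it.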
Further, in step S60, the text structuring device 1 calculates the similarity between the structured text and the similar texts. That is, for each similar text retrieved from the database, the text structuring device 1 calculates its similarity to the structured text and outputs that similarity to the user, making it easier for the user to compare and assess the texts.
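The patent does not give a formula for the similarity reported in step S60. One simple option, shown here purely as an assumption, is the Jaccard similarity between the sets of (entry, value) pairs of the two structured texts; the function names `structured_similarity` and `rank_by_similarity` are hypothetical.

```python
from typing import Dict, List, Tuple


def structured_similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Jaccard similarity of the (entry, value) pairs of two structured texts."""
    pairs_a, pairs_b = set(a.items()), set(b.items())
    union = pairs_a | pairs_b
    if not union:
        return 1.0  # two empty records are treated as identical
    return len(pairs_a & pairs_b) / len(union)


def rank_by_similarity(query: Dict[str, str],
                       candidates: Dict[str, Dict[str, str]]) -> List[Tuple[str, float]]:
    """Score each retrieved similar text against the query, highest first."""
    scores = [(doc_id, structured_similarity(query, record))
              for doc_id, record in candidates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

Unlike the raw Euclidean distance used for retrieval, this score lies between 0 and 1, which is easier to show to a user as "how similar" a case is.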
In medical-record texts in particular, similarity comparison and similarity calculation can efficiently produce similar case histories and report how similar each one is, which greatly assists doctors in diagnosing and treating disease.
Fig. 6 is a schematic diagram of a text structuring device according to the third preferred embodiment of the invention. The text structuring device 1 includes a pre-processing module 100, an entry acquisition module 200, a clause structuring module 300 and a text forming module 400. Specifically, the pre-processing module 100 obtains unstructured text, pre-processes it, and splits the pre-processed unstructured text into multiple clauses; the entry acquisition module 200 obtains the structured entries of the structured text and the question-answer database corresponding to them; the clause structuring module 300 asks the questions in the question-answer database and matches the content of each clause to the corresponding structured entries to obtain clause structuring results; and the text forming module 400 obtains the structured text from the clause structuring results.
Here, the text structuring device 1 includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device over a network. User devices include, but are not limited to, client devices such as computers, smartphones and PDAs. Network devices include, but are not limited to, computers, network hosts, single network servers, sets of multiple network servers, or a cloud formed by multiple servers; here, the cloud consists of a large number of computers or network servers based on cloud computing, a form of distributed computing in which a group of loosely coupled computers acts as one virtual supercomputer. Networks include, but are not limited to, the Internet, wide area networks, metropolitan area networks, local area networks, VPNs and wireless ad hoc networks.
Of course, those skilled in the art will understand that the text structuring device 1 described above is only an example; other existing or future text structuring devices, if applicable to this application, also fall within its scope of protection and are incorporated herein by reference.
Specifically, the pre-processing module 100 obtains unstructured text, pre-processes it, and splits the pre-processed unstructured text into multiple clauses.
The unstructured text may be, for example, descriptive medical-record text containing information such as symptoms, medical history and a summary of the patient's condition, or football or basketball news text containing information such as players' goals and assists; it can of course also be any other kind of free text.
Here, the pre-processing module 100 obtains unstructured text entered by the user, such as the original case-history text entered by a doctor, and pre-processes it, converting non-standard unstructured text into standardized unstructured text to ease the subsequent conversion. Preferably, pre-processing includes: (1) converting all characters in the unstructured text to half-width characters, so that full-width and half-width forms of the same character are handled uniformly, which simplifies processing and improves conversion performance; (2) replacing numbers and special characters in the unstructured text with unified symbols that never otherwise appear in the unstructured text, such as {Number} and {special}, to simplify subsequent conversion; for example, numbers such as 10, 1073, 1.763 and -0.74 and special characters such as # and @ appearing in the unstructured text are replaced with {Number} and {special} respectively; (3) removing invisible characters from the unstructured text, which simplifies the conversion problem and further improves conversion performance. After pre-processing, the pre-processing module 100 splits the pre-processed unstructured text at the full stops it contains, decomposing it into a series of clauses.
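The pre-processing and clause-splitting steps can be sketched in Python. This is a minimal sketch, not the patent's implementation, and it rests on several assumptions: "full-width" means the usual range U+FF01 to U+FF5E plus the ideographic space U+3000, "invisible" means Unicode control (Cc) and format (Cf) characters, the special-character set is just # and @ (the two examples given), and clauses end at the Chinese full stop 。 or an ASCII period. Recording the masked originals in order is also an assumption; it is what would let post-processing restore them.

```python
import re
import unicodedata
from typing import List, Tuple

NUMBER, SPECIAL = "{Number}", "{special}"
# Numbers such as 10, 1073, 1.763, -0.74 and (assumed) special characters # and @.
TOKEN_RE = re.compile(r"(?P<num>-?\d+(?:\.\d+)?)|(?P<special>[#@])")
# Chinese full stop or ASCII period; decimals are already masked when this runs.
SENTENCE_END_RE = re.compile(r"[。.]")


def to_half_width(text: str) -> str:
    """Convert full-width ASCII variants (U+FF01-U+FF5E) and U+3000 to half-width."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:               # ideographic space -> ASCII space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:   # full-width "!".."~" -> ASCII "!".."~"
            code -= 0xFEE0
        chars.append(chr(code))
    return "".join(chars)


def remove_invisible(text: str) -> str:
    """Drop control and format characters, e.g. line breaks and zero-width spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) not in ("Cc", "Cf"))


def mask_tokens(text: str) -> Tuple[str, List[str]]:
    """Replace numbers/special characters with unified symbols; keep originals in order."""
    originals: List[str] = []

    def _replace(match: re.Match) -> str:
        originals.append(match.group(0))
        return NUMBER if match.group("num") is not None else SPECIAL

    return TOKEN_RE.sub(_replace, text), originals


def preprocess(text: str) -> Tuple[List[str], List[str]]:
    """Normalize, mask, and split unstructured text into clauses."""
    text = remove_invisible(to_half_width(text))
    masked, originals = mask_tokens(text)
    clauses = [c.strip() for c in SENTENCE_END_RE.split(masked) if c.strip()]
    return clauses, originals


print(preprocess("Nodule １.763 cm＃ seen。Rest normal."))
# (['Nodule {Number} cm{special} seen', 'Rest normal'], ['1.763', '#'])
```

Masking numbers before splitting matters here: otherwise the decimal point in 1.763 would be mistaken for a sentence end.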
For example, suppose a pre-processed unstructured text reads: "Scattered patchy shadows of increased density are visible in the lower lobe of the right lung. No definite abnormal density is seen in the remaining lung parenchyma. Neither pulmonary hilum is enlarged, the trachea and bronchi are patent, and no enlarged lymph nodes are seen in the mediastinum. No effusion is seen in either thoracic cavity, and no pleural thickening is seen." After splitting at its punctuation, a series of clauses is obtained: "scattered patchy shadows of increased density are visible in the lower lobe of the right lung", "no definite abnormal density is seen in the remaining lung parenchyma", "neither pulmonary hilum is enlarged, the trachea and bronchi are patent, and no enlarged lymph nodes are seen in the mediastinum", and "no effusion is seen in either thoracic cavity, and no pleural thickening is seen".
The entry acquisition module 200 obtains the structured entries in the structured text and the question-answer database corresponding to those entries.
Here, the entry acquisition module 200 obtains the structured entries of the structured text to be generated, together with the question-answer database corresponding to those entries. That is, the entry acquisition module 200 first obtains the structured-text format the user wants, extracts each entry from that format, and then sets questions for each entry to build the question-answer database. Alternatively, if the structured-text format obtained by the entry acquisition module 200 already comes with associated question-answer data, the entry acquisition module 200 directly obtains each entry of the structured text and uses the corresponding question-answer database for the subsequent structuring.
Preferably, in this embodiment, the entry acquisition module 200 includes a classification unit 201 and a forming unit 202. After the structured entries of the structured text are obtained, the classification unit 201 first classifies them into different types, such as numerical value and location, to obtain a classification result; the forming unit 202 then sets question templates for each classification result, i.e., formulates question templates for each type of structured entry, and builds the question-answer database corresponding to the structured entries from those templates. The question-answer database can be refined by training on large amounts of real text, with question templates added continually, so that all the information in the unstructured text is covered.
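The classification unit 201 and forming unit 202 can be sketched in Python. The patent names numerical value and location as example entry types but does not say how entries are classified or what the templates look like, so the keyword rules, the default "presence" type, the English templates and all function names below are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical keyword rules for classifying structured entries by type.
TYPE_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "numeric": ("size", "diameter", "count"),
    "location": ("site", "location", "position"),
}

# Hypothetical question templates per entry type; "{entry}" is filled in later.
TEMPLATES: Dict[str, List[str]] = {
    "numeric": ["What is the {entry}?"],
    "location": ["Where is the {entry}?"],
    "presence": ["Is there {entry}?"],
}


def classify_entry(entry: str) -> str:
    """Assign an entry to a type by keyword; default to a yes/no 'presence' entry."""
    for entry_type, keywords in TYPE_KEYWORDS.items():
        if any(word in entry for word in keywords):
            return entry_type
    return "presence"


def build_qa_database(entries: List[str]) -> Dict[str, List[str]]:
    """Map every structured entry to the questions generated from its type's templates."""
    return {entry: [t.format(entry=entry) for t in TEMPLATES[classify_entry(entry)]]
            for entry in entries}


def add_template(entry_type: str, template: str) -> None:
    """Extend the template set, e.g. when training text reveals uncovered information."""
    TEMPLATES.setdefault(entry_type, []).append(template)


print(build_qa_database(["nodule size", "pleural effusion"]))
# {'nodule size': ['What is the nodule size?'], 'pleural effusion': ['Is there pleural effusion?']}
```

Keeping templates per type rather than per entry means a newly added entry is covered as soon as it is classified.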
The clause structuring module 300 asks the questions in the question-answer database and matches the content of each clause to the corresponding structured entries to obtain clause structuring results. In other words, each clause is questioned using the question-answer database, and its content is matched to the corresponding structured entries according to the answers, yielding multiple clause structuring results.
Specifically, the clause structuring module 300 includes a clause processing unit 301, a question processing unit 302 and a combination pairing unit 303. The clause processing unit 301 segments the clause into words and feeds the resulting clause segmentation into a first LSTM network, which performs first decoding on it to obtain a clause decoding result. The question processing unit 302 generates questions for the clause based on the question-answer database, segments each question into words, and feeds the resulting question segmentation into a second LSTM network, which performs second decoding on it to obtain a question decoding result. The combination pairing unit 303 combines, pairs and models the first LSTM network and the second LSTM network on the basis of the clause decoding result and the question decoding result, thereby obtaining the clause structuring result.
Here, a word segmentation algorithm such as forward maximum matching is used to segment the unstructured text. For example, the clause processing unit 301 obtains the unstructured case-history text shown in Fig. 2 and segments each clause, yielding the clause segmentation results shown in Fig. 3. Similarly, the questions generated from the question-answer database for each clause are segmented with the same algorithm to obtain question segmentation results. The clause processing unit 301 then feeds the clause segmentation results into the first LSTM (long short-term memory) network, and the question processing unit 302 feeds the question segmentation results into the second LSTM network. In the combination pairing unit 303, the first LSTM network decodes the clause segmentation results and the second LSTM network decodes the question segmentation results; the combination pairing unit 303 then combines, pairs and models the two LSTM networks on the basis of the clause decoding results and the question decoding results, thereby obtaining the structuring result of each clause.
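Forward maximum matching, the segmentation algorithm named above, scans the text left to right and always takes the longest dictionary word that fits. A minimal Python sketch follows; `LEXICON` is a hypothetical toy dictionary, whereas a real system would use a medical vocabulary. In the example, the clause 右肺下叶可见斑片状密度增高影 ("patchy shadows of increased density visible in the right lower lobe") is split into 右肺下叶 / 可见 / 斑片状 / 密度增高影 (right lower lobe / visible / patchy / increased-density shadow).

```python
from typing import Iterable, List, Optional


def fmm_segment(text: str, dictionary: Iterable[str],
                max_len: Optional[int] = None) -> List[str]:
    """Forward maximum matching: greedily take the longest dictionary word from the left."""
    words = set(dictionary)
    if max_len is None:
        max_len = max((len(w) for w in words), default=1)
    max_len = max(1, max_len)  # guard against an empty or degenerate dictionary
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it; a single character is the fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in words:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon (right lung, right lower lobe, lower lobe, visible,
# patchy, density, increased, increased-density shadow).
LEXICON = ["右肺", "右肺下叶", "下叶", "可见", "斑片状", "密度", "增高", "密度增高影"]
print(fmm_segment("右肺下叶可见斑片状密度增高影", LEXICON))
# ['右肺下叶', '可见', '斑片状', '密度增高影']
```

Note that 右肺下叶 wins over the shorter 右肺 because the longest candidate is tried first; that greediness is the defining property of the algorithm.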
For each clause, a new question is generated from the clause's current structuring result and the question-answer database; the new question is segmented and processed again to obtain a new clause structuring result, which in turn serves as the basis for the next question. Clause structuring is repeated in this way until all relevant questions in the question-answer database have been asked. That is, for each clause, whether to ask follow-up questions is decided from its current structuring result. If the initial structuring result of a clause is "an XX lesion is present", the follow-up questions are "Where is the XX lesion located?" and "What is the shape of the XX lesion?". Specifically, if the initial structuring result of a clause answers "Is a lesion shadow visible in the right lower lobe?", the follow-up question is "What is the shape of the lesion shadow in the right lower lobe?", and so on until all relevant questions in the question-answer database have been asked, yielding the structuring result shown in Figure 5.
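The iterative questioning can be sketched as a breadth-first loop in Python. In this minimal sketch, `answer` stands in for the two-LSTM matching model (here a hypothetical keyword stub, `toy_answer`), and `FOLLOW_UPS` is a hypothetical table deciding which questions to ask next; the rule that a "no" answer ends that line of questioning is an assumption, not something the patent states.

```python
from collections import deque
from typing import Callable, Dict, List, Optional

# (clause, question) -> answer text, or None when the clause holds no answer.
# In the patent this role is played by the two LSTM networks; here it is a stub.
Answerer = Callable[[str, str], Optional[str]]

# Hypothetical follow-up table: questions to ask once a question gets a non-"no" answer.
FOLLOW_UPS: Dict[str, List[str]] = {
    "Is there a lesion?": ["Where is the lesion?", "What is the shape of the lesion?"],
}


def structure_clause(clause: str, root_questions: List[str], answer: Answerer,
                     follow_ups: Dict[str, List[str]]) -> Dict[str, str]:
    """Ask questions iteratively until no relevant follow-up question remains."""
    result: Dict[str, str] = {}
    pending = deque(root_questions)
    asked = set()
    while pending:
        question = pending.popleft()
        if question in asked:
            continue
        asked.add(question)
        reply = answer(clause, question)
        if reply is None:
            continue  # the clause does not answer this question
        result[question] = reply
        if reply != "no":
            pending.extend(follow_ups.get(question, []))
    return result


def toy_answer(clause: str, question: str) -> Optional[str]:
    """Hypothetical keyword stub standing in for the LSTM question-answering model."""
    if question == "Is there a lesion?":
        return "yes" if "shadow" in clause else "no"
    if question == "Where is the lesion?":
        return "right lower lobe" if "right lower lobe" in clause else None
    if question == "What is the shape of the lesion?":
        return "patchy" if "patchy" in clause else None
    return None


print(structure_clause("patchy shadow visible in right lower lobe",
                       ["Is there a lesion?"], toy_answer, FOLLOW_UPS))
# {'Is there a lesion?': 'yes', 'Where is the lesion?': 'right lower lobe', 'What is the shape of the lesion?': 'patchy'}
```

The `asked` set keeps the loop finite even if the follow-up table contains cycles.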
The text forming module 400 obtains the structured text from the clause structuring results. Preferably, the text forming module 400 includes a merging unit 401 and a post-processing unit 402: the merging unit 401 first merges the multiple clause structuring results into a paragraph structuring result, and the post-processing unit 402 then post-processes the paragraph structuring result to obtain the structured text. Here, merging means combining all questions that have answers in each clause to obtain the final answers, and thereby the paragraph structuring result. Post-processing includes: (1) normalizing related descriptions in the structured text; for example, tonsil-size descriptions such as I°, "one degree" and "I degree" are normalized to "1 degree", and in rale classification "large bubbling rales" is normalized to "coarse moist rales", "medium bubbling rales" to "medium moist rales", and "fine bubbling rales" to "fine moist rales"; (2) restoring the numbers and special characters that were replaced with unified symbols during pre-processing, e.g., restoring the placeholders that replaced 10, 1073, 1.763 and -0.74 in the foregoing embodiment to the original 10, 1073, 1.763 and -0.74, so that the content of the structured text stays consistent with the original unstructured text.
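The merging unit 401 can be sketched in Python as follows. The patent says only that all answered questions in each clause are merged; keeping every distinct answer per question as a list, rather than letting a later clause overwrite an earlier one, is an assumption made here so that no information is lost, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def merge_clause_results(
        clause_results: List[Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Keep only answered questions and collect their distinct answers across clauses."""
    paragraph: Dict[str, List[str]] = {}
    for result in clause_results:
        for question, answer in result.items():
            if answer is None:
                continue  # unanswered questions are dropped
            answers = paragraph.setdefault(question, [])
            if answer not in answers:
                answers.append(answer)  # preserve clause order, skip duplicates
    return paragraph
```

For example, merging `{"q1": "a", "q2": None}`, `{"q1": "b", "q3": "c"}` and `{"q1": "a"}` gives `{"q1": ["a", "b"], "q3": ["c"]}`.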
As a variation of the above embodiment, and as shown in Fig. 7, the fourth preferred embodiment of the invention provides a text structuring device that further includes a similarity judgment module 500 and a similarity calculation module 600.
Specifically, the similarity judgment module 500 converts the structured text into a vector, stores it in the result database, and compares it with the other vectors stored in the result database to obtain texts similar to the structured text.
Here, the Euclidean distance between the structured-text vector and each of the other text vectors stored in the database is computed, and similarity is judged by that distance, so that texts similar to the structured text can be found in the database. For example, for a given medical record, the device can retrieve similar case-history texts from a medical-record library, helping the doctor diagnose and treat the disease.
Further, the similarity calculation module 600 calculates the similarity between the structured text and the similar texts. That is, for each similar text retrieved from the database, the text structuring device 1 calculates its similarity to the structured text and outputs that similarity to the user, making it easier for the user to compare and assess the texts.
In medical diagnosis in particular, similarity comparison and similarity calculation can efficiently produce similar case histories and report how similar each one is, which greatly assists doctors in diagnosing and treating disease.
As a variation of the above embodiments, the invention further provides a text structuring system that includes the text structuring device of the foregoing embodiments.
As a variation of the above embodiments, the invention further provides a non-volatile storage medium storing a text structuring program which, when executed by a computer, implements a text structuring method comprising:
Instruction a: obtain unstructured text, pre-process it, and split the pre-processed unstructured text into multiple clauses;
Instruction b: obtain the structured entries in the structured text and the question-answer database corresponding to them;
Instruction c: ask the questions in the question-answer database and match the content of each clause to the corresponding structured entries to obtain clause structuring results;
Instruction d: obtain the structured text from the clause structuring results.
As described above, the text structuring method, device, system and non-volatile storage medium disclosed by the invention use a question-answer database to convert unstructured text information completely into structured information, with good conversion quality and high accuracy; and because clause structuring is performed by two LSTM networks, the many different ways of expressing the same content in free text can be handled, giving good robustness.
In summary, the above embodiments merely illustrate the principles and effects of the invention and are not intended to limit it. Anyone skilled in the art may modify or change the above embodiments without departing from the spirit and scope of the invention. Therefore, all equivalent modifications or changes made by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed by the invention shall be covered by the claims of the invention.
Claims (14)
- 1. A text structuring method, characterized in that the method comprises: obtaining unstructured text, pre-processing the unstructured text, and splitting the pre-processed unstructured text into multiple clauses; obtaining the structured entries in a structured text and a question-answer database corresponding to the structured entries; asking the questions in the question-answer database and matching the content of each clause to the corresponding structured entries, to obtain clause structuring results; and obtaining the structured text according to the clause structuring results.
- 2. The text structuring method according to claim 1, characterized in that the pre-processing comprises: replacing numbers and special characters in the unstructured text with unified symbols.
- 3. The text structuring method according to claim 1, characterized in that obtaining the structured entries in the structured text and the question-answer database corresponding to the structured entries comprises: classifying the structured entries to obtain classification results; and setting question templates for each of the classification results, and forming, according to the question templates, the question-answer database corresponding to the structured entries.
- 4. The text structuring method according to claim 1, characterized in that asking the questions in the question-answer database and matching the content of each clause to the corresponding structured entries to obtain the clause structuring results comprises: performing word segmentation on the clause and inputting the resulting clause segmentation into a first LSTM network, the first LSTM network performing first decoding on the clause segmentation to obtain a clause decoding result; generating questions corresponding to the clause based on the question-answer database, performing word segmentation on the questions, and inputting the resulting question segmentation into a second LSTM network, the second LSTM network performing second decoding on the question segmentation to obtain a question decoding result; and combining, pairing and modeling the first LSTM network and the second LSTM network according to the clause decoding result and the question decoding result, thereby obtaining the clause structuring results.
- 5. The text structuring method according to claim 1, characterized in that obtaining the structured text according to the clause structuring results comprises: merging the multiple clause structuring results to obtain a paragraph structuring result; and post-processing the paragraph structuring result to obtain the structured text.
- 6. The text structuring method according to claim 1, characterized in that, after the structured text is obtained according to the clause structuring results, the method further comprises: converting the structured text into a vector and storing it in a result database, and comparing the vector of the structured text with other vectors stored in the result database for similarity, to obtain texts similar to the structured text; and calculating the similarity between the structured text and the similar texts.
- 7. A text structuring device, characterized in that the text structuring device comprises: a pre-processing module, configured to obtain unstructured text, pre-process the unstructured text, and split the pre-processed unstructured text into multiple clauses; an entry acquisition module, configured to obtain the structured entries in a structured text and a question-answer database corresponding to the structured entries; a clause structuring module, configured to ask the questions in the question-answer database and match the content of each clause to the corresponding structured entries, to obtain clause structuring results; and a text forming module, configured to obtain the structured text according to the clause structuring results.
- 8. The text structuring device according to claim 7, characterized in that the pre-processing module is further configured to replace numbers and special characters in the unstructured text with unified symbols.
- 9. The text structuring device according to claim 7, characterized in that the entry acquisition module comprises: a classification unit, configured to classify the structured entries to obtain classification results; and a forming unit, configured to set question templates for the classification results and form, according to the question templates, the question-answer database corresponding to the structured entries.
- 10. The text structuring device according to claim 7, characterized in that the clause structuring module comprises: a clause processing unit, configured to perform word segmentation on the clause and input the resulting clause segmentation into a first LSTM network, the first LSTM network performing first decoding on the clause segmentation to obtain a clause decoding result; a question processing unit, configured to generate questions corresponding to the clause based on the question-answer database, perform word segmentation on the questions, and input the resulting question segmentation into a second LSTM network, the second LSTM network performing second decoding on the question segmentation to obtain a question decoding result; and a combination pairing unit, configured to combine, pair and model the first LSTM network and the second LSTM network according to the clause decoding result and the question decoding result, to obtain the clause structuring results.
- 11. The text structuring device according to claim 7, characterized in that the text forming module comprises: a merging unit, configured to merge the multiple clause structuring results to obtain a paragraph structuring result; and a post-processing unit, configured to post-process the paragraph structuring result to obtain the structured text.
- 12. The text structuring device according to claim 7, characterized in that the text structuring device further comprises: a similarity judgment module, configured to convert the structured text into a vector and store it in a result database, and to compare the vector of the structured text with other vectors stored in the result database for similarity, to obtain texts similar to the structured text; and a similarity calculation module, configured to calculate the similarity between the structured text and the similar texts.
- 13. A text structuring system, characterized by comprising the text structuring device according to any one of claims 7 to 12.
- 14. A non-volatile storage medium, characterized in that a text structuring program is stored in the storage medium, and the text structuring program, when executed by a computer, implements a text structuring method comprising: instruction a: obtaining unstructured text, pre-processing the unstructured text, and splitting the pre-processed unstructured text into multiple clauses; instruction b: obtaining the structured entries in a structured text and a question-answer database corresponding to the structured entries; instruction c: asking the questions in the question-answer database and matching the content of each clause to the corresponding structured entries, to obtain clause structuring results; and instruction d: obtaining the structured text according to the clause structuring results.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010477184.6A CN111680089B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN202010477190.1A CN111680090B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN202010511844.8A CN111680094B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN201710852183.3A CN107729392B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710852183.3A CN107729392B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
Related Child Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010511844.8A Division CN111680094B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN202010477184.6A Division CN111680089B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN202010477190.1A Division CN111680090B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107729392A | 2018-02-23 |
CN107729392B | 2020-07-10 |
Family ID: 61206611
Family Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010477190.1A Active CN111680090B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN202010477184.6A Active CN111680089B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN202010511844.8A Active CN111680094B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
CN201710852183.3A Active CN107729392B | 2017-09-19 | 2017-09-19 | Text structuring method, device and system and non-volatile storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1497473A (en) * | 2002-09-30 | 2004-05-19 | | Method and device for text structuring |
CN104899260A (en) * | 2015-05-20 | 2015-09-09 | 东华大学 | Method for structured processing of Chinese pathological text |
US20160180215A1 (en) * | 2014-12-19 | 2016-06-23 | Google Inc. | Generating parse trees of text segments using neural networks |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106095913A (en) * | 2016-06-08 | 2016-11-09 | 广州同构医疗科技有限公司 | A kind of electronic health record text structure method |
CN106649561B (en) * | 2016-11-10 | 2020-05-26 | 复旦大学 | Intelligent question-answering system for tax consultation service |
CN106897568A (en) * | 2017-02-28 | 2017-06-27 | 北京大数医达科技有限公司 | The treating method and apparatus of case history structuring |
2017
- 2017-09-19 CN202010477190.1A (CN111680090B), Active
- 2017-09-19 CN202010477184.6A (CN111680089B), Active
- 2017-09-19 CN202010511844.8A (CN111680094B), Active
- 2017-09-19 CN201710852183.3A (CN107729392B), Active
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108711443A (en) * | 2018-05-07 | 2018-10-26 | 成都智信电子技术有限公司 | The text data analysis method and device of electronic health record |
CN108629019A (en) * | 2018-05-08 | 2018-10-09 | 桂林电子科技大学 | A kind of Question sentence parsing computational methods containing name towards question and answer field |
CN110472925A (en) * | 2018-05-11 | 2019-11-19 | 懿谷智能科技(上海)有限公司 | A kind of laboratory test process management system and method based on webpage flow chart |
CN108733837A (en) * | 2018-05-28 | 2018-11-02 | 杭州依图医疗技术有限公司 | A kind of the natural language structural method and device of case history text |
CN108733837B (en) * | 2018-05-28 | 2021-04-27 | 上海依智医疗技术有限公司 | Natural language structuring method and device for medical history text |
CN109145299A (en) * | 2018-08-16 | 2019-01-04 | 北京金山安全软件有限公司 | Text similarity determination method, device, equipment and storage medium |
CN109145299B (en) * | 2018-08-16 | 2022-06-21 | 北京金山安全软件有限公司 | Text similarity determination method, device, equipment and storage medium |
CN109409645A (en) * | 2018-09-07 | 2019-03-01 | 平安科技(深圳)有限公司 | The method and storage medium that electronic device, lawyer recommend |
CN109493926A (en) * | 2018-10-30 | 2019-03-19 | 中山大学肿瘤防治中心 | Processing method, device, medium and the electronic equipment of colorectal cancer medical data |
CN109800284B (en) * | 2018-12-19 | 2021-02-05 | 中国电子科技集团公司第二十八研究所 | Task-oriented unstructured information intelligent question-answering system construction method |
CN109800284A (en) * | 2018-12-19 | 2019-05-24 | 中国电子科技集团公司第二十八研究所 | A kind of unstructured information intelligent Answer System construction method of oriented mission |
CN110415791A (en) * | 2019-01-29 | 2019-11-05 | 四川大学华西医院 | System and method is established in a kind of disease library |
CN110321466A (en) * | 2019-06-14 | 2019-10-11 | 广发证券股份有限公司 | A kind of security information duplicate checking method and system based on semantic analysis |
CN110321466B (en) * | 2019-06-14 | 2023-09-15 | 广发证券股份有限公司 | Securities information duplicate checking method and system based on semantic analysis |
WO2021068321A1 (en) * | 2019-10-12 | 2021-04-15 | 平安科技(深圳)有限公司 | Information pushing method and apparatus based on human-computer interaction, and computer device |
CN111125100A (en) * | 2019-12-12 | 2020-05-08 | 东软集团股份有限公司 | Data storage method and device, storage medium and electronic equipment |
CN112765194A (en) * | 2020-12-31 | 2021-05-07 | 科大讯飞股份有限公司 | Data retrieval method and electronic equipment |
CN112765194B (en) * | 2020-12-31 | 2024-04-30 | 科大讯飞股份有限公司 | Data retrieval method and electronic equipment |
CN112364035A (en) * | 2021-01-14 | 2021-02-12 | 零犀(北京)科技有限公司 | Processing method and device for call record big data, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111680090B (en) | 2023-03-21 |
CN111680089A (en) | 2020-09-18 |
CN107729392B (en) | 2020-07-10 |
CN111680094B (en) | 2023-03-21 |
CN111680089B (en) | 2023-03-21 |
CN111680094A (en) | 2020-09-18 |
CN111680090A (en) | 2020-09-18 |
Similar Documents
Publication | Title |
---|---|
CN107729392A (en) | Text structure method, apparatus, system and non-volatile memory medium |
CN106980683B (en) | Blog text abstract generating method based on deep learning |
WO2020062770A1 | Method and apparatus for constructing domain dictionary, and device and storage medium |
Sun et al. | Sentiment analysis for Chinese microblog based on deep neural networks with convolutional extension features |
CN106776711A (en) | A kind of Chinese medical knowledge mapping construction method based on deep learning |
WO2019080863A1 | Text sentiment classification method, storage medium and computer |
CN106844658A (en) | A kind of Chinese text knowledge mapping method for auto constructing and system |
CN105528437A (en) | Question-answering system construction method based on structured text knowledge extraction |
CN106909572A (en) | A kind of construction method and device of question and answer knowledge base |
JP6856709B2 | Training data generation methods, training data generators, electronics and computer readable storage media |
CN103049490B (en) | Between knowledge network node, attribute generates system and the method for generation |
CN107844608A (en) | A kind of sentence similarity comparative approach based on term vector |
Liang et al. | ISIA at the ImageCLEF 2017 Image Caption Task. |
CN114647713A (en) | Knowledge graph question-answering method, device and storage medium based on virtual confrontation |
Ye et al. | Multi-level composite neural networks for medical question answer matching |
Li et al. | Drug specification named entity recognition base on BILSTM-CRF model |
Kang et al. | A short texts matching method using shallow features and deep features |
CN116541520A (en) | Emotion analysis method and device, electronic equipment and storage medium |
M’sik et al. | Topic modeling coherence: A comparative study between LDA and NMF models using COVID’19 corpus |
Putra et al. | Sentence boundary disambiguation for Indonesian language |
Chaonithi et al. | A hybrid approach for Thai word segmentation with crowdsourcing feedback system |
Mutiah et al. | Topic modeling on COVID-19 vaccination in Indonesia using LDA model |
CN109670186A (en) | Production method of abstracting and device based on machine learning |
He et al. | Modeling coherence and diversity for image paragraph captioning |
CN111814433B (en) | Uygur language entity identification method and device and electronic equipment |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |