CN110110334A

CN110110334A - A method for correcting errors in telemedicine consultation record text based on natural language processing

Info

Publication number: CN110110334A
Application number: CN201910379327.7A
Authority: CN
Inventors: 赵杰; 翟运开; 石金铭; 崔莉亚; 陈昊天; 李明原; 宋晓琴; 王振博
Original assignee: Zhengzhou University
Current assignee: Zhengzhou University
Priority date: 2019-05-08
Filing date: 2019-05-08
Publication date: 2019-08-09
Anticipated expiration: 2039-05-08
Also published as: CN110110334B

Abstract

The invention discloses a method for correcting errors in telemedicine consultation record text based on natural language processing, belonging to the technical field of big data. The method comprises deploying a central server and several clients and establishing a preprocessing module, a database, an error detection module and an error correction module in the central server, and it solves the technical problem of automatic error detection and automatic error correction of text. The invention detects errors using a CRF together with n-gram scattered-string detection and corrects each error according to its specific cause. It achieves a high level of error-correction capability and improves accuracy over the prior art, and it can reduce the workload of telemedicine department staff and improve work efficiency.

Description

A method for correcting errors in telemedicine consultation record text based on natural language processing

Technical field

The invention belongs to the technical field of big data, and in particular relates to a method for correcting errors in telemedicine consultation record text based on natural language processing.

Background art

With the development of computer and network technology, telemedicine consultation has become an important component of the modern medical service system. Consultation result forms are typed in by telemedicine department staff, and extra characters, missing characters and misspellings can occur during input, so dedicated personnel or a system is needed to check and proofread these texts. At present, proofreading of the consultation opinion forms recorded during telemedicine consultation is still mainly manual, which is both time-consuming and laborious. Automatic proofreading of the text produced during telemedicine consultation is therefore of great significance to the field of telemedicine.

Summary of the invention

The object of the present invention is to provide a method for correcting errors in telemedicine consultation record text based on natural language processing, solving the technical problem of automatic error detection and automatic error correction of text.

To achieve the above object, the present invention adopts the following technical solution:

A method for correcting errors in telemedicine consultation record text based on natural language processing comprises the following steps:

Step 1: deploy a central server and several clients, and establish a preprocessing module, a database, an error detection module and an error correction module in the central server; all clients communicate with the central server over the Internet;

Step 2: input multiple original texts through any client; the client sends the original texts to the central server, and the central server stores all original texts in the database and establishes in the database a training database for storing and accumulating original texts;

Step 3: classify the original texts in the training database into completely correct texts and erroneous texts, perform word segmentation and character splitting on both, and annotate the training corpus according to the position and type of each error in the original texts, where mark C denotes correct, mark R denotes redundancy, mark D denotes missing, mark O denotes substitution, and mark M denotes missing;

Invoke the CRF (conditional random field) and obtain a trained model from the training corpus;

Step 4: input the text to be processed through any client; the client transmits the text to be processed to the central server, and the preprocessing module in the central server preprocesses it through the following steps:

Step A1: perform word segmentation and character splitting on the text to be processed;

Step A2: label the segmented words and split characters of the text to be processed as the test corpus;

Step 5: the error detection module in the central server performs error detection on the text to be processed through the following steps:

Step B1: perform error detection on the test corpus of the text to be processed using the trained model and the CRF, obtaining the CRF error detection result;

Step B2: traverse all scattered strings in the text to be processed and perform n-gram scattered-string error detection on the text, obtaining the n-gram scattered-string error detection result;

Step B3: fuse the CRF error detection result with the n-gram scattered-string error detection result and annotate the text to be processed accordingly, obtaining the final error detection result;

Step 6: the central server inputs the final error detection result obtained in step 5 into the error correction module, which corrects it through the following steps:

Step C1: build a language model to correct missing errors;

Step C2: directly delete the words or characters marked as redundancy errors;

Step C3: correct the words marked as substitution errors using a homophone dictionary, completing the automatic error correction of the text;

Step C4: output the corrected text;

Step 7: establish a text proofreading main interface, a text error detection interface and a text error correction interface in the client; the central server transmits the text to be processed, the final error detection result and the corrected text to the client, and the client displays them in the text proofreading main interface, the text error detection interface and the text error correction interface respectively.
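The text does not spell out how a scattered string is tested in step B2 or how the two results are fused in step B3. The Python sketch below is one possible reading, not the patented implementation. It assumes that a scattered string is a run of two or more consecutive single-character tokens left over after word segmentation, that a run is suspicious where an adjacent character pair is rare in a reference bigram table, and that fusion is a plain union of flagged positions. The names `find_scattered_strings`, `ngram_flags`, `fuse` and the parameter `min_count` are hypothetical.

```python
from collections import Counter


def find_scattered_strings(tokens):
    """Return (start, end) index ranges of runs of >= 2 single-character tokens."""
    runs, start = [], None
    for i, tok in enumerate(list(tokens) + [""]):  # empty sentinel closes a trailing run
        if len(tok) == 1:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= 2:
                runs.append((start, i))
            start = None
    return runs


def ngram_flags(tokens, bigram_counts, min_count=1):
    """Flag positions inside scattered strings whose adjacent pair is rare (assumed test)."""
    flagged = set()
    for start, end in find_scattered_strings(tokens):
        for i in range(start, end - 1):
            if bigram_counts.get((tokens[i], tokens[i + 1]), 0) < min_count:
                flagged.update((i, i + 1))
    return flagged


def fuse(crf_flags, ngram_flagged):
    """Assumed fusion rule: union of the positions flagged by either detector."""
    return sorted(set(crf_flags) | set(ngram_flagged))


# Toy data, invented for illustration only.
tokens = ["患者", "血", "压", "偏", "搞", "建议", "复查"]
bigram_counts = Counter({("血", "压"): 5, ("压", "偏"): 3})
suspicious = fuse([4, 6], ngram_flags(tokens, bigram_counts))
```

A union favors recall, which suits a proofreading aid where a human reviews the flags; an intersection or a weighted vote would be equally consistent with the wording of step B3.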

Preferably, in step 3, the SnowNLP library is used to perform word segmentation and character splitting on the completely correct texts and the erroneous texts.

Preferably, in step C1, the language model is a trigram language model, in which the word w_i at the i-th position depends on the two preceding words w_{i-1} and w_{i-2}. The model is expressed by a formula (not reproduced in this text), in which P denotes conditional probability, S denotes the current sentence or symbol string, and n denotes the number of preceding and following symbols considered, here taken as 3.
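For reference, the standard trigram factorization that agrees with the symbols defined above can be written as follows. This is the textbook form, not a transcription of the patent's own formula; the sentence length m and the sentence-start padding symbols w_0 and w_{-1} are assumptions introduced here.

```latex
P(S) = P(w_1 w_2 \dots w_m) \approx \prod_{i=1}^{m} P\left(w_i \mid w_{i-2}\, w_{i-1}\right)
```

Each factor conditions on exactly two preceding words, which is what makes the model a trigram (n = 3) model.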

Preferably, correcting missing errors in the final error detection result with the trigram language model requires a corpus in which error types and error positions are correctly annotated; the steps are as follows:

Step S1: segment the input final error detection result into words and find each missing mark M;

Step S2: extract the words immediately before and after the mark M and record them in a missing-text record;

Step S3: traverse the constructed trigram language model and determine whether the words recorded in the missing-text record match the first and third words of a sentence entry in the dictionary: if not, the correction fails; continue to search for the next missing mark and repeat steps S1 to S3 until all missing text has been processed; if they match, execute step S5;

Step S5: determine whether the matching entry is unique: if it is unique, the second word of that entry is the missing word; if it is not unique, the second word of the entry with the higher word frequency is taken as the missing word. The correction succeeds; continue to search for the next missing mark and repeat steps S1 to S5 until all missing text has been corrected.
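Steps S1 to S5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the detection result is assumed to be a token list in which a placeholder token `<M>` stands at the position of the missing word, and the trigram language model is assumed to be a table mapping (first, second, third) word triples to frequencies. The function name `fill_missing` and the `<M>` token are hypothetical.

```python
from collections import Counter


def fill_missing(tokens, trigram_counts, mark="<M>"):
    """Replace each missing mark with the most frequent middle word of a matching trigram."""
    out = list(tokens)
    for i, tok in enumerate(tokens):
        # S1: locate a missing mark that has a neighbor on both sides.
        if tok != mark or i == 0 or i == len(tokens) - 1:
            continue
        # S2: record the words before and after the mark.
        prev_word, next_word = tokens[i - 1], tokens[i + 1]
        # S3: collect trigram entries whose first and third words match.
        candidates = {
            middle: count
            for (first, middle, third), count in trigram_counts.items()
            if first == prev_word and third == next_word
        }
        # S5: a unique match is used directly; otherwise the most frequent one wins.
        if candidates:
            out[i] = max(candidates, key=candidates.get)
        # With no match, correction fails for this mark and it is left in place.
    return out


# Toy trigram table, invented for illustration only.
trigram_counts = Counter({
    ("建议", "定期", "复查"): 4,
    ("建议", "尽快", "复查"): 1,
    ("患者", "血压", "偏高"): 2,
})
corrected = fill_missing(["建议", "<M>", "复查"], trigram_counts)
```

Leaving an unmatched mark in place keeps the failure visible to whoever reviews the corrected text.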

Preferably, in step C3, the words marked as substitution errors are corrected using the homophone dictionary through the following steps:

Step T1: segment the text into words and traverse the segmented text to find each substitution error mark O; the word immediately before a mark O is the substituted word; convert that word to pinyin and record the pinyin;

Step T2: determine whether the pinyin exists in the constructed homophone dictionary: if it does not, the correction fails; if it does, the substituted word has homophones, which are taken as the substitution candidates;

Step T3: substitute each candidate homophone in turn into the original sentence, compute the probability of each resulting sentence and sort in descending order; the homophone in the top-ranked sentence is taken as the correction for the substituted word.
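Steps T1 to T3 can be sketched in Python as follows, again as an illustration under assumptions rather than the patented implementation. The substitution mark is assumed to be a placeholder token `<O>` that follows the wrong word; pinyin lookup is done through a small hand-written table `pinyin_of` instead of a real pinyin converter; and the sentence probability is an add-one smoothed bigram log-probability, which the text does not prescribe. All names and data here are hypothetical.

```python
import math
from collections import Counter


def sentence_logprob(words, bigram_counts, unigram_counts, vocab_size):
    """Add-one smoothed bigram log-probability of a word sequence (assumed scorer)."""
    logp = 0.0
    for prev, cur in zip(words, words[1:]):
        logp += math.log((bigram_counts[(prev, cur)] + 1) / (unigram_counts[prev] + vocab_size))
    return logp


def correct_substitutions(tokens, pinyin_of, homophones, score, mark="<O>"):
    """Replace the word before each substitution mark with its best-scoring homophone."""
    words, flagged = [], []
    for tok in tokens:  # T1: strip marks and remember which word each one follows
        if tok == mark:
            if words:
                flagged.append(len(words) - 1)
        else:
            words.append(tok)
    for pos in flagged:
        # T2: look up homophones by pinyin; an empty result means correction fails.
        candidates = homophones.get(pinyin_of.get(words[pos]), [])
        if not candidates:
            continue
        # T3: score the sentence with each candidate substituted, keep the best.
        ranked = sorted(
            candidates,
            key=lambda c: score(words[:pos] + [c] + words[pos + 1:]),
            reverse=True,
        )
        words[pos] = ranked[0]
    return words


# Toy resources, invented for illustration only.
pinyin_of = {"复察": "fucha"}
homophones = {"fucha": ["复查", "复察"]}
bigrams = Counter({("建议", "复查"): 5})
unigrams = Counter({"建议": 5})
score = lambda ws: sentence_logprob(ws, bigrams, unigrams, vocab_size=4)
fixed = correct_substitutions(["建议", "复察", "<O>"], pinyin_of, homophones, score)
```

Note that this sketch removes the mark even when no homophone is found, leaving the original word unchanged; a production system would more likely keep the mark so the failure can be reviewed.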

The method for correcting errors in telemedicine consultation record text based on natural language processing according to the present invention solves the technical problem of automatic error detection and automatic error correction of text. The invention detects errors using a CRF together with n-gram scattered-string detection and corrects each error according to its specific cause; it achieves a high level of error-correction capability and improves accuracy over the prior art. The invention can reduce the workload of telemedicine department staff and improve work efficiency.

Brief description of the drawings

Fig. 1 is the overall flowchart of the invention;

Fig. 2 is the CRF error detection flowchart of the invention;

Fig. 3 is the n-gram scattered-string error detection flowchart of the invention;

Fig. 4 is the flowchart for building the language model of the invention;

Fig. 5 is the flowchart for building the homophone dictionary of the invention.

Detailed description of the embodiments

As shown in Figs. 1 to 5, a method for correcting errors in telemedicine consultation record text based on natural language processing comprises the following steps:

Step A1: perform word segmentation and character splitting on the text to be processed;

The CRF (conditional random field) is a discriminative probabilistic model. The CRF analyzes the text as a word sequence or a character sequence, with X = {x₁x₂...x_n} denoting the observation sequence and Y = {y₁y₂...y_n} denoting the label sequence. The conditional random field is defined through feature functions by a formula (not reproduced in this text), in which i denotes the position in the observed sentence, z_x denotes the normalization factor over the labeled observation sequence, p denotes the probability of the predicted error type, λ denotes the assigned weight, j indexes the feature functions, and f denotes a feature function.
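For reference, the standard linear-chain CRF that agrees with the symbols defined above can be written as follows. This is the textbook form, not a transcription of the patent's own formula; the argument list of the feature functions f_j and the explicit expression for z_x are assumptions of this sketch.

```latex
p(Y \mid X) = \frac{1}{z_x} \exp\left( \sum_{i=1}^{n} \sum_{j} \lambda_j \, f_j(y_{i-1}, y_i, X, i) \right),
\qquad
z_x = \sum_{Y'} \exp\left( \sum_{i=1}^{n} \sum_{j} \lambda_j \, f_j(y'_{i-1}, y'_i, X, i) \right)
```

Here z_x sums over every candidate label sequence Y', so that p(Y | X) is a proper probability over the error-type labelings of the sentence.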

N-gram scattered-string error detection infers the structural relations of a sentence from statistics on the co-occurrence probability of n symbols in a natural-language symbol string. Let w_i denote the linguistic symbol about to occur; predicting the probability of w_i requires considering the preceding n-1 symbols, which is formulated as p(w_i | w_{i-n+1} w_{i-n+2} w_{i-n+3} ... w_{i-1}); here n is 3.
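With n = 3 the conditional probability above reduces to p(w_i | w_{i-2} w_{i-1}). One common way to estimate it is by relative frequency over a tokenized corpus, sketched below in Python. The text does not say how the probabilities are estimated, so the maximum-likelihood estimate, the `<s>` padding symbol and the function names `train_trigram` and `trigram_prob` are assumptions of this sketch.

```python
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-token contexts over tokenized sentences."""
    tri, ctx = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>", "<s>"] + list(sent)  # pad so the first words have a context
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx


def trigram_prob(w, prev2, prev1, tri, ctx):
    """Maximum-likelihood estimate of p(w | prev2 prev1); 0.0 for an unseen context."""
    if ctx[(prev2, prev1)] == 0:
        return 0.0
    return tri[(prev2, prev1, w)] / ctx[(prev2, prev1)]


# Toy corpus, invented for illustration only.
corpus = [["建议", "定期", "复查"], ["建议", "定期", "随访"], ["建议", "尽快", "复查"]]
tri, ctx = train_trigram(corpus)
p = trigram_prob("复查", "建议", "定期", tri, ctx)  # 1 of the 2 "建议 定期" contexts
```

Unsmoothed estimates assign zero probability to any unseen trigram, so a real system would typically add smoothing or back-off before using these values for detection.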

Step C1: build a language model to correct missing errors;

Step C4: output the corrected text;
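Among the correction steps, the deletion of words marked as redundant (step C2 of the method) is the simplest to illustrate. The Python sketch below assumes that the redundancy mark is a placeholder token `<R>` placed directly after the redundant word, mirroring how the substitution mark O follows the wrong word; the text does not state where the R mark sits, so this placement, the `<R>` token and the name `delete_redundant` are assumptions.

```python
def delete_redundant(tokens, mark="<R>"):
    """Drop every redundancy mark together with the token immediately before it."""
    out = []
    for tok in tokens:
        if tok == mark:
            if out:
                out.pop()  # remove the word the mark refers to
        else:
            out.append(tok)
    return out


# Toy input, invented for illustration only.
cleaned = delete_redundant(["建议", "建议", "<R>", "定期", "复查"])
```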

Claims

1. A method for correcting errors in telemedicine consultation record text based on natural language processing, characterized by comprising the following steps:

Step 1: deploy a central server and several clients, and establish a preprocessing module, a database, an error detection module and an error correction module in the central server; all clients communicate with the central server over the Internet;

Step 2: input multiple original texts through any client; the client sends the original texts to the central server, and the central server stores all original texts in the database and establishes in the database a training database for storing and accumulating original texts;

Step 3: classify the original texts in the training database into completely correct texts and erroneous texts, perform word segmentation and character splitting on both, and annotate the training corpus according to the position and type of each error in the original texts, where mark C denotes correct, mark R denotes redundancy, mark D denotes missing, mark O denotes substitution, and mark M denotes missing;

Step 4: input the text to be processed through any client; the client transmits the text to be processed to the central server, and the preprocessing module in the central server preprocesses it through the following steps:

Step A1: perform word segmentation and character splitting on the text to be processed;

Step B1: perform error detection on the test corpus of the text to be processed using the trained model and the CRF, obtaining the CRF error detection result;

Step B2: traverse all scattered strings in the text to be processed and perform n-gram scattered-string error detection on the text, obtaining the n-gram scattered-string error detection result;

Step B3: fuse the CRF error detection result with the n-gram scattered-string error detection result and annotate the text to be processed accordingly, obtaining the final error detection result;

Step 6: the central server inputs the final error detection result obtained in step 5 into the error correction module, which corrects it through the following steps:

Step C1: build a language model to correct missing errors;

Step C3: correct the words marked as substitution errors using a homophone dictionary, completing the automatic error correction of the text;

Step C4: output the corrected text;

Step 7: establish a text proofreading main interface, a text error detection interface and a text error correction interface in the client; the central server transmits the text to be processed, the final error detection result and the corrected text to the client, and the client displays them in the text proofreading main interface, the text error detection interface and the text error correction interface respectively.

2. The method for correcting errors in telemedicine consultation record text based on natural language processing according to claim 1, characterized in that: in step 3, the SnowNLP library is used to perform word segmentation and character splitting on the completely correct texts and the erroneous texts.

3. The method for correcting errors in telemedicine consultation record text based on natural language processing according to claim 1, characterized in that: in step C1, the language model is a trigram language model, in which the word w_i at the i-th position depends on the two preceding words w_{i-1} and w_{i-2}; the model is expressed by a formula (not reproduced in this text), in which P denotes conditional probability, S denotes the current sentence or symbol string, and n denotes the number of symbols.

4. The method for correcting errors in telemedicine consultation record text based on natural language processing according to claim 3, characterized in that: correcting missing errors in the final error detection result with the trigram language model requires a corpus in which error types and error positions are correctly annotated; the steps are as follows:

Step S3: traverse the constructed trigram language model and determine whether the words recorded in the missing-text record match the first and third words of a sentence entry in the dictionary: if not, the correction fails; continue to search for the next missing mark and repeat steps S1 to S3 until all missing text has been processed; if they match, execute step S5;

Step S5: determine whether the matching entry is unique: if it is unique, the second word of that entry is the missing word; if it is not unique, the second word of the entry with the higher word frequency is taken as the missing word. The correction succeeds; continue to search for the next missing mark and repeat steps S1 to S5 until all missing text has been corrected.

5. The method for correcting errors in telemedicine consultation record text based on natural language processing according to claim 1, characterized in that: in step C3, the words marked as substitution errors are corrected using the homophone dictionary through the following steps:

Step T1: segment the text into words and traverse the segmented text to find each substitution error mark O; the word immediately before a mark O is the substituted word; convert that word to pinyin and record the pinyin;

Step T2: determine whether the pinyin exists in the constructed homophone dictionary: if it does not, the correction fails; if it does, the substituted word has homophones, which are taken as the substitution candidates;

Step T3: substitute each candidate homophone in turn into the original sentence, compute the probability of each resulting sentence and sort in descending order; the homophone in the top-ranked sentence is taken as the correction for the substituted word.