CN110162767A - The method and apparatus of text error correction - Google Patents
- Publication number
- CN110162767A (application CN201810146360.0A)
- Authority
- CN
- China
- Prior art keywords
- text
- sample set
- error
- error correction
- rule
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING, CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/3329—Natural language query formulation or dialogue systems (under G06F16/00 Information retrieval; G06F16/30 unstructured textual data; G06F16/33 Querying; G06F16/332 Query formulation)
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation (under G06F40/00 Handling natural language data; G06F40/20 Natural language analysis)
- G06F40/279—Recognition of textual entities; G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a method and apparatus for text error correction, relating to the field of computer technology. One specific embodiment of the method includes: obtaining text to be corrected according to a text error-correction requirement; and performing error correction on the text to be corrected using an error-correction model, and outputting the standard text corresponding to the text to be corrected, wherein the error-correction model is a trained attention-based sequence-to-sequence model. By correcting text with a trained attention-based sequence-to-sequence model, the embodiment reduces computational complexity and improves the accuracy of text error correction.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a method and apparatus for text error correction.
Background technique
In recent years, with the continuous innovation and improvement of information technology, intelligent customer-service robots have gradually replaced human agents in the customer-service industry, owing to advantages such as requiring no training, working continuously for long periods, and incurring no labor cost. The core technology of an intelligent customer-service robot is to preprocess the user's input, classify the intent of the preprocessing result, and then respond according to the classification. The text error-correction step in preprocessing is critical: if the user's input is not corrected, the accuracy of intent recognition suffers, and the intelligent customer-service robot ultimately responds incorrectly.
In the prior art, text error correction is performed on the basis of a language model, i.e., the probability of a sentence is computed from the word-by-word occurrence probabilities. Suppose a sentence s consists of k words, i.e., s = W1, W2, …, Wk (where W1, W2, …, Wk are the words making up s). Then the probability p(s) of the sentence s can be expressed as:

p(s) = p(W1, W2, …, Wk) = p(W1) p(W2 | W1) … p(Wk | W1, W2, …, Wk-1)

where Wk denotes the current word and W1, W2, …, Wk-1 denote the words before the current word.
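The chain-rule factorization above can be sketched in code. The following minimal illustration (not from the patent) truncates each conditional to a bigram and estimates the probabilities by maximum-likelihood counts over a toy, hypothetical corpus of pre-segmented sentences.

```python
from collections import Counter

def bigram_probability(sentence, corpus):
    """Estimate p(s) = p(W1) * p(W2 | W1) * ... * p(Wk | W(k-1)):
    the chain rule with each conditional truncated to a bigram, using
    maximum-likelihood counts from the corpus."""
    unigrams = Counter(w for s in corpus for w in s)
    bigrams = Counter((s[i], s[i + 1]) for s in corpus for i in range(len(s) - 1))
    total = sum(unigrams.values())

    p = unigrams[sentence[0]] / total                 # p(W1)
    for prev, cur in zip(sentence, sentence[1:]):
        if unigrams[prev] == 0:
            return 0.0                                # unseen history
        p *= bigrams[(prev, cur)] / unigrams[prev]    # p(Wi | W(i-1))
    return p

# Toy corpus of pre-segmented sentences (hypothetical).
corpus = [["cash", "on", "delivery"], ["cash", "on", "hand"]]
print(bigram_probability(["cash", "on", "delivery"], corpus))  # ≈ 1/6
```

Note the weaknesses the background identifies: each factor conditions only on the preceding words, and a full N-gram table over a 100,000-word vocabulary would need on the order of 100,000^N parameters.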
In the course of realizing the present invention, the inventors found that the prior art has at least the following problems. First, in the language-model-based approach to text error correction, the language model considers only the words before the current word and does not consider the words after it. Second, the language model used in prior-art text error correction is complex and computationally expensive. Taking the N-gram model (a language model commonly used in large-vocabulary continuous speech recognition) as an example, if the vocabulary size is 100,000, the number of parameters of the N-gram model reaches 100,000^N; the larger N is, the more accurate the model, but also the more complex the model and the greater the amount of computation.
Summary of the invention
In view of this, embodiments of the present invention provide a method and apparatus for text error correction, which can reduce computational complexity and improve the accuracy of text error correction.
To achieve the above object, according to an aspect of an embodiment of the present invention, a kind of method of text error correction is provided.
A method of text error correction according to an embodiment of the present invention includes: obtaining text to be corrected according to a text error-correction requirement; and performing error correction on the text to be corrected according to an error-correction model, and outputting the standard text corresponding to the text to be corrected, the error-correction model being a trained attention-based sequence-to-sequence model.
Optionally, before performing error correction on the text to be corrected according to the error-correction model and outputting the corresponding standard text, the method further includes: obtaining a first sample set and a second sample set, the first sample set containing at least one standard text and the second sample set containing error texts corresponding to the standard texts; constructing a training sample set from the first sample set and the second sample set; and training on the training sample set to obtain the error-correction model, the input of the error-correction model being an error text of the training sample set and the output being the corresponding standard text of the training sample set.
Optionally, obtaining the first sample set and the second sample set includes: obtaining the first sample set; performing word segmentation on the standard texts in the first sample set to obtain multiple tokens, and generating a replacement set for each token according to preset rules; selecting a predetermined number of substitutes from the replacement set of each token; generating error texts by randomly replacing tokens of the standard texts with the substitutes; and forming the second sample set from the error texts.
Optionally, the preset rules include at least one of the following: a homophone rule, a fuzzy-pinyin rule, and a similar-glyph rule.
Optionally, the trained attention-based sequence-to-sequence model includes: an embedding layer, a forward encoding layer, a backward encoding layer, an attention mechanism, a decoding layer, and a conversion layer.
Optionally, the forward encoding layer, the backward encoding layer, and the decoding layer each include a long short-term memory (LSTM) network.
To achieve the above object, according to another aspect of an embodiment of the present invention, an apparatus for text error correction is provided.
An apparatus for text error correction according to an embodiment of the present invention includes: an acquisition module, configured to obtain text to be corrected according to a text error-correction requirement; and a correction module, configured to perform error correction on the text to be corrected according to an error-correction model and output the standard text corresponding to the text to be corrected, the error-correction model being a trained attention-based sequence-to-sequence model.
Optionally, the acquisition module is further configured to: obtain a first sample set and a second sample set, the first sample set containing at least one standard text and the second sample set containing error texts corresponding to the standard texts; construct a training sample set from the first sample set and the second sample set; and train on the training sample set to obtain the error-correction model, the input of the error-correction model being an error text of the training sample set and the output being the corresponding standard text of the training sample set.
Optionally, the acquisition module is further configured to: obtain the first sample set; perform word segmentation on the standard texts in the first sample set to obtain multiple tokens, and generate a replacement set for each token according to preset rules; select a predetermined number of substitutes from the replacement set of each token; generate error texts by randomly replacing tokens of the standard texts with the substitutes; and form the second sample set from the error texts.
Optionally, the preset rules include at least one of the following: a homophone rule, a fuzzy-pinyin rule, and a similar-glyph rule.
Optionally, the trained attention-based sequence-to-sequence model includes: an embedding layer, a forward encoding layer, a backward encoding layer, an attention mechanism, a decoding layer, and a conversion layer.
Optionally, the forward encoding layer, the backward encoding layer, and the decoding layer each include a long short-term memory network.
To achieve the above object, according to yet another aspect of an embodiment of the present invention, an electronic device is provided.
An electronic device according to an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of text error correction of the embodiments of the present invention.
To achieve the above object, according to a further aspect of an embodiment of the present invention, a computer-readable medium is provided.
A computer-readable medium according to an embodiment of the present invention stores a computer program which, when executed by a processor, implements the method of text error correction of the embodiments of the present invention.
One of the above embodiments has the following advantages or beneficial effects. Text is corrected with a trained attention-based sequence-to-sequence model, which reduces computational complexity and improves the accuracy of text error correction. The model is trained on a training sample set formed from the first sample set and the second sample set, so that the error-correction model can be built from massive sample data, improving its accuracy. The error texts in the second sample set are generated from the standard texts in the first sample set, so that the relationship between each error text and its corresponding standard text is established, further improving the accuracy of the error-correction model. The replacement set of each token is constructed from the multiple angles of the homophone rule, the fuzzy-pinyin rule, and the similar-glyph rule, so that substitutes for a token under a variety of conditions are comprehensively considered. The error-correction model includes a forward encoding layer and a backward encoding layer, so that both the relationship between the current word and the preceding words and the relationship between the current word and the following words are considered. The error-correction model includes an attention mechanism, so that when generating each output the information carried by the input sequence is fully used and the information in the input sequence most relevant to the output is found, improving the quality of the output and thus the accuracy of the error-correction model. The forward encoding layer, the backward encoding layer, and the decoding layer may each include a long short-term memory network, which mitigates the vanishing-gradient problem during training and inference, improving the accuracy of the error-correction model and the accuracy of the output results.
Further effects of the above non-conventional optional implementations are explained below in conjunction with specific embodiments.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on the invention. In the drawings:
Fig. 1 is a schematic diagram of the main steps of a method of text error correction according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the error-correction model of the method of text error correction according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main flow of training the error-correction model in the method of text error correction according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the main modules of an apparatus for text error correction according to an embodiment of the present invention;
Fig. 5 is a diagram of an exemplary system architecture to which an embodiment of the present invention can be applied;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description of embodiments
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
Fig. 1 is a schematic diagram of the main steps of a method of text error correction according to an embodiment of the present invention. As shown in Fig. 1, the main steps of the method of text error correction of the embodiment of the present invention may include:
Step S101: obtaining text to be corrected according to a text error-correction requirement. In this step, when a text error-correction request is received, the text to be corrected is obtained from the request. The text to be corrected refers to an error text with a problematic sentence, for example, one composed of wrong words or containing wrong sentences. The text to be corrected may be Chinese text data or other text data; the present invention does not limit this. The text to be corrected may be written text data entered by the user, or voice data entered by the user through speech, in which case the voice data is first converted into written text data; of course, the text to be corrected of the present invention may also be data in other forms, which the present invention does not limit.
Step S102: performing error correction on the text to be corrected according to an error-correction model, and outputting the standard text corresponding to the text to be corrected, where the error-correction model is a trained attention-based sequence-to-sequence model. In this step, the text to be corrected is fed into the trained error-correction model, and the standard text corresponding to the text to be corrected is obtained. The standard text refers to the correct text data that embodies the user's true intention. Taking intelligent customer service as an example, suppose the user's input to be corrected is a misspelled question about whether "cash on delivery" is supported; the intelligent customer service feeds the user's input text into the error-correction model, obtains the corresponding standard text ("supports cash on delivery"), and can therefore determine the user's true intention and respond.
In an embodiment of the present invention, before performing error correction on the text to be corrected according to the error-correction model and outputting the corresponding standard text, the method of text error correction may further include: obtaining a first sample set and a second sample set, where the first sample set may include at least one standard text and the second sample set may include error texts corresponding to the standard texts; constructing a training sample set from the first sample set and the second sample set; and training on the training sample set to obtain the error-correction model, whose input is an error text of the training sample set and whose output is the corresponding standard text of the training sample set. In this embodiment, the first sample set and the second sample set are used to generate the training sample set, and the training sample set is then used to train the error-correction model. Taking an application platform as an example, the first sample set may be the frequently asked user questions maintained by the platform; these have been cleaned and reviewed, are high-quality text, and contain no wrong words. The second sample set may be questions containing erroneous words that correspond to the high-quality user questions. Through long-term collection and accumulation, the frequently asked questions maintained by the platform amount to massive data, which improves the accuracy of the model.
In an embodiment of the present invention, obtaining the first sample set and the second sample set may include: obtaining the first sample set; performing word segmentation on the standard texts in the first sample set to obtain multiple tokens, and generating a replacement set for each token according to preset rules; selecting a predetermined number of substitutes from the replacement set of each token; generating error texts by randomly replacing tokens of the standard texts with the substitutes; and forming the second sample set from the error texts. In this embodiment, the first sample set may be crawled using crawler technology (a program or script that automatically captures internet information according to certain rules, widely used in the internet field), or obtained by other methods; the present invention does not limit this.
For ease of understanding, suppose "包邮吗" ("is shipping free?") is a standard text in the first sample set. Word segmentation is first performed on "包邮吗" to obtain the three tokens "包", "邮", and "吗", and a replacement set is generated for each of the three. Suppose the preset replacement number of "包" is 5, that of "邮" is 3, and that of "吗" is 2. Then 5 substitutes are randomly selected from the replacement set of "包", 3 substitutes from the replacement set of "邮", and 2 substitutes from the replacement set of "吗", and finally the standard text "包邮吗" is replaced with the chosen substitutes, yielding its 30 corresponding error texts. Thus one standard text can correspond to multiple error texts, and each standard text paired with one of its error texts forms a correction pair; the standard text "包邮吗" and its 30 corresponding error texts thus form 30 correction pairs. In this embodiment, word segmentation is performed on all standard texts in the first sample set to obtain multiple characters and words; the replacement set of each character and word is then computed; substitutes for each character and word are chosen according to the preset numbers; random replacement is then performed on the standard texts to generate their corresponding error texts; and all of these error texts constitute the second sample set.
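The 5 × 3 × 2 = 30 combination count in the example above can be sketched as follows; the token names, replacement sets, and helper function are illustrative assumptions, not from the patent.

```python
import itertools
import random

def build_error_texts(tokens, replacement_sets, counts, seed=0):
    """Pick counts[i] substitutes for each token, then form one error text
    per combination of chosen substitutes (5 * 3 * 2 = 30 in the example)."""
    rng = random.Random(seed)
    chosen = [rng.sample(sorted(replacement_sets[t]), k)
              for t, k in zip(tokens, counts)]
    return ["".join(combo) for combo in itertools.product(*chosen)]

# Hypothetical tokens and replacement sets standing in for the segmented text.
tokens = ["bao", "you", "ma"]
replacement_sets = {
    "bao": {"bao%d" % i for i in range(8)},   # replacement set of the 1st token
    "you": {"you%d" % i for i in range(5)},   # replacement set of the 2nd token
    "ma":  {"ma%d" % i for i in range(4)},    # replacement set of the 3rd token
}
error_texts = build_error_texts(tokens, replacement_sets, counts=[5, 3, 2])
print(len(error_texts))  # 30
```

Each generated error text, paired with the original standard text, would then be one correction pair of the training sample set.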
In an embodiment of the present invention, the preset rules may include at least one of the following: a homophone rule, a fuzzy-pinyin rule, and a similar-glyph rule. The homophone rule covers words with identical pronunciation: taking Chinese as an example, the substitute for 货 (huò, "goods") can be 获 (huò, "obtain"), and the substitute for 优惠 (yōuhuì, "discount") can be 幽会 (yōuhuì, "tryst"); taking English as an example, the substitute for "see" can be "sea". The fuzzy-pinyin rule covers words with similar pronunciation: taking Chinese as an example, the fuzzy sound corresponding to "zh" can be "z", that corresponding to "ch" can be "c", that corresponding to "sh" can be "s", that corresponding to "ang" can be "an", that corresponding to "eng" can be "en", that corresponding to "ing" can be "in", that corresponding to "n" can be "l", and so on; thus a word such as 知道 (zhīdào, "to know") can be replaced by 资道 (zīdào); taking English as an example, the substitute for "sea" can be "she". The similar-glyph rule covers words whose written forms are similar: taking Chinese as an example, the substitute for 已 can be the visually similar 己 ("oneself"), and the substitute for 大 ("big") can be 太 ("too"); taking English as an example, the substitute for "and" can be "aid", and the substitute for "new" can be "now".
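The fuzzy-pinyin pairs listed above (zh/z, ch/c, sh/s, ang/an, eng/en, ing/in, n/l) can be applied as a normalization that maps fuzzy variants of a syllable to the same key, which is one plausible way to look up fuzzy-pinyin substitutes; the function name and the check ordering are assumptions.

```python
# Fuzzy-pinyin pairs from the text; two-letter initials are checked before
# the single-letter n/l pair, and finals are reduced after initials.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s"), ("n", "l")]
FUZZY_FINALS = [("ang", "an"), ("eng", "en"), ("ing", "in")]

def fuzzy_key(syllable):
    """Map a pinyin syllable to a key shared by its fuzzy-pinyin variants."""
    for full, reduced in FUZZY_INITIALS:
        if syllable.startswith(full):
            syllable = reduced + syllable[len(full):]
            break
    for full, reduced in FUZZY_FINALS:
        if syllable.endswith(full):
            syllable = syllable[:-len(full)] + reduced
            break
    return syllable

# Two syllables are fuzzy-pinyin variants if their keys match:
print(fuzzy_key("zhi"), fuzzy_key("zi"))      # zi zi
print(fuzzy_key("sheng"), fuzzy_key("sen"))   # sen sen
```

Characters whose pinyin syllables share a key under this normalization would land in each other's replacement sets.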
In this embodiment, taking Chinese as an example, for the homophone and fuzzy-pinyin rules, the level-1 and level-2 Chinese characters of the GB2312 standard can be enumerated to obtain homophones and fuzzy-pinyin words (GB2312, based on the 1980 publication "Code of Chinese Graphic Character Set for Information Interchange, Primary Set", is the mandatory Chinese national standard character encoding for Chinese information processing). For the similar-glyph rule, the dot-matrix fonts of the level-1 and level-2 Chinese characters of the GB2312 standard can be examined and the similarity between dot matrices computed: if the similarity between two characters exceeds a threshold, they are glyph-similar characters. In a dot-matrix font, as used for display, each Chinese character is divided into a 16 × 16 or 24 × 24 grid of dots, and the character's glyph is represented by whether each dot is on or off.
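As a rough sketch of the dot-matrix comparison just described, the following computes a similarity between two binary glyph bitmaps as the overlap of their lit dots. The patent does not specify the similarity measure; the Jaccard overlap and the 4 × 4 toy glyphs here are illustrative assumptions standing in for 16 × 16 or 24 × 24 character matrices.

```python
def glyph_similarity(bitmap_a, bitmap_b):
    """Similarity of two same-sized dot-matrix glyphs: the fraction of
    lit dots the two glyphs share (Jaccard overlap of the 'on' dots)."""
    on_a = {(r, c) for r, row in enumerate(bitmap_a) for c, v in enumerate(row) if v}
    on_b = {(r, c) for r, row in enumerate(bitmap_b) for c, v in enumerate(row) if v}
    if not on_a and not on_b:
        return 1.0
    return len(on_a & on_b) / len(on_a | on_b)

# Two toy 4x4 glyphs differing in a single dot.
a = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0]]
b = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0]]
print(glyph_similarity(a, b))  # 5/6, i.e. ≈ 0.833
```

Pairs whose similarity exceeds the chosen threshold would be recorded as glyph-similar and added to each other's replacement sets.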
In an embodiment of the present invention, the trained attention-based sequence-to-sequence model may include: an embedding layer, a forward encoding layer, a backward encoding layer, an attention mechanism, a decoding layer, and a conversion layer. Fig. 2 is a schematic diagram of the error-correction model of the method of text error correction according to an embodiment of the present invention. As shown in Fig. 2, the error-correction model may include: an embedding layer (Embed layer), a forward encoding layer (FOREncoder layer), a backward encoding layer (BACKEncoder layer), an attention mechanism (Attention mechanism), a decoding layer (Decoder layer), and a conversion layer (Softmax function). The FOREncoder and BACKEncoder layers in the error-correction model allow both the relationship between the current word and the preceding words and the relationship between the current word and the following words to be considered. The attention mechanism is one of the mechanisms of deep learning models: when generating an output, it produces an attention range indicating which parts of the input sequence to attend to for the next output, and the next output is then generated according to the attended parts. In this way, when generating each output, the information carried by the input sequence is fully used and the information in the input sequence most relevant to the output is found, improving the quality of the output and thus the precision of the model.
In the error-correction model shown in Fig. 2, A, B, and C are the inputs of the model, which in the present invention are the multiple tokens obtained by performing word segmentation on the text to be corrected. The Embed layer then converts the tokens into word vectors, where the dimension of the word vectors is usually determined by the number of distinct tokens in the first sample set: for example, word segmentation is performed on all standard texts in the first sample set to obtain multiple tokens, duplicate tokens are removed, the remaining tokens are numbered starting from 1, and the largest number can serve as the dimension of the word vectors; of course, in the present invention the dimension of the word vectors can also be set according to the concrete scenario, which is not limited. Next, the converted word vectors are encoded by the FOREncoder layer and the BACKEncoder layer, and finally the encoding results of the FOREncoder and BACKEncoder layers are merged into an intermediate vector S. The Decoder layer takes the intermediate vector S and the output of the Attention mechanism as input and parses them to obtain an output vector, from which the conversion layer (the Softmax function) computes the maximum-probability words X and Y. In addition, "start" in the model is a marker corresponding to the start of the output: it corresponds to no output word and is merely a placeholder; likewise, "end" is a marker corresponding to the end of the output and is also merely a placeholder.
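The attention step in the decoding walkthrough above can be sketched as dot-product attention over the encoder states; the scoring function and the toy values are assumptions, since the patent does not specify how the attention range is computed.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Score each encoder position against the decoder state, normalize the
    scores into attention weights, and return the weighted-sum context."""
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    weights = softmax(scores)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(len(decoder_state))]
    return context, weights

# Toy encoder states (e.g. merged forward/backward encodings) and decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 0.0]
context, weights = attend(decoder_state, encoder_states)
print(round(sum(weights), 6))  # 1.0 -- the weights form a distribution
```

The context vector is what the Decoder layer would consume alongside the intermediate vector S when producing the next output word.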
In an embodiment of the present invention, the forward encoding layer, the backward encoding layer, and the decoding layer may each include a long short-term memory network. The long short-term memory network, LSTM (Long Short-Term Memory), is a kind of recurrent neural network suited to processing and predicting important events separated by relatively long intervals and delays in a time series. Because the forward encoding layer, the backward encoding layer, and the decoding layer in an embodiment of the present invention may each include an LSTM, the vanishing-gradient problem during training and inference can be mitigated, improving the accuracy of the error-correction model and the accuracy of the output results.
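A single LSTM step can be sketched as follows to show why the cell state helps with the vanishing-gradient problem mentioned above: when the forget gate is open and the input gate is closed, the cell state passes through the step unchanged. The scalar dimensions and weight names are illustrative only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM step with scalar input and state (toy dimensions)."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate
    c = f * c_prev + i * g        # cell state: additive carry-through
    h = o * math.tanh(c)          # hidden state
    return h, c

# Saturate the forget gate open (bf large) and the input gate closed (bi small):
# the cell state is carried across the step essentially unchanged.
w = {"wf": 0, "uf": 0, "bf": 100, "wi": 0, "ui": 0, "bi": -100,
     "wo": 0, "uo": 0, "bo": 0, "wg": 0, "ug": 0, "bg": 0}
h, c = lstm_step(1.0, 0.5, 2.0, w)
print(abs(c - 2.0) < 1e-6)  # True
```

The additive update of c (rather than repeated multiplication by a weight matrix) is what lets gradients survive long sequences.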
Fig. 3 is a schematic diagram of the main flow of training the error-correction model in the method of text error correction according to an embodiment of the present invention. As shown in Fig. 3, the main flow of training the error-correction model may include: step S301, obtaining a first sample set; step S302, performing word segmentation on the standard texts in the obtained first sample set to obtain multiple tokens; step S303, generating a replacement set for each token according to preset rules; step S304, selecting a predetermined number of substitutes from the replacement set of each token; step S305, generating error texts by randomly replacing tokens of the standard texts with the substitutes; step S306, forming the second sample set from all the error texts; step S307, constructing a training sample set from the first sample set and the second sample set; and step S308, training on the training sample set to obtain the error-correction model. In the present invention, the trained error-correction model can be used to perform error correction on text to be corrected and generate its corresponding standard text.
From the technical solution of text error correction according to the embodiments of the present invention, it can be seen that text is corrected with a trained attention-based sequence-to-sequence model, which reduces computational complexity and improves the accuracy of text error correction. The model is trained on a training sample set formed from the first sample set and the second sample set, so that the error-correction model can be built from massive sample data, improving its accuracy. The error texts in the second sample set are generated from the standard texts in the first sample set, so that the relationship between each error text and its corresponding standard text is established, further improving the accuracy of the error-correction model. The replacement set of each token is constructed from the multiple angles of the homophone rule, the fuzzy-pinyin rule, and the similar-glyph rule, so that substitutes for a token under a variety of conditions are comprehensively considered. The error-correction model includes a forward encoding layer and a backward encoding layer, so that both the relationship between the current word and the preceding words and the relationship between the current word and the following words are considered. The error-correction model includes an attention mechanism, so that when generating each output the information carried by the input sequence is fully used and the information in the input sequence most relevant to the output is found, improving the quality of the output and thus the accuracy of the error-correction model. The forward encoding layer, the backward encoding layer, and the decoding layer may each include a long short-term memory network, which mitigates the vanishing-gradient problem during training and inference, improving the accuracy of the error-correction model and the accuracy of the output results.
Fig. 4 is a schematic diagram of the main modules of the apparatus for text error correction according to an embodiment of the present invention. As shown in Fig. 4, the apparatus 400 for text error correction of the embodiment of the present invention mainly includes the following modules: an acquisition module 401 and a correction module 402. The acquisition module 401 may be used to obtain text to be corrected according to a text error-correction requirement. The correction module 402 may be used to perform error correction on the text to be corrected according to an error-correction model and output the standard text corresponding to the text to be corrected. The error-correction model is a trained attention-based sequence-to-sequence model.
In an embodiment of the present invention, the acquisition module 401 may further be used to: obtain a first sample set and a second sample set, where the first sample set may include at least one standard text and the second sample set may include error texts corresponding to the standard texts; construct a training sample set from the first sample set and the second sample set; and train on the training sample set to obtain the error-correction model, whose input is an error text of the training sample set and whose output is the corresponding standard text of the training sample set.
In an embodiment of the present invention, the acquisition module 401 may further be used to: obtain the first sample set; perform word segmentation on the standard texts in the first sample set to obtain multiple tokens, and generate a replacement set for each token according to preset rules; select a predetermined number of substitutes from the replacement set of each token; generate error texts by randomly replacing tokens of the standard texts with the substitutes; and form the second sample set from the error texts.
In an embodiment of the present invention, the preset rule may include at least one of the following: a homophone rule, a fuzzy-sound rule, and a similar-shape character rule. The homophone rule may in turn include a phonetically-similar-word rule and a same-sound-word rule.
In an embodiment of the present invention, the trained attention-based sequence-to-sequence model may include: an embedding layer, a forward encoding layer, a backward encoding layer, an attention mechanism, a decoding layer, and a conversion layer.
In an embodiment of the present invention, the forward encoding layer, the backward encoding layer, and the decoding layer may each include a long short-term memory (LSTM) network.
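The role of the attention mechanism between the encoding and decoding layers can be illustrated with a minimal pure-Python sketch of dot-product attention. The vectors are toy stand-ins for the concatenated forward/backward hidden states, and dot-product scoring is one common choice of attention score, not necessarily the one the patent intends:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(decoder_state, encoder_states):
    """Dot-product attention: score each encoder state against the
    decoder state, normalize with softmax, and return the weighted
    context vector together with the attention weights."""
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[d] for w, h in zip(weights, encoder_states))
               for d in range(dim)]
    return context, weights

# Toy encoder states (forward and backward halves concatenated).
encoder_states = [[1.0, 0.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0, 0.0],
                  [0.5, 0.5, 0.5, 0.5]]
decoder_state = [0.0, 1.0, 1.0, 0.0]
context, weights = attend(decoder_state, encoder_states)
# The second encoder state aligns best with the decoder state,
# so it receives the largest attention weight.
```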
From the above it can be seen that a trained attention-based sequence-to-sequence model can be used to correct text, reducing computational complexity and improving the accuracy of text error correction. In the embodiment of the present invention, the training sample set composed of the first sample set and the second sample set is used to train the error correction model, so that massive sample data can be used to build the model and improve its accuracy. The error texts in the second sample set are generated from the standard texts in the first sample set, which establishes the relationship between each error text and its corresponding standard text and further improves the accuracy of the model. Replacement sets for each token are constructed from several angles, namely the homophone, fuzzy-sound, and similar-shape character rules, so that substitutes under a variety of conditions are considered comprehensively. The error correction model includes a forward encoding layer and a backward encoding layer, so that both the relationship between the current word and preceding words and the relationship between the current word and following words are taken into account. The model includes an attention mechanism, which makes full use of the information carried by the input sequence when generating each output, finding the information in the input sequence most relevant to that output, thereby improving the quality of the output and the accuracy of the model. The forward encoding layer, the backward encoding layer, and the decoding layer may each include an LSTM network, which mitigates the vanishing-gradient problem during training and recognition and improves the accuracy of the model and of its output.
Fig. 5 shows an exemplary system architecture 500 to which the text error correction method or apparatus of an embodiment of the present invention may be applied. As shown in Fig. 5, the system architecture 500 may include terminal devices 501, 502, and 503, a network 504, and a server 505. The network 504 serves as a medium providing communication links between the terminal devices 501, 502, 503 and the server 505, and may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 501, 502, 503 to interact with the server 505 through the network 504 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software (merely illustrative). The terminal devices 501, 502, 503 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, and desktop computers.
The server 505 may be a server providing various services, for example a back-office management server (merely illustrative) supporting shopping websites browsed by users on the terminal devices 501, 502, 503. The back-office management server may analyze and otherwise process received data such as information query requests, and feed the processing results (such as targeted push information or product information, merely illustrative) back to the terminal devices.
It should be noted that the text error correction method provided by the embodiment of the present invention is generally executed by the server 505; accordingly, the text error correction apparatus is generally disposed in the server 505. It should be understood that the numbers of terminal devices, networks, and servers in Fig. 5 are merely illustrative; there may be any number of terminal devices, networks, and servers according to implementation needs.
Referring now to Fig. 6, a structural schematic diagram of a computer system 600 suitable for implementing a terminal device of an embodiment of the present invention is shown. The terminal device shown in Fig. 6 is merely an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
In particular, according to the disclosed embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the system of the present invention are executed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium include but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, which can be used by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions marked in the boxes may occur in an order different from that shown in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in a block diagram or flowchart, and combinations of boxes in a block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be disposed in a processor; for example, a processor may be described as comprising an obtaining module and a correction module. Under certain circumstances, the names of these modules do not constitute a limitation on the modules themselves; for example, the obtaining module may also be described as "a module that obtains a text to be corrected according to a text error correction demand".
As another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to: obtain a text to be corrected according to a text error correction demand; and correct the text to be corrected according to an error correction model and output the standard text corresponding to the text to be corrected, wherein the error correction model is a trained attention-based sequence-to-sequence model.
According to the technical solution of the embodiments of the present invention, a trained attention-based sequence-to-sequence model can be used to correct text, reducing computational complexity and improving the accuracy of text error correction. The training sample set composed of the first sample set and the second sample set is used to train the error correction model, so that massive sample data can be used to build the model and improve its accuracy. The error texts in the second sample set are generated from the standard texts in the first sample set, establishing the relationship between each error text and its corresponding standard text and further improving the accuracy of the model. Replacement sets for each token are constructed from the homophone, fuzzy-sound, and similar-shape character rules, so that substitutes under a variety of conditions are considered comprehensively. The error correction model includes a forward encoding layer and a backward encoding layer, so that both the relationship between the current word and preceding words and the relationship between the current word and following words are taken into account. The model includes an attention mechanism, which makes full use of the information carried by the input sequence when generating each output, finds the information in the input sequence most relevant to that output, and thereby improves the quality of the output and the accuracy of the model. The forward encoding layer, the backward encoding layer, and the decoding layer may each include an LSTM network, which mitigates the vanishing-gradient problem during training and recognition and improves the accuracy of the model and of its output.
The above specific embodiments do not constitute a limitation on the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (14)
1. A method of text error correction, characterized by comprising:
obtaining a text to be corrected according to a text error correction demand;
correcting the text to be corrected according to an error correction model and outputting the standard text corresponding to the text to be corrected, wherein the error correction model is a trained attention-based sequence-to-sequence model.
2. The method according to claim 1, characterized in that before correcting the text to be corrected according to the error correction model and outputting the standard text corresponding to the text to be corrected, the method further comprises:
obtaining a first sample set and a second sample set, wherein the first sample set includes at least one standard text and the second sample set includes error texts corresponding to the standard texts;
constructing a training sample set from the first sample set and the second sample set;
training on the training sample set to obtain the error correction model, wherein the input of the error correction model is an error text of the training sample set and the output is the corresponding standard text of the training sample set.
3. The method according to claim 2, characterized in that obtaining the first sample set and the second sample set comprises:
obtaining the first sample set;
segmenting the standard texts in the first sample set to obtain multiple tokens, and generating a replacement set for each token according to a preset rule;
selecting a predetermined number of substitutes from each token's replacement set;
randomly replacing tokens of the standard texts with the substitutes to generate error texts, the error texts then constituting the second sample set.
4. The method according to claim 3, characterized in that the preset rule comprises at least one of the following: a homophone rule, a fuzzy-sound rule, and a similar-shape character rule.
5. The method according to claim 1, characterized in that the trained attention-based sequence-to-sequence model comprises: an embedding layer, a forward encoding layer, a backward encoding layer, an attention mechanism, a decoding layer, and a conversion layer.
6. The method according to claim 5, characterized in that the forward encoding layer, the backward encoding layer, and the decoding layer each comprise a long short-term memory network.
7. An apparatus for text error correction, characterized by comprising:
an obtaining module, configured to obtain a text to be corrected according to a text error correction demand;
a correction module, configured to correct the text to be corrected according to an error correction model and output the standard text corresponding to the text to be corrected, wherein the error correction model is a trained attention-based sequence-to-sequence model.
8. The apparatus according to claim 7, characterized in that the obtaining module is further configured to:
obtain a first sample set and a second sample set, wherein the first sample set includes at least one standard text and the second sample set includes error texts corresponding to the standard texts;
construct a training sample set from the first sample set and the second sample set;
train on the training sample set to obtain the error correction model, wherein the input of the error correction model is an error text of the training sample set and the output is the corresponding standard text of the training sample set.
9. The apparatus according to claim 8, characterized in that the obtaining module is further configured to:
obtain the first sample set;
segment the standard texts in the first sample set to obtain multiple tokens, and generate a replacement set for each token according to a preset rule;
select a predetermined number of substitutes from each token's replacement set;
randomly replace tokens of the standard texts with the substitutes to generate error texts, the error texts then constituting the second sample set.
10. The apparatus according to claim 9, characterized in that the preset rule comprises at least one of the following: a homophone rule, a fuzzy-sound rule, and a similar-shape character rule.
11. The apparatus according to claim 7, characterized in that the trained attention-based sequence-to-sequence model comprises: an embedding layer, a forward encoding layer, a backward encoding layer, an attention mechanism, a decoding layer, and a conversion layer.
12. The apparatus according to claim 11, characterized in that the forward encoding layer, the backward encoding layer, and the decoding layer each comprise a long short-term memory network.
13. An electronic device, characterized by comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 6.
14. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810146360.0A CN110162767A (en) | 2018-02-12 | 2018-02-12 | The method and apparatus of text error correction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810146360.0A CN110162767A (en) | 2018-02-12 | 2018-02-12 | The method and apparatus of text error correction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110162767A true CN110162767A (en) | 2019-08-23 |
Family
ID=67635079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810146360.0A Pending CN110162767A (en) | 2018-02-12 | 2018-02-12 | The method and apparatus of text error correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110162767A (en) |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765733A (en) * | 2019-10-24 | 2020-02-07 | 科大讯飞股份有限公司 | Text normalization method, device, equipment and storage medium |
CN110929094A (en) * | 2019-11-20 | 2020-03-27 | 北京香侬慧语科技有限责任公司 | Video title processing method and device |
CN110969012A (en) * | 2019-11-29 | 2020-04-07 | 北京字节跳动网络技术有限公司 | Text error correction method and device, storage medium and electronic equipment |
CN111046658A (en) * | 2019-12-18 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Out-of-order text recognition method, device and equipment |
CN111062203A (en) * | 2019-11-12 | 2020-04-24 | 贝壳技术有限公司 | Voice-based data labeling method, device, medium and electronic equipment |
CN111062205A (en) * | 2019-12-16 | 2020-04-24 | 北京大学 | Dynamic mask training method in Chinese automatic grammar error correction |
CN111080058A (en) * | 2019-11-07 | 2020-04-28 | 云南电网有限责任公司昆明供电局 | Portable secondary safety measure list rapid standardization system and method |
CN111178049A (en) * | 2019-12-09 | 2020-05-19 | 天津幸福生命科技有限公司 | Text correction method and device, readable medium and electronic equipment |
CN111209740A (en) * | 2019-12-31 | 2020-05-29 | 中移(杭州)信息技术有限公司 | Text model training method, text error correction method, electronic device and storage medium |
CN111460827A (en) * | 2020-04-01 | 2020-07-28 | 北京爱咔咔信息技术有限公司 | Text information processing method, system, equipment and computer readable storage medium |
CN111475618A (en) * | 2020-03-31 | 2020-07-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
CN111709234A (en) * | 2020-05-28 | 2020-09-25 | 北京百度网讯科技有限公司 | Training method and device of text processing model and electronic equipment |
CN111753544A (en) * | 2020-06-30 | 2020-10-09 | 北京来也网络科技有限公司 | Document error correction method, device, equipment and medium based on RPA and AI |
CN111767731A (en) * | 2020-07-09 | 2020-10-13 | 北京猿力未来科技有限公司 | Training method and device of grammar error correction model and grammar error correction method and device |
CN111897535A (en) * | 2020-07-30 | 2020-11-06 | 平安科技(深圳)有限公司 | Grammar error correction method, device, computer system and readable storage medium |
CN112329476A (en) * | 2020-11-11 | 2021-02-05 | 北京京东尚科信息技术有限公司 | Text error correction method and device, equipment and storage medium |
CN112509562A (en) * | 2020-11-09 | 2021-03-16 | 北京有竹居网络技术有限公司 | Method, apparatus, electronic device and medium for text post-processing |
CN112784581A (en) * | 2020-11-20 | 2021-05-11 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment |
CN113255331A (en) * | 2021-06-21 | 2021-08-13 | 智者四海(北京)技术有限公司 | Text error correction method, device and storage medium |
CN113948065A (en) * | 2021-09-01 | 2022-01-18 | 北京数美时代科技有限公司 | Method and system for screening error blocking words based on n-gram model |
WO2022116445A1 (en) * | 2020-12-01 | 2022-06-09 | 平安科技(深圳)有限公司 | Method and apparatus for establishing text error correction model, medium and electronic device |
WO2022126897A1 (en) * | 2020-12-18 | 2022-06-23 | 平安科技(深圳)有限公司 | Text error correction method, apparatus, and device, and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101590724B1 (en) * | 2014-10-06 | 2016-02-02 | 포항공과대학교 산학협력단 | Method for modifying error of speech recognition and apparatus for performing the method |
CN105550173A (en) * | 2016-02-06 | 2016-05-04 | 北京京东尚科信息技术有限公司 | Text correction method and device |
US20160180742A1 (en) * | 2013-08-13 | 2016-06-23 | Postech Academy-Industry Foundation | Preposition error correcting method and device performing same |
CN106446526A (en) * | 2016-08-31 | 2017-02-22 | 北京千安哲信息技术有限公司 | Electronic medical record entity relation extraction method and apparatus |
CN106527756A (en) * | 2016-10-26 | 2017-03-22 | 长沙军鸽软件有限公司 | Method and device for intelligently correcting input information |
CN107451106A (en) * | 2017-07-26 | 2017-12-08 | 阿里巴巴集团控股有限公司 | Text method and device for correcting, electronic equipment |
- 2018-02-12: CN application CN201810146360.0A filed; patent CN110162767A/en, status active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160180742A1 (en) * | 2013-08-13 | 2016-06-23 | Postech Academy-Industry Foundation | Preposition error correcting method and device performing same |
KR101590724B1 (en) * | 2014-10-06 | 2016-02-02 | 포항공과대학교 산학협력단 | Method for modifying error of speech recognition and apparatus for performing the method |
CN105550173A (en) * | 2016-02-06 | 2016-05-04 | 北京京东尚科信息技术有限公司 | Text correction method and device |
CN106446526A (en) * | 2016-08-31 | 2017-02-22 | 北京千安哲信息技术有限公司 | Electronic medical record entity relation extraction method and apparatus |
CN106527756A (en) * | 2016-10-26 | 2017-03-22 | 长沙军鸽软件有限公司 | Method and device for intelligently correcting input information |
CN107451106A (en) * | 2017-07-26 | 2017-12-08 | 阿里巴巴集团控股有限公司 | Text method and device for correcting, electronic equipment |
Non-Patent Citations (1)
Title |
---|
HANK的DL之路: "深度学习之seq2seq模型以及Attention机制", pages 1, Retrieved from the Internet <URL:https://www.cnblogs.com/dllearning/p/7834018.html> * |
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765733A (en) * | 2019-10-24 | 2020-02-07 | 科大讯飞股份有限公司 | Text normalization method, device, equipment and storage medium |
CN111080058B (en) * | 2019-11-07 | 2023-04-11 | 云南电网有限责任公司昆明供电局 | Portable secondary safety measure list rapid standardization system and method |
CN111080058A (en) * | 2019-11-07 | 2020-04-28 | 云南电网有限责任公司昆明供电局 | Portable secondary safety measure list rapid standardization system and method |
CN111062203A (en) * | 2019-11-12 | 2020-04-24 | 贝壳技术有限公司 | Voice-based data labeling method, device, medium and electronic equipment |
CN110929094A (en) * | 2019-11-20 | 2020-03-27 | 北京香侬慧语科技有限责任公司 | Video title processing method and device |
CN110929094B (en) * | 2019-11-20 | 2023-05-16 | 北京香侬慧语科技有限责任公司 | Video title processing method and device |
CN110969012A (en) * | 2019-11-29 | 2020-04-07 | 北京字节跳动网络技术有限公司 | Text error correction method and device, storage medium and electronic equipment |
CN110969012B (en) * | 2019-11-29 | 2023-04-07 | 北京字节跳动网络技术有限公司 | Text error correction method and device, storage medium and electronic equipment |
CN111178049B (en) * | 2019-12-09 | 2023-12-12 | 北京懿医云科技有限公司 | Text correction method and device, readable medium and electronic equipment |
CN111178049A (en) * | 2019-12-09 | 2020-05-19 | 天津幸福生命科技有限公司 | Text correction method and device, readable medium and electronic equipment |
CN111062205B (en) * | 2019-12-16 | 2021-10-01 | 北京大学 | Dynamic mask training method in Chinese automatic grammar error correction |
CN111062205A (en) * | 2019-12-16 | 2020-04-24 | 北京大学 | Dynamic mask training method in Chinese automatic grammar error correction |
CN111046658B (en) * | 2019-12-18 | 2023-05-09 | 支付宝(杭州)信息技术有限公司 | Method, device and equipment for recognizing disorder text |
CN111046658A (en) * | 2019-12-18 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Out-of-order text recognition method, device and equipment |
CN111209740A (en) * | 2019-12-31 | 2020-05-29 | 中移(杭州)信息技术有限公司 | Text model training method, text error correction method, electronic device and storage medium |
CN111209740B (en) * | 2019-12-31 | 2023-08-15 | 中移(杭州)信息技术有限公司 | Text model training method, text error correction method, electronic device and storage medium |
CN111475618A (en) * | 2020-03-31 | 2020-07-31 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
CN111460827A (en) * | 2020-04-01 | 2020-07-28 | 北京爱咔咔信息技术有限公司 | Text information processing method, system, equipment and computer readable storage medium |
CN111709234B (en) * | 2020-05-28 | 2023-07-25 | 北京百度网讯科技有限公司 | Training method and device for text processing model and electronic equipment |
CN111709234A (en) * | 2020-05-28 | 2020-09-25 | 北京百度网讯科技有限公司 | Training method and device of text processing model and electronic equipment |
CN111753544A (en) * | 2020-06-30 | 2020-10-09 | 北京来也网络科技有限公司 | Document error correction method, device, equipment and medium based on RPA and AI |
CN111767731A (en) * | 2020-07-09 | 2020-10-13 | 北京猿力未来科技有限公司 | Training method and device of grammar error correction model and grammar error correction method and device |
CN111897535A (en) * | 2020-07-30 | 2020-11-06 | 平安科技(深圳)有限公司 | Grammar error correction method, device, computer system and readable storage medium |
CN112509562A (en) * | 2020-11-09 | 2021-03-16 | 北京有竹居网络技术有限公司 | Method, apparatus, electronic device and medium for text post-processing |
CN112509562B (en) * | 2020-11-09 | 2024-03-22 | 北京有竹居网络技术有限公司 | Method, apparatus, electronic device and medium for text post-processing |
CN112329476A (en) * | 2020-11-11 | 2021-02-05 | 北京京东尚科信息技术有限公司 | Text error correction method and device, equipment and storage medium |
CN112784581A (en) * | 2020-11-20 | 2021-05-11 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment |
CN112784581B (en) * | 2020-11-20 | 2024-02-13 | 网易(杭州)网络有限公司 | Text error correction method, device, medium and electronic equipment |
WO2022116445A1 (en) * | 2020-12-01 | 2022-06-09 | 平安科技(深圳)有限公司 | Method and apparatus for establishing text error correction model, medium and electronic device |
WO2022126897A1 (en) * | 2020-12-18 | 2022-06-23 | 平安科技(深圳)有限公司 | Text error correction method, apparatus, and device, and storage medium |
CN113255331A (en) * | 2021-06-21 | 2021-08-13 | 智者四海(北京)技术有限公司 | Text error correction method, device and storage medium |
CN113948065B (en) * | 2021-09-01 | 2022-07-08 | 北京数美时代科技有限公司 | Method and system for screening error blocking words based on n-gram model |
CN113948065A (en) * | 2021-09-01 | 2022-01-18 | 北京数美时代科技有限公司 | Method and system for screening error blocking words based on n-gram model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110162767A (en) | The method and apparatus of text error correction | |
US11501182B2 (en) | Method and apparatus for generating model | |
US11151177B2 (en) | Search method and apparatus based on artificial intelligence | |
US20180365231A1 (en) | Method and apparatus for generating parallel text in same language | |
CN110705301B (en) | Entity relationship extraction method and device, storage medium and electronic equipment | |
CN107491534A (en) | Information processing method and device | |
CN107256267A (en) | Querying method and device | |
CN109376234A (en) | A kind of method and apparatus of trained summarization generation model | |
CN107220386A (en) | Information-pushing method and device | |
CN109190124B (en) | Method and apparatus for participle | |
CN110309275A (en) | A kind of method and apparatus that dialogue generates | |
CN108877782A (en) | Audio recognition method and device | |
CN106960030A (en) | Pushed information method and device based on artificial intelligence | |
CN111753551B (en) | Information generation method and device based on word vector generation model | |
CN108628830A (en) | A kind of method and apparatus of semantics recognition | |
CN107862058B (en) | Method and apparatus for generating information | |
CN109684624A (en) | A kind of method and apparatus in automatic identification Order Address road area | |
CN109558605A (en) | Method and apparatus for translating sentence | |
CN111368551A (en) | Method and device for determining event subject | |
CN109325178A (en) | Method and apparatus for handling information | |
CN109190123A (en) | Method and apparatus for output information | |
CN116738994A (en) | Context-enhanced-based hinting fine-tuning relation extraction method | |
CN110309293A (en) | Text recommended method and device | |
CN115631261A (en) | Training method of image generation model, image generation method and device | |
CN107766498A (en) | Method and apparatus for generating information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |