CN105550173A - Text correction method and device - Google Patents
- Publication number
- CN105550173A (publication) · CN201610083955.7A / CN201610083955A (application)
- Authority
- CN
- China
- Prior art keywords
- word
- language model
- text message
- text
- occurrence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a text correction method and device. In a specific implementation, text information input by a user is received; at least one erroneous word in the text information is determined through a first language model; candidate words corresponding to the erroneous word are determined on the basis of a preset rule; and the candidate words are used to replace the erroneous word, obtaining corrected text information. This implementation enables accurate text correction.
Description
Technical field
The present application relates to the field of computer technology, specifically to the field of text processing technology, and in particular to a text correction method and device.
Background
With the development of artificial intelligence technology, more and more service or commodity providers have begun to set up intelligent interactive service systems, so as to provide users with the consulting or business services they need around the clock. Users usually interact with such intelligent systems by entering text. However, for a variety of reasons (for example, spelling mistakes or typing errors), the text entered by a user often fails to express the intended meaning, so the user cannot obtain effective help from the intelligent system. Correcting the sentences entered by users is therefore a problem that every intelligent interactive system must solve.
In the prior art, the text entered by a user is mainly corrected through preconfigured error-correction rules. Specifically, every word that hits an error-correction rule is corrected to the word specified by that rule; for example, whenever a user enters "coupon", it is corrected to "reward voucher". Because such rules can only cover a fixed set of terms, only part of the specific vocabulary can be corrected. For example, "ipone6" can only be corrected to "Iphone6"; "ipone6" cannot be corrected to "Iphone", and variants such as "iphne6" and "iphon6" cannot be corrected at all. As a result, both the accuracy and the recall of text correction are low.
Summary of the invention
The purpose of the present application is to propose a text correction method and device that solve the technical problems mentioned in the Background section above.
In a first aspect, the present application provides a text correction method, the method comprising: receiving text information input by a user; determining, through a first language model, at least one erroneous word in the text information; determining, based on a preset rule, a candidate word corresponding to the erroneous word; and replacing the erroneous word with the candidate word to obtain corrected text information.
In some embodiments, determining at least one erroneous word in the text information through the first language model comprises: calculating the occurrence probability of each word in the text information through the first language model; and determining at least one erroneous word in the text information according to the occurrence probability of each word.
In some embodiments, the first language model is obtained by the following method: obtaining historical text information; preprocessing the historical text information to obtain training samples; and training a language model using the training samples to obtain the first language model, wherein the preprocessing comprises text filtering, word segmentation and generalization.
In some embodiments, the preprocessing further comprises classification based on service type; training a language model using the training samples to obtain the first language model comprises: training language models on the classified training samples separately through a recurrent neural network algorithm to obtain a first language model corresponding to each service type; and calculating the occurrence probability of each word in the text information through the first language model comprises: determining the service type corresponding to the text information, and calculating the occurrence probability of each word in the text information through the first language model corresponding to that service type.
In some embodiments, the preset rule comprises at least one of a pinyin rule, a glyph rule and an edit-distance rule; the text correction method further comprises: if the erroneous word corresponds to multiple candidate words, calculating the occurrence count of each candidate word through a second language model; and selecting at least one pending candidate word from the multiple candidate words according to the occurrence count of each candidate word, wherein the second language model is obtained by training a unigram language model using the training samples.
In some embodiments, replacing the erroneous word with the candidate word to obtain corrected text information comprises: replacing the erroneous word with each pending candidate word in turn to obtain at least one piece of pending text information; calculating the occurrence probability of each piece of pending text information through the first language model; and determining one piece of pending text information as the corrected text information according to the occurrence probabilities of the pending text information.
In a second aspect, the present application provides a text correction device, the device comprising: a receiving module for receiving text information input by a user; an erroneous-word determination module for determining, through a first language model, at least one erroneous word in the text information; a candidate-word determination module for determining, based on a preset rule, a candidate word corresponding to the erroneous word; and a correction module for replacing the erroneous word with the candidate word to obtain corrected text information.
In some embodiments, the erroneous-word determination module is further configured to: calculate the occurrence probability of each word in the text information through the first language model; and determine at least one erroneous word in the text information according to the occurrence probability of each word.
In some embodiments, the first language model is obtained by the following method: obtaining historical text information; preprocessing the historical text information to obtain training samples; and training a language model using the training samples to obtain the first language model, wherein the preprocessing comprises text filtering, word segmentation and generalization.
In some embodiments, the preprocessing further comprises classification based on service type; training a language model using the training samples to obtain the first language model comprises: training language models on the classified training samples separately through a recurrent neural network algorithm to obtain a first language model corresponding to each service type; and calculating the occurrence probability of each word in the text information through the first language model comprises: determining the service type corresponding to the text information, and calculating the occurrence probability of each word in the text information through the first language model corresponding to that service type.
In some embodiments, the preset rule comprises at least one of a pinyin rule, a glyph rule and an edit-distance rule; the text correction device further comprises: a calculation module for calculating, if the erroneous word corresponds to multiple candidate words, the occurrence count of each candidate word through a second language model; and a selection module for selecting at least one pending candidate word from the multiple candidate words according to the occurrence count of each candidate word, wherein the second language model is obtained by training a unigram language model using the training samples.
In some embodiments, the correction module is further configured to: replace the erroneous word with each pending candidate word in turn to obtain at least one piece of pending text information; calculate the occurrence probability of each piece of pending text information through the first language model; and determine one piece of pending text information as the corrected text information according to the occurrence probabilities of the pending text information.
With the text correction method and device provided by the present application, at least one erroneous word can first be determined from the text information input by the user through a pre-trained first language model, the candidate word corresponding to the erroneous word is then determined according to a preset rule, and finally the candidate word is used to replace the erroneous word, thereby realizing text correction. The pre-trained language model can accurately determine the meaning the user intends to express, making the corrected text more accurate and thus improving both the accuracy and the recall of text correction.
Brief description of the drawings
Other features, objects and advantages of the present application will become more apparent by reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the text correction method of the present application;
Fig. 3 is a flowchart of another embodiment of the text correction method of the present application;
Fig. 4 is a schematic structural diagram of one embodiment of the text correction device of the present application;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing the terminal device or server of the embodiments of the present application.
Embodiment
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention and are not intended to limit it. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with one another. The present application is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the text correction method or text correction device of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
Users may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as e-commerce applications, web browsers, search applications, instant messaging tools, mailbox clients and social platform software. A user may input text information on a terminal device 101, 102, 103, and the server 105 may receive, over the network 104, the text information input by the user and sent by the terminal device 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers and desktop computers.
The server 105 may be a server providing various services, for example a background server supporting the applications running on the terminal devices 101, 102, 103.
It should be noted that the text correction method provided by the embodiments of the present application is generally performed by the server 105, and accordingly the text correction device is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs.
Continuing to refer to Fig. 2, a flow 200 of one embodiment of the text correction method according to the present application is shown. The text correction method comprises the following steps:
Step 201: receive text information input by a user.
In this embodiment, the electronic device on which the text correction method runs (for example, the server shown in Fig. 1) may first receive, through a wired or wireless connection, the text information that the user input on a terminal device and that the terminal device sent.
Specifically, when a user wants to consult a commodity or service provider about goods or services, the user may use his or her own terminal device to access the intelligent interactive system run by that provider, and then enter in the intelligent interactive system the question to be consulted or the relevant commodity information. Since such questions or commodity information can be expressed in the form of words, they can serve as the text information input by the user in this embodiment.
It should be pointed out that the above wireless connection may include, but is not limited to, 3G/4G connections, WiFi connections, Bluetooth connections, WiMAX connections, ZigBee connections, UWB (ultra-wideband) connections and other wireless connections now known or developed in the future.
Step 202: determine at least one erroneous word in the text information through a first language model.
In this embodiment, after obtaining the text information input by the user, the above electronic device (for example, the server shown in Fig. 1) may identify the erroneous words that may exist in the text information through a pre-trained first language model. Language models are widely used in the field of natural language processing; a language model can be expressed as the probability that a word sequence occurs, given each word in the sequence and its context. Specifically, the first language model of this embodiment may be a language model commonly used in natural language processing, for example an N-gram language model or a language model trained with a recurrent neural network (RNN) algorithm. The trained first language model may be stored in advance in the storage space of the electronic device; after the electronic device obtains the text information input by the user, it can directly call the stored first language model to identify erroneous words, for example by treating the words to which the first language model assigns lower occurrence probabilities as erroneous words.
Step 203: determine, based on a preset rule, a candidate word corresponding to the erroneous word.
In this embodiment, after determining the erroneous word in the text information, the electronic device on which the text correction method runs may further determine the candidate word corresponding to the erroneous word according to a predetermined rule. Optionally, the preset rule may comprise at least one of a pinyin rule, a glyph rule and an edit-distance rule.
Specifically, when determining the candidate word corresponding to the erroneous word according to the pinyin rule, words whose pinyin is identical to that of the erroneous word may be taken as candidate words; for example, the candidate words corresponding to "results" may include "receiving" (a Chinese homophone pair in the original). When determining the candidate word according to the glyph rule, words whose shape is similar to that of the erroneous word may be taken as candidate words; for example, the candidate words corresponding to "volume" may include "certificate" (similar-shaped Chinese characters in the original). When determining the candidate word according to the edit-distance rule, words whose letters or pinyin input sequence are close to those of the erroneous word may be taken as candidate words; for example, the candidate words corresponding to "iphoe6" may include "iphone6", and the candidate words corresponding to "aroused in interest" may include "thing" (Chinese words with close pinyin input sequences in the original). In this embodiment, the candidate words of the erroneous word may be determined according to one or more of the above rules, as illustrated by the sketch below.
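The edit-distance rule lends itself to a compact illustration. The following is a minimal sketch, not taken from the patent: the vocabulary, the distance threshold and the function names are illustrative assumptions.

```python
# A minimal sketch of edit-distance-based candidate generation: every
# vocabulary word within a small Levenshtein distance of the erroneous word
# is proposed as a candidate. Vocabulary and threshold are illustrative.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def edit_distance_candidates(error_word: str, vocabulary, max_distance: int = 1):
    """Return vocabulary words within max_distance edits of error_word."""
    return [w for w in vocabulary if levenshtein(error_word, w) <= max_distance]

# Example usage with a toy vocabulary.
vocab = {"iphone6", "iphone", "ipad", "macbook"}
print(edit_distance_candidates("iphoe6", vocab))  # -> ['iphone6']
```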
Step 204: replace the erroneous word with the candidate word to obtain corrected text information.
In this embodiment, after obtaining the candidate word of the erroneous word, the above electronic device may directly replace the erroneous word with the corresponding candidate word, thereby obtaining the corrected text information. If multiple erroneous words have been determined in one piece of text information, each erroneous word can be replaced to obtain the corrected text information.
In some implementations, if an erroneous word corresponds to only one candidate word, that candidate word may be used directly to replace the erroneous word. In other implementations, if an erroneous word corresponds to multiple candidate words, one candidate word may first be selected from the multiple candidate words, and the selected candidate word is then used to replace the erroneous word.
With the text correction method provided by the above embodiment of the present application, at least one erroneous word can first be determined from the text information input by the user through a pre-trained first language model, the candidate word corresponding to the erroneous word is then determined according to a preset rule, and finally the candidate word is used to replace the erroneous word, thereby realizing text correction. The pre-trained language model can accurately determine the meaning the user intends to express, making the corrected text more accurate and thus improving both the accuracy and the recall of text correction.
Continuing further with reference to Fig. 3, a flow 300 of another embodiment of the text correction method according to the present application is shown. The text correction method comprises the following steps:
Step 301: receive text information input by a user.
In this embodiment, the electronic device on which the text correction method runs (for example, the server shown in Fig. 1) may receive, through a wired or wireless connection, the text information that the user input on a terminal device and that the terminal device sent.
Step 302: calculate the occurrence probability of each word in the text information through the first language model.
In this embodiment, after obtaining the text information input by the user, the above electronic device (for example, the server shown in Fig. 1) may calculate the occurrence probability of each word in the text information through the pre-trained first language model. Specifically, the first language model can be expressed as the probability that a word sequence occurs, given each word in the sequence and its context. Since the occurrence probability of a word sequence is related to the probability of each word forming the sequence, the occurrence probability of each word can also be calculated through the first language model.
In some optional implementations of this embodiment, the first language model may be trained by the following method: obtaining historical text information; preprocessing the historical text information to obtain training samples; and training a language model using the training samples to obtain the first language model. In this embodiment, the historical text information may be text information previously input by many users, for example the historical search records of all users in the intelligent interactive system. The historical text information may be kept on the above electronic device, or stored on another external device that can communicate with the electronic device (for example, a database or the cloud). After acquiring the historical text information, the electronic device may first preprocess it. Optionally, the preprocessing steps may comprise text filtering, word segmentation and generalization. When preprocessing the text information, text filtering may be performed first, that is, filtering out invalid information in the text information, such as garbled symbols, web page tags, questions sent automatically by robots and questions sent by test accounts. The filtered text information is then segmented; specifically, it is cut, according to the semantics of the text, into multiple words arranged in a certain order, and the words may be separated by a delimiter such as the tab character '\t'. Finally, the segmentation result may be generalized: for example, English words in the input may be uniformly lowercased, all numeric strings (such as order numbers or commodity numbers) may be generalized to "xDIGIT", all web links may be generalized to "URL", and all e-mail addresses may be generalized to "EMAIL". After the above preprocessing is performed on the text information, the preprocessed text information may be used as training samples to train a conventional basic language model, and the trained model is then used as the first language model. A sketch of such a preprocessing step is given below.
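The following is a minimal sketch of the filtering, generalization and segmentation described above, under stated assumptions: the filtering patterns, the placeholder tokens and the use of the jieba segmenter are illustrative choices, not part of the patent.

```python
# A minimal preprocessing sketch: text filtering, generalization and word
# segmentation of one line of historical text. Rules and tokens are assumptions.
import re
import jieba  # a common Chinese word-segmentation library, assumed available

URL_RE    = re.compile(r"https?://\S+")
EMAIL_RE  = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")
DIGIT_RE  = re.compile(r"\d+")
TAG_RE    = re.compile(r"<[^>]+>")     # web page tags
GARBLE_RE = re.compile(r"[\ufffd]+")   # garbled symbols

def preprocess(line: str) -> str:
    """Filter, generalize and segment one line of historical text."""
    # 1. Text filtering: drop tags and garbled symbols.
    line = TAG_RE.sub(" ", line)
    line = GARBLE_RE.sub(" ", line)
    # 2. Generalization: lowercase English, replace links, e-mails and numbers.
    line = line.lower()
    line = URL_RE.sub("URL", line)
    line = EMAIL_RE.sub("EMAIL", line)
    line = DIGIT_RE.sub("xDIGIT", line)
    # 3. Word segmentation, words joined by a tab delimiter.
    return "\t".join(w for w in jieba.cut(line) if w.strip())

print(preprocess("我想查订单12345的状态 http://example.com"))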
In some optional implementations of this embodiment, the preprocessing may further comprise classification based on service type. Specifically, the above electronic device may also determine, according to the content of the historical text information, the service type to which it corresponds, and then classify all the obtained historical text information by service type. The service type may be the type of service provided by a service provider; for example, for a telecommunications operator, the service types may include call services and data services. The service type may also be the type of commodity provided by a commodity seller; for example, for an e-commerce platform, the service types may include household appliances, daily necessities and so on.
After the historical text information is classified based on service type, historical text information of multiple types can be obtained as training samples. In this way, when the training samples are used to train a language model to obtain the first language model, language models may be trained on the classified training samples separately through a recurrent neural network algorithm, obtaining a first language model corresponding to each service type. Specifically, if the above preprocessing of the historical text information yields training samples of N service types, a first language model may be trained with the recurrent neural network algorithm for each sample set, so that N first language models, one for each service type, are obtained. Optionally, the training samples of the N service types may also be combined into one overall training sample set for model training, thereby obtaining an overall first language model. In this way, N+1 first language models based on the recurrent neural network algorithm can be obtained through the above training. Because a recurrent neural network algorithm can make full use of all the preceding information to predict the next word, rather than using only the preceding N words as an N-gram model does, a first language model trained with the recurrent neural network algorithm can calculate the occurrence probability of each word in the text information more accurately. A sketch of such a recurrent language model is given below.
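To make the per-service-type training concrete, the following is a minimal sketch of a recurrent-neural-network language model in PyTorch. The GRU architecture, hyperparameters and training loop are illustrative assumptions rather than the patent's implementation.

```python
# A minimal sketch of training one RNN language model per service type.
import torch
import torch.nn as nn

class RNNLanguageModel(nn.Module):
    def __init__(self, vocab_size: int, emb: int = 64, hidden: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.out(h)                      # logits for the next word

def train_lm(corpus, vocab, epochs: int = 5):
    """Train a next-word model on one service type's segmented corpus.

    corpus: list of sentences, each a list of word ids (length >= 2).
    """
    model = RNNLanguageModel(len(vocab))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for sent in corpus:
            ids = torch.tensor([sent])
            logits = model(ids[:, :-1])         # predict every next word
            loss = loss_fn(logits.reshape(-1, len(vocab)),
                           ids[:, 1:].reshape(-1))
            opt.zero_grad(); loss.backward(); opt.step()
    return model

# One model per service type, plus an overall model on the merged samples:
# models = {t: train_lm(c, vocab) for t, c in samples_by_type.items()}
# models["overall"] = train_lm(sum(samples_by_type.values(), []), vocab)
```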
Correspondingly, step 302 may also comprise: determining the service type corresponding to the text information; and calculating the occurrence probability of each word in the text information through the first language model corresponding to that service type. In this embodiment, when calculating the occurrence probability of each word in the text information, the service type corresponding to the text information may first be determined according to its content. Specifically, keywords of the service types may be matched against the text content, and the matching service type is then taken as the service type of the text information. For example, if the text information contains the keyword "TV", and "TV" is also a keyword of the service type "household appliances", the service type of the text information can be determined to be household appliances. The occurrence probability of each word in the text information is then calculated through the first language model corresponding to that service type. Because a first language model is trained separately for each service type, the result will be more accurate and reliable when the model processes text information of the same service type. A sketch of this keyword-based model selection is given below.
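The keyword-based selection of the per-type model can be illustrated as follows; the keyword sets and model keys are hypothetical, and the fallback corresponds to the overall model described next.

```python
# A minimal sketch of picking the first language model by matching
# service-type keywords against the segmented text. Keywords are assumptions.
SERVICE_KEYWORDS = {
    "household_appliances": {"TV", "fridge", "washer"},
    "daily_necessities": {"soap", "towel", "tissue"},
}

def pick_model(words, models):
    """Return the per-type model whose keywords match, else the overall model."""
    for service_type, keywords in SERVICE_KEYWORDS.items():
        if keywords & set(words):
            return models[service_type]
    return models["overall"]
```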
It should be noted that, when the service type of the text information cannot be determined, the occurrence probability of each word in the text information may also be calculated through the overall first language model trained on all the training samples.
In some optional implementations of this embodiment, before the occurrence probability of each word in the text information is calculated through the first language model, the text information may also first be preprocessed, and the occurrence probability of each word in the preprocessed text information is then calculated through the first language model. Optionally, the preprocessing may comprise text filtering, word segmentation and generalization.
Step 303: determine at least one erroneous word in the text information according to the occurrence probability of each word.
In this embodiment, after calculating the occurrence probability of each word, the above electronic device may further determine, according to the specific probability value of each word, one or more erroneous words that may have been input incorrectly in the text information. In one possible implementation, if the occurrence probability of a word in the text information is lower than a preset probability minimum, for example 20%, that word may be determined directly as an erroneous word. In another possible implementation, if the occurrence probabilities of a word and of the word following it in the text information are both lower than a preset probability threshold, for example 50%, that word may be determined as an erroneous word. It should be noted that the specific values of the above probability minimum and probability threshold can be set by the user according to actual needs, and the present application does not limit them. A sketch of the two rules is given below.
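The two rules can be expressed compactly. The following minimal sketch assumes the per-word probabilities have already been produced by the first language model; the threshold defaults are the examples from the text and the sample values are made up.

```python
# A minimal sketch of the two error-word rules described above.
def find_error_words(words, probs, prob_min=0.20, prob_threshold=0.50):
    """Return indices of words flagged as erroneous.

    words : list of segmented words
    probs : occurrence probability of each word under the first language model
    """
    errors = []
    for i, p in enumerate(probs):
        # Rule 1: the word's own probability is below the preset minimum.
        if p < prob_min:
            errors.append(i)
        # Rule 2: this word and the next word are both below the threshold.
        elif i + 1 < len(probs) and p < prob_threshold and probs[i + 1] < prob_threshold:
            errors.append(i)
    return errors

# Example usage with made-up probabilities.
words = ["i", "want", "an", "iphoe6"]
probs = [0.70, 0.65, 0.60, 0.05]
print(find_error_words(words, probs))  # -> [3]
```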
Step 304: determine, based on a preset rule, a candidate word corresponding to the erroneous word.
In this embodiment, after determining the erroneous word in the text information, the electronic device on which the text correction method runs may further determine the candidate word corresponding to the erroneous word according to a predetermined rule. Optionally, the preset rule may comprise at least one of a pinyin rule, a glyph rule and an edit-distance rule.
Step 305: if the erroneous word corresponds to multiple candidate words, calculate the occurrence count of each candidate word through a second language model.
In this embodiment, if more than one candidate word corresponding to the erroneous word is determined in step 304, the occurrence count of each candidate word may further be calculated through a second language model. The second language model is obtained by training a unigram language model using the above training samples. A unigram language model is the special case of an N-gram model with N equal to 1: it only concerns the probability that a word occurs in the overall corpus, independent of the words preceding it. Therefore, the occurrence count of a candidate word calculated through the second language model can represent the probability that the candidate word may occur in the text information. A sketch of such a unigram count model is given below.
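A unigram count model of this kind is simple enough to sketch directly; the corpus and candidate lists below are toy data, not taken from the patent.

```python
# A minimal sketch of the unigram "second language model": occurrence counts
# over the segmented training samples, used to rank candidate words.
from collections import Counter

class UnigramModel:
    def __init__(self, segmented_sentences):
        # segmented_sentences: iterable of word lists from the training samples
        self.counts = Counter(w for sent in segmented_sentences for w in sent)

    def count(self, word: str) -> int:
        return self.counts[word]  # 0 for unseen words

    def rank_candidates(self, candidates, top_k: int = 1):
        """Keep the top_k candidates with the highest occurrence counts."""
        return sorted(candidates, key=self.count, reverse=True)[:top_k]

# Example usage.
corpus = [["i", "want", "an", "iphone6"], ["iphone6", "price"], ["iphone", "case"]]
model = UnigramModel(corpus)
print(model.rank_candidates(["iphone6", "iphone"], top_k=1))  # -> ['iphone6']
```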
Step 306: select at least one pending candidate word from the multiple candidate words according to the occurrence count of each candidate word.
In this embodiment, the above electronic device may sort the candidate words by occurrence count in descending order and then take the one or more candidate words ranked first as pending candidate words; that is, the candidate words more likely to occur in the text information are taken as pending candidate words.
Step 307: replace the erroneous word with each pending candidate word in turn to obtain at least one piece of pending text information.
After the electronic device selects at least one pending candidate word in step 306, each pending candidate word may further be used to replace the erroneous word in the original text information, obtaining at least one piece of pending text information. For example, if the erroneous word corresponds to three pending candidate words, three pieces of pending text information are obtained after the replacements.
Step 308: calculate the occurrence probability of each piece of pending text information through the first language model.
In this embodiment, the above electronic device may further calculate, through the first language model, the occurrence probability of each piece of pending text information. Specifically, if a piece of pending text information contains N words, an end marker may be appended after the N words, and the N words together with the end marker are used as the input of the first language model. Through the first language model, not only the occurrence probability of each word can be calculated, but also the occurrence probability of the end marker, that is, the probability that the end marker follows the N words. In this way, N+1 probability values are obtained from the first language model, and multiplying these N+1 values yields the occurrence probability of the pending text information. A sketch of this computation is given below.
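A minimal sketch of this N+1-factor product follows; next_word_prob stands in for the trained first language model and is a placeholder, and the log-space accumulation is an implementation convenience rather than something the patent specifies.

```python
# A minimal sketch of scoring a piece of pending text information:
# N word probabilities plus the end-marker probability, multiplied together.
import math

END = "</s>"  # end marker appended after the N words

def sentence_probability(words, next_word_prob):
    """Multiply the N+1 probabilities returned by the language model.

    next_word_prob(history, word) -> probability of `word` given `history`.
    Accumulated in log space to avoid underflow (assumes nonzero probabilities).
    """
    tokens = list(words) + [END]
    log_p = 0.0
    for i, w in enumerate(tokens):
        log_p += math.log(next_word_prob(tokens[:i], w))
    return math.exp(log_p)

# Example usage with a dummy model that returns a constant probability.
print(sentence_probability(["i", "want", "an", "iphone6"],
                           lambda history, word: 0.5))  # -> 0.5 ** 5
```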
Step 309: determine one piece of pending text information as the corrected text information according to the occurrence probabilities of the pending text information.
In this embodiment, after calculating the occurrence probability of each piece of pending text information, the above electronic device may first sort the pieces by their probability values in descending order, and then determine the piece of pending text information ranked first as the corrected text information.
In an optional implementation of this embodiment, the occurrence probability of the corrected text information may also be compared with the occurrence probability of the text information input by the user. If the occurrence probability of the corrected text information is greater than that of the text information input by the user, the text information input by the user can be corrected, that is, the corrected text information replaces the text information input by the user. If the occurrence probability of the corrected text information is less than that of the text information input by the user, the text information input by the user is not corrected, that is, the text information input by the user is retained. A sketch of this acceptance check is given below.
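This acceptance check reduces to a single comparison; the sketch below reuses the hypothetical sentence_probability helper from the previous sketch.

```python
# A minimal sketch of the acceptance check: keep the correction only if the
# first language model assigns it a higher occurrence probability.
def accept_correction(original_words, corrected_words, next_word_prob):
    p_original = sentence_probability(original_words, next_word_prob)
    p_corrected = sentence_probability(corrected_words, next_word_prob)
    return corrected_words if p_corrected > p_original else original_words
```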
It should be noted that, although the operations of the method of the invention are described in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the operations shown must be performed to achieve the desired result. On the contrary, the steps depicted in the flowcharts may be performed in a different order. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 2, this embodiment specifically describes how the erroneous word is determined and how text correction is carried out when the erroneous word corresponds to multiple candidate words. Specifically, the pending candidate words with higher occurrence probabilities in the overall corpus can be determined from the multiple candidate words through the second language model, the first language model is then used to calculate the occurrence probability of each piece of pending text information, and the corrected text information is finally determined based on those occurrence probabilities. Through the calculations of the two models, the accuracy of the candidate words and the reliability of the corrected text can each be improved, so that the accuracy of text correction can be further improved as a whole.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a text correction device. This device embodiment corresponds to the method embodiments shown in Figs. 2 and 3, and the device can be applied to various electronic devices.
As shown in Fig. 4, the text correction device 400 described in this embodiment comprises: a receiving module 410, an erroneous-word determination module 420, a candidate-word determination module 430 and a correction module 440. The receiving module 410 is used for receiving text information input by a user; the erroneous-word determination module 420 is used for determining, through a first language model, at least one erroneous word in the text information; the candidate-word determination module 430 is used for determining, based on a preset rule, a candidate word corresponding to the erroneous word; and the correction module 440 is used for replacing the erroneous word with the candidate word to obtain corrected text information.
In this embodiment, the receiving module 410 of the text correction device 400 may receive, through a wired or wireless connection, the text information that the user input on a terminal device and that the terminal device sent.
In this embodiment, the erroneous-word determination module 420 of the text correction device 400 may identify the erroneous words that may exist in the text information through a pre-trained first language model.
In this embodiment, the candidate-word determination module 430 of the text correction device 400 may determine the candidate word corresponding to the erroneous word according to a predetermined rule, wherein the preset rule may comprise at least one of a pinyin rule, a glyph rule and an edit-distance rule.
In this embodiment, the correction module 440 of the text correction device 400 may directly replace the erroneous word with the candidate word determined by the candidate-word determination module 430, thereby obtaining the corrected text information.
In some optional implementations of this embodiment, the erroneous-word determination module 420 is further configured to: calculate the occurrence probability of each word in the text information through the first language model; and determine at least one erroneous word in the text information according to the occurrence probability of each word.
In some optional implementations of this embodiment, the first language model is obtained by the following method: obtaining historical text information; preprocessing the historical text information to obtain training samples; and training a language model using the training samples to obtain the first language model, wherein the preprocessing comprises text filtering, word segmentation and generalization.
In some optional implementations of this embodiment, the preprocessing further comprises classification based on service type. Training a language model using the training samples to obtain the first language model comprises: training language models on the classified training samples separately through a recurrent neural network algorithm to obtain a first language model corresponding to each service type. Correspondingly, calculating the occurrence probability of each word in the text information through the first language model comprises: determining the service type corresponding to the text information; and calculating the occurrence probability of each word in the text information through the first language model corresponding to that service type.
In some optional implementations of this embodiment, the preset rule comprises at least one of a pinyin rule, a glyph rule and an edit-distance rule. The text correction device 400 further comprises: a calculation module for calculating, if the erroneous word corresponds to multiple candidate words, the occurrence count of each candidate word through a second language model; and a selection module for selecting at least one pending candidate word from the multiple candidate words according to the occurrence count of each candidate word, wherein the second language model is obtained by training a unigram language model using the above training samples.
In some optional implementations of this embodiment, the correction module 440 is further configured to: replace the erroneous word with each pending candidate word in turn to obtain at least one piece of pending text information; calculate the occurrence probability of each piece of pending text information through the first language model; and determine one piece of pending text information as the corrected text information according to the occurrence probabilities of the pending text information.
Those skilled in the art will understand that the above text correction device 400 also comprises some other well-known structures, such as processors and memories; in order not to obscure the embodiments of the present disclosure unnecessarily, these well-known structures are not shown in Fig. 4.
With the text correction device provided by this embodiment, at least one erroneous word can first be determined from the text information input by the user through a pre-trained first language model, the candidate word corresponding to the erroneous word is then determined according to a preset rule, and finally the candidate word is used to replace the erroneous word, thereby realizing text correction. The pre-trained language model can accurately determine the meaning the user intends to express, making the corrected text more accurate and thus improving both the accuracy and the recall of text correction.
Referring now to Fig. 5, a schematic structural diagram of a computer system 500 suitable for implementing the terminal device or server of the embodiments of the present application is shown.
As shown in Fig. 5, the computer system 500 comprises a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. Various programs and data required for the operation of the system 500 are also stored in the RAM 503. The CPU 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 comprising a keyboard, a mouse and the like; an output section 507 comprising a cathode-ray tube (CRT), a liquid crystal display (LCD), a loudspeaker and the like; a storage section 508 comprising a hard disk and the like; and a communication section 509 comprising a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as required. A removable medium 511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as required, so that the computer program read from it is installed into the storage section 508 as required.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure comprises a computer program product, which comprises a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511.
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functions and operations that may be implemented by the systems, methods and computer program products according to the various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code, which comprises one or more executable instructions for implementing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present application may be implemented by software or by hardware. The described modules may also be provided in a processor; for example, a processor may be described as comprising a receiving module, an erroneous-word determination module, a candidate-word determination module and a correction module. The names of these modules do not in certain cases constitute a limitation on the modules themselves; for example, the receiving module may also be described as "a module for receiving text information input by a user".
As another aspect, the present application also provides a non-volatile computer storage medium, which may be the non-volatile computer storage medium included in the device described in the above embodiments, or may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs which, when executed by a device, cause the device to: receive text information input by a user; determine, through a first language model, at least one erroneous word in the text information; determine, based on a preset rule, a candidate word corresponding to the erroneous word; and replace the erroneous word with the candidate word to obtain corrected text information.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the particular combination of the above technical features, but should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the described inventive concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.
Claims (12)
1. A text correction method, characterized by comprising:
receiving text information input by a user;
determining, through a first language model, at least one erroneous word in the text information;
determining, based on a preset rule, a candidate word corresponding to the erroneous word;
replacing the erroneous word with the candidate word to obtain corrected text information.
2. The text correction method according to claim 1, characterized in that determining at least one erroneous word in the text information through the first language model comprises:
calculating the occurrence probability of each word in the text information through the first language model;
determining at least one erroneous word in the text information according to the occurrence probability of each word.
3. The text correction method according to claim 2, characterized in that the first language model is obtained by the following method:
obtaining historical text information;
preprocessing the historical text information to obtain training samples;
training a language model using the training samples to obtain the first language model;
wherein the preprocessing comprises text filtering, word segmentation and generalization.
4. The text correction method according to claim 3, characterized in that
the preprocessing further comprises classification based on service type;
training a language model using the training samples to obtain the first language model comprises:
training language models on the classified training samples separately through a recurrent neural network algorithm to obtain a first language model corresponding to each service type;
calculating the occurrence probability of each word in the text information through the first language model comprises:
determining the service type corresponding to the text information;
calculating the occurrence probability of each word in the text information through the first language model corresponding to that service type.
5. The text correction method according to claim 4, characterized in that
the preset rule comprises at least one of a pinyin rule, a glyph rule and an edit-distance rule;
the text correction method further comprises:
if the erroneous word corresponds to multiple candidate words, calculating the occurrence count of each candidate word through a second language model;
selecting at least one pending candidate word from the multiple candidate words according to the occurrence count of each candidate word;
wherein the second language model is obtained by training a unigram language model using the training samples.
6. The text correction method according to claim 5, characterized in that replacing the erroneous word with the candidate word to obtain corrected text information comprises:
replacing the erroneous word with each pending candidate word in turn to obtain at least one piece of pending text information;
calculating the occurrence probability of each piece of pending text information through the first language model;
determining one piece of pending text information as the corrected text information according to the occurrence probabilities of the pending text information.
7. A text correction device, characterized by comprising:
a receiving module for receiving text information input by a user;
an erroneous-word determination module for determining, through a first language model, at least one erroneous word in the text information;
a candidate-word determination module for determining, based on a preset rule, a candidate word corresponding to the erroneous word;
a correction module for replacing the erroneous word with the candidate word to obtain corrected text information.
8. The text correction device according to claim 7, characterized in that the erroneous-word determination module is further configured to:
calculate the occurrence probability of each word in the text information through the first language model;
determine at least one erroneous word in the text information according to the occurrence probability of each word.
9. The text correction device according to claim 8, characterized in that the first language model is obtained by the following method:
obtaining historical text information;
preprocessing the historical text information to obtain training samples;
training a language model using the training samples to obtain the first language model;
wherein the preprocessing comprises text filtering, word segmentation and generalization.
10. The text correction device according to claim 9, characterized in that
the preprocessing further comprises classification based on service type;
training a language model using the training samples to obtain the first language model comprises:
training language models on the classified training samples separately through a recurrent neural network algorithm to obtain a first language model corresponding to each service type;
calculating the occurrence probability of each word in the text information through the first language model comprises:
determining the service type corresponding to the text information;
calculating the occurrence probability of each word in the text information through the first language model corresponding to that service type.
11. The text correction device according to claim 10, characterized in that
the preset rule comprises at least one of a pinyin rule, a glyph rule and an edit-distance rule;
the text correction device further comprises:
a calculation module for calculating, if the erroneous word corresponds to multiple candidate words, the occurrence count of each candidate word through a second language model;
a selection module for selecting at least one pending candidate word from the multiple candidate words according to the occurrence count of each candidate word;
wherein the second language model is obtained by training a unigram language model using the training samples.
12. The text correction device according to claim 11, characterized in that the correction module is further configured to:
replace the erroneous word with each pending candidate word in turn to obtain at least one piece of pending text information;
calculate the occurrence probability of each piece of pending text information through the first language model;
determine one piece of pending text information as the corrected text information according to the occurrence probabilities of the pending text information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610083955.7A CN105550173A (en) | 2016-02-06 | 2016-02-06 | Text correction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610083955.7A CN105550173A (en) | 2016-02-06 | 2016-02-06 | Text correction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105550173A true CN105550173A (en) | 2016-05-04 |
Family
ID=55829362
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610083955.7A Pending CN105550173A (en) | 2016-02-06 | 2016-02-06 | Text correction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105550173A (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102156551A (en) * | 2011-03-30 | 2011-08-17 | 北京搜狗科技发展有限公司 | Method and system for correcting error of word input |
CN103678271A (en) * | 2012-09-10 | 2014-03-26 | 华为技术有限公司 | Text correction method and user equipment |
CN105159871A (en) * | 2015-08-21 | 2015-12-16 | 小米科技有限责任公司 | Text information detection method and apparatus |
CN105279149A (en) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Chinese text automatic correction method |
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294726A (en) * | 2016-08-09 | 2017-01-04 | 北京光年无限科技有限公司 | Based on the processing method and processing device that robot role is mutual |
CN106569618A (en) * | 2016-10-19 | 2017-04-19 | 武汉悦然心动网络科技股份有限公司 | Recurrent-neural-network-model-based sliding input method and system |
CN106569618B (en) * | 2016-10-19 | 2019-03-29 | 武汉悦然心动网络科技股份有限公司 | Sliding input method and system based on Recognition with Recurrent Neural Network model |
CN106528532A (en) * | 2016-11-07 | 2017-03-22 | 上海智臻智能网络科技股份有限公司 | Text error correction method and device and terminal |
CN106528532B (en) * | 2016-11-07 | 2019-03-12 | 上海智臻智能网络科技股份有限公司 | Text error correction method, device and terminal |
CN107122346A (en) * | 2016-12-28 | 2017-09-01 | 平安科技(深圳)有限公司 | The error correction method and device of a kind of read statement |
WO2018120889A1 (en) * | 2016-12-28 | 2018-07-05 | 平安科技(深圳)有限公司 | Input sentence error correction method and device, electronic device, and medium |
CN106604125B (en) * | 2016-12-29 | 2019-06-14 | 北京奇艺世纪科技有限公司 | A kind of determination method and device of video caption |
CN106604125A (en) * | 2016-12-29 | 2017-04-26 | 北京奇艺世纪科技有限公司 | Video subtitle determining method and video subtitle determining device |
US11314921B2 (en) | 2017-06-05 | 2022-04-26 | Baidu Online Network Technology (Beijing) Co., Ltd. | Text error correction method and apparatus based on recurrent neural network of artificial intelligence |
CN107357775A (en) * | 2017-06-05 | 2017-11-17 | 百度在线网络技术(北京)有限公司 | The text error correction method and device of Recognition with Recurrent Neural Network based on artificial intelligence |
CN107451106A (en) * | 2017-07-26 | 2017-12-08 | 阿里巴巴集团控股有限公司 | Text method and device for correcting, electronic equipment |
CN110162767B (en) * | 2018-02-12 | 2024-10-22 | 北京京东尚科信息技术有限公司 | Text error correction method and device |
CN110162767A (en) * | 2018-02-12 | 2019-08-23 | 北京京东尚科信息技术有限公司 | The method and apparatus of text error correction |
CN108564086A (en) * | 2018-03-17 | 2018-09-21 | 深圳市极客思索科技有限公司 | A kind of the identification method of calibration and device of character string |
CN108564086B (en) * | 2018-03-17 | 2024-05-10 | 上海柯渡医学科技股份有限公司 | Character string identification and verification method and device |
CN108681533B (en) * | 2018-04-11 | 2022-04-19 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108595419A (en) * | 2018-04-11 | 2018-09-28 | 广州视源电子科技股份有限公司 | Candidate word evaluation method, candidate word sorting method and device |
CN108694167A (en) * | 2018-04-11 | 2018-10-23 | 广州视源电子科技股份有限公司 | Candidate word evaluation method, candidate word sorting method and device |
CN108733646A (en) * | 2018-04-11 | 2018-11-02 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108733645A (en) * | 2018-04-11 | 2018-11-02 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108681535B (en) * | 2018-04-11 | 2022-07-08 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108694166B (en) * | 2018-04-11 | 2022-06-28 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108681533A (en) * | 2018-04-11 | 2018-10-19 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108681535A (en) * | 2018-04-11 | 2018-10-19 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108628826B (en) * | 2018-04-11 | 2022-09-06 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108694166A (en) * | 2018-04-11 | 2018-10-23 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108664466A (en) * | 2018-04-11 | 2018-10-16 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108664467A (en) * | 2018-04-11 | 2018-10-16 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108664466B (en) * | 2018-04-11 | 2022-07-08 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108733646B (en) * | 2018-04-11 | 2022-09-06 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108628826A (en) * | 2018-04-11 | 2018-10-09 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108628827A (en) * | 2018-04-11 | 2018-10-09 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN108647202A (en) * | 2018-04-11 | 2018-10-12 | 广州视源电子科技股份有限公司 | Candidate word evaluation method and device, computer equipment and storage medium |
CN110851587B (en) * | 2018-07-25 | 2024-04-05 | 阿里巴巴集团控股有限公司 | Commodity coding prediction model generation and commodity coding determination method, device and equipment |
CN110851587A (en) * | 2018-07-25 | 2020-02-28 | 阿里巴巴集团控股有限公司 | Commodity code prediction model generation and commodity code determining method, device and equipment |
CN110377916A (en) * | 2018-08-17 | 2019-10-25 | 腾讯科技(深圳)有限公司 | Word prediction technique, device, computer equipment and storage medium |
CN110377916B (en) * | 2018-08-17 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Word prediction method, word prediction device, computer equipment and storage medium |
US11531814B2 (en) | 2018-09-14 | 2022-12-20 | Beijing Bytedance Network Technology Co., Ltd. | Method and device for generating modified statement |
CN109325227A (en) * | 2018-09-14 | 2019-02-12 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating amendment sentence |
CN110991155B (en) * | 2018-09-28 | 2024-04-26 | 苏州科沃斯软件科技有限公司 | Text correction method, device and medium |
CN110991155A (en) * | 2018-09-28 | 2020-04-10 | 科沃斯商用机器人有限公司 | Text correction method, apparatus, and medium |
CN109376658A (en) * | 2018-10-26 | 2019-02-22 | 信雅达系统工程股份有限公司 | A kind of OCR method based on deep learning |
CN109670040A (en) * | 2018-11-27 | 2019-04-23 | 平安科技(深圳)有限公司 | Write householder method, device and storage medium, computer equipment |
CN109670040B (en) * | 2018-11-27 | 2024-04-05 | 平安科技(深圳)有限公司 | Writing assistance method and device, storage medium and computer equipment |
CN111339755A (en) * | 2018-11-30 | 2020-06-26 | 中国移动通信集团浙江有限公司 | Automatic error correction method and device for office data |
CN109614621B (en) * | 2018-12-11 | 2023-09-19 | 中国移动通信集团江苏有限公司 | Text correction method, device and equipment |
CN109614621A (en) * | 2018-12-11 | 2019-04-12 | 中国移动通信集团江苏有限公司 | A kind of method, device and equipment correcting text |
CN112447172A (en) * | 2019-08-12 | 2021-03-05 | 云号(北京)科技有限公司 | Method and device for improving quality of voice recognition text |
CN112447172B (en) * | 2019-08-12 | 2024-03-15 | 云号(北京)科技有限公司 | Quality improvement method and device for voice recognition text |
CN110929514A (en) * | 2019-11-20 | 2020-03-27 | 北京百分点信息科技有限公司 | Text proofreading method and device, computer readable storage medium and electronic equipment |
CN111221424B (en) * | 2020-01-02 | 2021-04-27 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic device, and computer-readable medium for generating information |
CN111221424A (en) * | 2020-01-02 | 2020-06-02 | 北京字节跳动网络技术有限公司 | Method, apparatus, electronic device, and computer-readable medium for generating information |
CN111475619A (en) * | 2020-03-31 | 2020-07-31 | 北京三快在线科技有限公司 | Text information correction method and device, electronic equipment and storage medium |
CN112036273A (en) * | 2020-08-19 | 2020-12-04 | 泰康保险集团股份有限公司 | Image identification method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105550173A (en) | Text correction method and device | |
US11281860B2 (en) | Method, apparatus and device for recognizing text type | |
CN105183912A (en) | Abnormal log determination method and device | |
CN111680165B (en) | Information matching method and device, readable storage medium and electronic equipment | |
US20220121668A1 (en) | Method for recommending document, electronic device and storage medium | |
CN114090601B (en) | Data screening method, device, equipment and storage medium | |
CN105574092A (en) | Information mining method and device | |
CN112926308B (en) | Method, device, equipment, storage medium and program product for matching text | |
CN113297287B (en) | Automatic user policy deployment method and device and electronic equipment | |
CN113051911A (en) | Method, apparatus, device, medium, and program product for extracting sensitive word | |
CN113434755A (en) | Page generation method and device, electronic equipment and storage medium | |
CN113392920B (en) | Method, apparatus, device, medium, and program product for generating cheating prediction model | |
CN114970540A (en) | Method and device for training text audit model | |
US20210192138A1 (en) | Method and device for generating modified statement | |
CN114048315A (en) | Method and device for determining document tag, electronic equipment and storage medium | |
CN117573973A (en) | Resource recommendation method, device, electronic equipment and storage medium | |
CN114036921A (en) | Policy information matching method and device | |
CN114265777B (en) | Application program testing method and device, electronic equipment and storage medium | |
CN116048463A (en) | Intelligent recommendation method and device for content of demand item based on label management | |
US20190114673A1 (en) | Digital experience targeting using bayesian approach | |
CN113850072A (en) | Text emotion analysis method, emotion analysis model training method, device, equipment and medium | |
CN114925275A (en) | Product recommendation method and device, computer equipment and storage medium | |
US11562121B2 (en) | AI driven content correction built on personas | |
CN111767290B (en) | Method and apparatus for updating user portraits | |
CN114817716A (en) | Method, device, equipment and medium for predicting user conversion behaviors and training model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160504 |