CN110032722A - Text error correction method and device - Google Patents
- Publication number
- CN110032722A (CN201810030108.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- candidate text
- pinyin
- candidate
- pinyin sequence
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/232—Orthographic correction, e.g. spell checking or vowelisation
Abstract
The invention discloses a text error correction method and device, relating to the field of computer technology. The method comprises: obtaining the pinyin sequence of the text to be corrected; searching a hybrid dictionary tree (trie) to obtain a candidate text set matching that pinyin sequence, where the hybrid dictionary tree contains the correspondence between pinyin and both Chinese words and English words; and determining the error correction result for the text to be corrected according to an error correction model and the candidate text set. These steps handle text that mixes Chinese, English, and pinyin well, improving the coverage and applicability of text error correction.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a text error correction method and device.
Background art
In recent years, query error correction has been widely applied in retrieval systems and has achieved good results. As the internet industry has developed, query error correction has also drawn increasing attention in other internet fields, such as e-commerce.
Existing query error correction techniques fall into two main categories: text error correction based on user sessions, and text error correction based on probability models. The first mines the session logs of user searches for candidate correction pairs that users actively rewrote, and takes the rewritten term as the correct search term. The second takes user search terms with a high click volume as the correction candidate set, uses a statistical model to compute the probability of each candidate text, and takes the candidate with the highest probability as the correct search term.
In the course of making the present invention, the inventors found at least the following problems in the prior art: first, the prior art cannot handle query correction for input that mixes Chinese, English, and pinyin well; second, for long-tail terms, query correction in the prior art is slow and not timely.
Summary of the invention
In view of this, the present invention provides a text error correction method and device that handle text mixing Chinese, English, and pinyin well, improving the coverage and applicability of text error correction.
To achieve the above object, according to a first aspect of the invention, a text error correction method is provided.
The text error correction method of the invention comprises: obtaining the pinyin sequence of the text to be corrected; searching a hybrid dictionary tree to obtain a candidate text set matching that pinyin sequence, where the hybrid dictionary tree contains the correspondence between pinyin and both Chinese words and English words; and determining the error correction result for the text to be corrected according to an error correction model and the candidate text set.
Optionally, the step of obtaining the pinyin sequence of the text to be corrected comprises: if the text consists of Chinese characters, taking the pinyin of those characters as the pinyin sequence; if the text consists of non-Chinese characters, taking the non-Chinese characters themselves as the pinyin sequence; if the text consists of both Chinese and non-Chinese characters, taking the whole formed by the pinyin of the Chinese characters together with the non-Chinese characters themselves as the pinyin sequence. Here, non-Chinese characters include digits, English words, and/or pinyin.
Optionally, the step of searching the hybrid dictionary tree to obtain the candidate text set matching the pinyin sequence of the text to be corrected comprises: searching the hybrid dictionary tree using a forward maximum matching algorithm and a reverse maximum matching algorithm, and determining the candidate text set matching the pinyin sequence from the forward maximum matching result and the reverse maximum matching result.
Optionally, the step of determining the error correction result according to the error correction model and the candidate text set comprises: computing, with each of multiple error correction models, an evaluation factor for each candidate text in the candidate text set; fusing the multiple evaluation factors to obtain an evaluation score for the candidate text; and determining the error correction result for the text to be corrected according to the evaluation score.
Optionally, the multiple error correction models include at least two of the following: a noisy channel error correction model, an edit distance error correction model, and a pinyin distance error correction model.
Optionally, where the multiple error correction models include the noisy channel error correction model, the edit distance error correction model, and the pinyin distance error correction model, the step of computing an evaluation factor for each candidate text comprises: computing the noisy channel probability of the candidate text with the noisy channel error correction model and taking it as the first evaluation factor of the candidate text; computing the edit distance of the candidate text with the edit distance error correction model and determining the second evaluation factor of the candidate text from that edit distance; and computing the pinyin distance of the candidate text with the pinyin distance error correction model and determining the third evaluation factor of the candidate text from that pinyin distance.
Optionally, the step of computing the pinyin distance of the candidate text with the pinyin distance error correction model comprises: for the characters of the text to be corrected and of the candidate text, comparing one by one whether the letters making up their pinyin are identical and whether their tones are identical; determining the pinyin distance of each character from the comparison result; and taking the sum of the per-character pinyin distances as the pinyin distance of the candidate text.
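The per-character comparison and summation described above can be sketched as follows. The source does not give the actual cost values, so the costs here (0.5 for a tone mismatch, 1.0 for a letter mismatch), the tone-digit notation such as `hua2`, and all function names are assumptions made for illustration only.

```python
def split_tone(syllable):
    """Split 'hua2' into ('hua', '2'); a syllable without a tone digit yields ''."""
    if syllable and syllable[-1].isdigit():
        return syllable[:-1], syllable[-1]
    return syllable, ""

def char_pinyin_distance(a, b, tone_cost=0.5, letter_cost=1.0):
    """Distance between the pinyin of two characters (assumed cost values)."""
    letters_a, tone_a = split_tone(a)
    letters_b, tone_b = split_tone(b)
    if letters_a != letters_b:
        return letter_cost          # the letters themselves differ
    return 0.0 if tone_a == tone_b else tone_cost

def pinyin_distance(source, candidate):
    """Sum the per-character distances of two equally long pinyin lists."""
    if len(source) != len(candidate):
        raise ValueError("this sketch only compares texts of equal length")
    return sum(char_pinyin_distance(a, b) for a, b in zip(source, candidate))
```

A candidate that differs from the input only in tone therefore scores closer than one whose letters differ, which matches the intent of comparing letters and tones separately.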
Optionally, the forward maximum matching result and the reverse maximum matching result each comprise at least one candidate text fragment, and the method further comprises: performing edit operations on the pinyin sequence of a candidate text fragment; searching the hybrid dictionary tree with the edited pinyin sequence to obtain newly added candidate text fragments matching the edited pinyin sequence; and building the candidate text set matching the pinyin sequence of the text to be corrected from the candidate text fragments and the newly added candidate text fragments.
Optionally, the step of performing edit operations on the pinyin sequence of a candidate text fragment comprises: where the candidate text fragment contains Chinese characters, applying fuzzy-pinyin edits to the pinyin of those characters; where the candidate text fragment contains English words, applying insertion, replacement, transposition, and/or deletion edits to those English words.
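The two kinds of edit operation can be sketched as below. The source does not list which fuzzy-pinyin pairs are used; the three initial pairs here (zh/z, ch/c, sh/s) are common confusions chosen as an assumption, and the function names are hypothetical. The English edits follow the familiar single-edit scheme of deletion, transposition, replacement, and insertion.

```python
import string

# Assumed fuzzy-pinyin pairs; the source does not enumerate them.
FUZZY_INITIALS = [("zh", "z"), ("ch", "c"), ("sh", "s")]

def fuzzy_variants(syllable):
    """Return the syllables reachable by swapping one fuzzy initial."""
    variants = set()
    for long_form, short_form in FUZZY_INITIALS:
        if syllable.startswith(long_form):
            variants.add(short_form + syllable[len(long_form):])
        elif syllable.startswith(short_form):
            variants.add(long_form + syllable[len(short_form):])
    variants.discard(syllable)
    return variants

def english_edits(word):
    """Return all strings one insert, replace, transpose, or delete away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return (deletes | swaps | replaces | inserts) - {word}
```

Each variant produced this way would then be looked up in the hybrid dictionary tree, and only variants that actually exist there contribute new candidate fragments.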
Optionally, the method further comprises: obtaining the pinyin sequences of training sample words, and building the hybrid dictionary tree from those pinyin sequences.
Optionally, the method further comprises: before the step of obtaining the pinyin sequences of the training sample words and building the hybrid dictionary tree from them, cleaning source data to obtain the training sample words.
To achieve the above object, according to a second aspect of the invention, a search method is provided.
The search method of the invention comprises: receiving input text; where the input text is determined to be text to be corrected, obtaining the pinyin sequence of the input text; searching a hybrid dictionary tree to obtain a candidate text set matching the pinyin sequence of the input text, where the hybrid dictionary tree contains the correspondence between pinyin and both Chinese words and English words; determining the error correction result for the input text according to an error correction model and the candidate text set; and obtaining search results according to the error correction result and sending the search results.
To achieve the above object, according to a third aspect of the invention, a search error correction method is provided.
The search error correction method of the invention comprises: receiving input text; where the input text is determined to be text to be corrected, obtaining the pinyin sequence of the input text; searching a hybrid dictionary tree to obtain a candidate text set matching the pinyin sequence of the input text, where the hybrid dictionary tree contains the correspondence between pinyin and both Chinese words and English words; determining the error correction results for the input text according to an error correction model and the candidate text set; and sorting the error correction results and sending the sorted results.
To achieve the above object, according to a fourth aspect of the invention, a text error correction device is provided.
The text error correction device of the invention comprises: an acquisition module for obtaining the pinyin sequence of the text to be corrected; a search module for searching a hybrid dictionary tree to obtain a candidate text set matching that pinyin sequence, where the hybrid dictionary tree contains the correspondence between pinyin and both Chinese words and English words; and a determining module for determining the error correction result for the text to be corrected according to an error correction model and the candidate text set.
Optionally, the acquisition module obtains the pinyin sequence of the text to be corrected as follows: if the text consists of Chinese characters, the acquisition module takes the pinyin of those characters as the pinyin sequence; if the text consists of non-Chinese characters, the acquisition module takes the non-Chinese characters themselves as the pinyin sequence; if the text consists of both Chinese and non-Chinese characters, the acquisition module takes the whole formed by the pinyin of the Chinese characters together with the non-Chinese characters themselves as the pinyin sequence. Here, non-Chinese characters include digits, English words, and/or pinyin.
Optionally, the search module searches the hybrid dictionary tree to obtain the candidate text set matching the pinyin sequence of the text to be corrected as follows: the search module searches the hybrid dictionary tree using a forward maximum matching algorithm and a reverse maximum matching algorithm, and determines the candidate text set matching the pinyin sequence from the forward maximum matching result and the reverse maximum matching result.
Optionally, the determining module determines the error correction result according to the error correction model and the candidate text set as follows: the determining module computes, with each of multiple error correction models, an evaluation factor for each candidate text in the candidate text set; the determining module fuses the multiple evaluation factors to obtain an evaluation score for the candidate text; and the determining module determines the error correction result for the text to be corrected according to the evaluation score.
Optionally, the multiple error correction models include at least two of the following: a noisy channel error correction model, an edit distance error correction model, and a pinyin distance error correction model.
Optionally, where the multiple error correction models include the noisy channel error correction model, the edit distance error correction model, and the pinyin distance error correction model, the determining module computes the evaluation factors of each candidate text as follows: the determining module computes the noisy channel probability of the candidate text with the noisy channel error correction model and takes it as the first evaluation factor of the candidate text; the determining module computes the edit distance of the candidate text with the edit distance error correction model and determines the second evaluation factor of the candidate text from that edit distance; and the determining module computes the pinyin distance of the candidate text with the pinyin distance error correction model and determines the third evaluation factor of the candidate text from that pinyin distance.
Optionally, the determining module computes the pinyin distance of the candidate text with the pinyin distance error correction model as follows: for the characters of the text to be corrected and of the candidate text, the determining module compares one by one whether the letters making up their pinyin are identical and whether their tones are identical; the determining module determines the pinyin distance of each character from the comparison result, and takes the sum of the per-character pinyin distances as the pinyin distance of the candidate text.
Optionally, the forward maximum matching result and the reverse maximum matching result each comprise at least one candidate text fragment, and the device further comprises an editing module for performing edit operations on the pinyin sequence of a candidate text fragment. The search module is further used to search the hybrid dictionary tree with the edited pinyin sequence to obtain newly added candidate text fragments matching the edited pinyin sequence, and to build the candidate text set matching the pinyin sequence of the text to be corrected from the candidate text fragments and the newly added candidate text fragments.
Optionally, the editing module performs edit operations on the pinyin sequence of a candidate text fragment as follows: where the candidate text fragment contains Chinese characters, the editing module applies fuzzy-pinyin edits to the pinyin of those characters; where the candidate text fragment contains English words, the editing module applies insertion, replacement, transposition, and/or deletion edits to those English words.
Optionally, the device further comprises a building module for obtaining the pinyin sequences of training sample words and building the hybrid dictionary tree from those pinyin sequences.
Optionally, the device further comprises a cleaning module for cleaning source data to obtain the training sample words.
To achieve the above object, according to a fifth aspect of the invention, an electronic device is provided.
The electronic device of the invention comprises: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the text error correction method of the invention.
To achieve the above object, according to a sixth aspect of the invention, a computer-readable medium is provided.
The computer-readable medium of the invention stores a computer program which, when executed by a processor, implements the text error correction method of the invention.
An embodiment of the above invention has the following advantage or beneficial effect: by obtaining the pinyin sequence of the text to be corrected, searching the hybrid dictionary tree to obtain candidate texts matching that pinyin sequence, computing an evaluation score for each candidate text, and determining the error correction result according to the evaluation score, the embodiment handles text that mixes Chinese, English, and pinyin well and improves the coverage and applicability of text error correction.
Further effects of the above non-conventional optional implementations are described below in conjunction with specific embodiments.
Brief description of the drawings
The drawings are provided for a better understanding of the invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic diagram of the main steps of a text error correction method according to one embodiment of the invention;
Fig. 2 is a schematic diagram of the main steps of a text error correction method according to another embodiment of the invention;
Fig. 3 is a schematic diagram of the main steps of a text error correction method according to yet another embodiment of the invention;
Fig. 4 is a schematic diagram of a hybrid dictionary tree according to an embodiment of the invention;
Fig. 5 is a schematic diagram of the main modules of a text error correction device according to one embodiment of the invention;
Fig. 6 is a schematic diagram of the main modules of a text error correction device according to another embodiment of the invention;
Fig. 7 is a diagram of an exemplary system architecture to which embodiments of the invention can be applied;
Fig. 8 is a structural schematic diagram of a computer system suitable for implementing the electronic device of an embodiment of the invention.
Detailed description of the embodiments
Exemplary embodiments of the invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, descriptions of well-known functions and structures are omitted below for clarity and conciseness.
It should be noted that, provided they do not conflict, the embodiments of the invention and the features of those embodiments can be combined with one another.
Fig. 1 is a schematic diagram of the main steps of a text error correction method according to one embodiment of the invention. As shown in Fig. 1, the text error correction method of this embodiment comprises:
Step S101: obtain the pinyin sequence of the text to be corrected.
Specifically, step S101 comprises: if the text to be corrected consists of Chinese characters, taking the pinyin of those characters as its pinyin sequence; if the text consists of non-Chinese characters, taking the non-Chinese characters themselves as its pinyin sequence; if the text consists of both Chinese and non-Chinese characters, taking the whole formed by the pinyin of the Chinese characters together with the non-Chinese characters themselves as its pinyin sequence. Here, non-Chinese characters include digits, English words, and/or pinyin.
For example, if the text to be corrected is "women's sports shoes" written in Chinese characters, its pinyin sequence is "nv shi yun dong xie". If the text to be corrected is "iphone8", its pinyin sequence is "iphone8". If the text to be corrected is "adidas men's sports shoes", with "adidas" in Latin letters and the rest in Chinese characters, its pinyin sequence is "adidas nan shi yun dong xie".
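The three cases of step S101 can be sketched in a few lines. The character table below is a tiny hypothetical stand-in for a full pinyin lexicon, the Chinese strings are assumed reconstructions of the examples, and the function names are illustrative rather than taken from the source.

```python
# Hypothetical character-to-pinyin table; a real system would use a full lexicon.
CHAR_TO_PINYIN = {
    "女": "nv", "男": "nan", "士": "shi",
    "运": "yun", "动": "dong", "鞋": "xie",
}

def is_chinese(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"

def to_pinyin_sequence(text):
    """Chinese characters become pinyin; everything else is kept as typed."""
    tokens = []
    buffer = ""                      # accumulates a run of non-Chinese characters
    for ch in text:
        if is_chinese(ch):
            if buffer.strip():
                tokens.append(buffer.strip())
            buffer = ""
            # Characters missing from the table are passed through unchanged.
            tokens.append(CHAR_TO_PINYIN.get(ch, ch))
        else:
            buffer += ch
    if buffer.strip():
        tokens.append(buffer.strip())
    return " ".join(tokens)
```

Because non-Chinese runs are copied through untouched, digits, English words, and pinyin typed directly by the user all end up in the same sequence as the converted characters.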
Step S102: search the hybrid dictionary tree to obtain the candidate text set matching the pinyin sequence of the text to be corrected.
The hybrid dictionary tree contains the correspondence between pinyin and both Chinese words and English words. In the hybrid dictionary tree, each node stores one character, and the node that stores the last character of a pinyin string also stores the words corresponding to that pinyin string. A corresponding word can be a Chinese word or an English word. For example, suppose the hybrid dictionary tree contains the pinyin "hua wei". The root node is empty, the nodes below it store the characters "h", "u", "a", "w", "e", "i" in turn, and the node storing "i" also stores the words that correspond to this pinyin sequence, such as "Huawei" and its homophone meaning "to classify as".
As an illustration, suppose the text to be corrected is "classify-as mobile phone", a homophone mistyping of "Huawei mobile phone", so that its pinyin sequence is "hua wei shou ji". The candidate text set obtained in step S102 then contains the following candidate texts: "Huawei mobile phone", "classify-as mobile phone", "Huawei collect", and "classify-as collect", where "collect" is a homophone of "mobile phone" (both are read "shou ji").
Step S103: determine the error correction result for the text to be corrected according to the error correction model and the candidate text set.
In this embodiment, the hybrid dictionary tree is built in advance; the pinyin sequence of the text to be corrected is obtained; the hybrid dictionary tree is searched to obtain the candidate text set matching that pinyin sequence; and the error correction result is determined according to the error correction model and the candidate text set. This handles text that mixes Chinese, English, and pinyin well and improves the coverage and applicability of text error correction.
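The hybrid dictionary tree described under step S102 could be sketched as an ordinary character trie whose terminal nodes carry word lists. The class and method names are hypothetical, and the two Chinese words are assumed reconstructions of the "hua wei" example.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next character -> TrieNode
        self.words = []      # words whose pinyin ends at this node

class HybridTrie:
    def __init__(self):
        self.root = TrieNode()           # the root stores no character

    def insert(self, pinyin, word):
        """Store word under its pinyin, one character per node."""
        node = self.root
        for ch in pinyin.replace(" ", ""):
            node = node.children.setdefault(ch, TrieNode())
        if word not in node.words:
            node.words.append(word)

    def lookup(self, pinyin):
        """Return the words stored at the node ending this pinyin, if any."""
        node = self.root
        for ch in pinyin.replace(" ", ""):
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.words)

trie = HybridTrie()
trie.insert("hua wei", "华为")     # "Huawei"
trie.insert("hua wei", "划为")     # homophone, "to classify as"
trie.insert("iphone", "iphone")    # English words are keyed by their own spelling
```

Keying English words by their own spelling is what lets one tree serve Chinese, English, and raw pinyin input with a single lookup routine.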
Fig. 2 is a schematic diagram of the main steps of a text error correction method according to another embodiment of the invention. As shown in Fig. 2, the text error correction method of this embodiment comprises:
Step S201: obtain the pinyin sequence of the text to be corrected.
Specifically, step S201 comprises: if the text to be corrected consists of Chinese characters, taking the pinyin of those characters as its pinyin sequence; if the text consists of non-Chinese characters, taking the non-Chinese characters themselves as its pinyin sequence; if the text consists of both Chinese and non-Chinese characters, taking the whole formed by the pinyin of the Chinese characters together with the non-Chinese characters themselves as its pinyin sequence. Here, non-Chinese characters include digits, English words, and/or pinyin.
Step S202: search the hybrid dictionary tree using the forward maximum matching algorithm and the reverse maximum matching algorithm, and determine the candidate text set matching the pinyin sequence of the text to be corrected from the forward maximum matching result and the reverse maximum matching result.
Specifically, in the forward and reverse maximum matching algorithms of this embodiment, the pinyin sequence of the text to be corrected is first segmented, and the hybrid dictionary tree is then searched with the resulting pinyin sequence fragments to obtain the forward maximum matching result and the reverse maximum matching result. The candidate text set matching the pinyin sequence is then determined from these two results. The candidate text set is the set of all candidate texts. Each of the forward and reverse maximum matching results comprises at least one candidate text fragment. When a matching result contains only one candidate text fragment, that fragment is itself a candidate text matching the pinyin sequence of the text to be corrected. When a matching result contains several candidate text fragments, the fragments can be spliced together to obtain candidate texts.
As an illustration, suppose the text to be corrected is a homophone mistyping of "women's sports shoes" whose pinyin sequence is "nv shi yun dong xie". The forward maximum matching algorithm proceeds as follows:
1) First search the hybrid dictionary tree for "nv shi yun dong xie". If the tree contains this pinyin sequence, the match succeeds, and the words stored for "nv shi yun dong xie" are taken as candidate texts, that is, as the forward maximum matching result.
2) If the tree does not contain "nv shi yun dong xie", shorten the fragment by one character length from the end and search for "nv shi yun dong". If the tree contains this fragment, the match succeeds, and the words stored for "nv shi yun dong" are taken as candidate text fragments. Then search for the remainder, "xie". If the tree contains "xie", the match succeeds, and the words stored for "xie" are taken as candidate text fragments. The forward maximum matching result then comprises the candidate text fragments for "nv shi yun dong" and the candidate text fragments for "xie".
3) If the tree does not contain "nv shi yun dong", repeat the step of shortening the fragment by one character length and searching the tree with the new fragment until the forward maximum matching result is obtained.
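The forward procedure above amounts to a greedy longest-prefix segmentation over pinyin syllables. The sketch below uses a plain dictionary as a stand-in for hybrid-dictionary-tree lookups; the lexicon contents, the English glosses, and the handling of unmatched syllables are assumptions for illustration.

```python
# Hypothetical lexicon standing in for hybrid-dictionary-tree lookups.
LEXICON = {
    "nv shi yun dong": ["women's sports"],
    "nv shi": ["women's"],
    "yun dong xie": ["sports shoes"],
    "xie": ["shoes"],
}

def forward_max_match(syllables, lexicon):
    """Segment left to right, always trying the longest fragment first."""
    fragments = []
    start = 0
    while start < len(syllables):
        for end in range(len(syllables), start, -1):
            key = " ".join(syllables[start:end])
            if key in lexicon:
                fragments.append((key, lexicon[key]))
                start = end
                break
        else:
            # Nothing in the lexicon starts here; keep the syllable unmatched.
            fragments.append((syllables[start], []))
            start += 1
    return fragments
```

With this lexicon the full sequence is not found, so the search falls back to "nv shi yun dong" and then matches the remainder "xie", as in case 2) above.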
As an illustration, suppose again that the text to be corrected is a homophone mistyping of "women's sports shoes" whose pinyin sequence is "nv shi yun dong xie". The reverse maximum matching algorithm proceeds as follows:
1) First search the hybrid dictionary tree for "nv shi yun dong xie". If the tree contains this pinyin sequence, the match succeeds, and the words stored for "nv shi yun dong xie" are taken as candidate texts, that is, as the reverse maximum matching result.
2) If the tree does not contain "nv shi yun dong xie", shorten the fragment by one character length from the front and search for "shi yun dong xie". If the tree contains this fragment, the match succeeds, and the words stored for "shi yun dong xie" are taken as candidate text fragments. Then search for the remainder, "nv". If the tree contains "nv", the match succeeds, and the words stored for "nv" are taken as candidate text fragments. The reverse maximum matching result then comprises the candidate text fragments for "shi yun dong xie" and the candidate text fragments for "nv".
3) If the tree does not contain "shi yun dong xie", repeat the step of shortening the fragment by one character length from the front and searching the tree with the new fragment until the reverse maximum matching result is obtained.
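The reverse procedure mirrors the forward one: it anchors at the end of the sequence and drops syllables from the front until a fragment is found. As before, the dictionary below is a hypothetical stand-in for the hybrid dictionary tree and the glosses are illustrative.

```python
# Hypothetical lexicon standing in for hybrid-dictionary-tree lookups.
LEXICON = {
    "nv shi yun dong": ["women's sports"],
    "nv shi": ["women's"],
    "xie": ["shoes", "tool"],
}

def reverse_max_match(syllables, lexicon):
    """Segment right to left, always trying the longest fragment first."""
    fragments = []
    end = len(syllables)
    while end > 0:
        for start in range(0, end):
            key = " ".join(syllables[start:end])
            if key in lexicon:
                fragments.insert(0, (key, lexicon[key]))
                end = start
                break
        else:
            # Nothing in the lexicon ends here; keep the syllable unmatched.
            fragments.insert(0, (syllables[end - 1], []))
            end -= 1
    return fragments
```

Running both directions matters because they can cut the same sequence differently, and each cut may surface candidates the other misses.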
As an illustration, suppose the text to be corrected is a homophone mistyping of "women's sports shoes", rendered literally as "when-female sports shoes", whose pinyin sequence is "nv shi yun dong xie". Suppose forward maximum matching segments this pinyin sequence into "nv shi" and "yun dong xie", and that in the hybrid dictionary tree "nv shi" corresponds to the words "women's" and "when-female" while "yun dong xie" corresponds to "sports shoes". The candidate text fragments are then "women's", "when-female", and "sports shoes", and the candidate texts obtained from forward maximum matching are "women's sports shoes" and "when-female sports shoes". Suppose reverse maximum matching segments the pinyin sequence into "xie" and "nv shi yun dong", and that in the hybrid dictionary tree "nv shi yun dong" corresponds to "women's sports" while "xie" corresponds to "shoes" and "tool". The candidate text fragments are then "women's sports", "shoes", and "tool", and the candidate texts obtained from reverse maximum matching are "women's sports shoes" and "women's sports tool". Combining the forward and reverse results, the candidate texts matching the pinyin sequence of the text to be corrected are "women's sports shoes", "when-female sports shoes", and "women's sports tool".
In this embodiment, segmenting and matching the pinyin sequence of the text to be corrected with both the forward and the reverse maximum matching algorithm not only speeds up error correction for the text to be corrected (especially long-tail terms), ensuring timeliness, but also improves the accuracy and coverage of text error correction.
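Splicing candidate text fragments into candidate texts, and then merging the forward and reverse results into one candidate set, can be sketched with a Cartesian product followed by an order-preserving union. The function names and English glosses are assumptions for illustration; fragments are joined with a space only because the glosses are English.

```python
from itertools import product

def splice(fragment_options):
    """Pick one word per fragment position and join them in order."""
    return [" ".join(choice) for choice in product(*fragment_options)]

def merge_candidates(*candidate_lists):
    """Union the lists, dropping duplicates but keeping first-seen order."""
    merged = {}
    for candidates in candidate_lists:
        for text in candidates:
            merged.setdefault(text, None)
    return list(merged)

forward = splice([["women's", "when-female"], ["sports shoes"]])
reverse = splice([["women's sports"], ["shoes", "tool"]])
candidate_set = merge_candidates(forward, reverse)
```

Here `candidate_set` holds three texts, since "women's sports shoes" is produced by both directions and is kept only once.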
Step S203: compute, with each of multiple error correction models, an evaluation factor for each candidate text in the candidate text set.
The multiple error correction models may include at least two of the following: a noisy channel error correction model, an edit distance error correction model, and a pinyin distance error correction model.
For example, in one optional embodiment, the multiple error correction models consist of the noisy channel error correction model and the edit distance error correction model. The evaluation factor obtained for a candidate text from the noisy channel error correction model is the noisy channel probability of the candidate text, and the evaluation factor obtained from the edit distance error correction model is the edit distance of the candidate text.
In another alternative embodiment, the multiple error correcting model is by noisy communication channel error correcting model, editing distance error correction mould
Type and phonetic are formed apart from error correcting model;The evaluation factor of the candidate text obtained based on the noisy communication channel error correcting model is waits
The noisy communication channel probability of selection sheet;The evaluation factor of the candidate text obtained based on the editing distance error correcting model is candidate text
This editing distance;The evaluation factor of the candidate text obtained based on the phonetic apart from error correcting model is the phonetic of candidate text
Distance.
Step S204, merge the multiple evaluation factors to obtain the assessed value of the candidate text.
Step S205, determine the error correction result of the text to be corrected according to the assessed value.
Illustratively, the candidate text with the largest assessed value can be taken as the error correction result of the text to be corrected. Alternatively, one or more candidate texts whose assessed value exceeds a preset threshold can be taken as the error correction result of the text to be corrected.
In embodiments of the present invention, calculating multiple evaluation factors in parallel with multiple error correction models, merging the evaluation factors into the assessed value of each candidate text, and determining the error correction result of the text to be corrected according to the assessed value not only improves the accuracy of query error correction but also increases the processing speed of the text error correction method and so ensures timeliness. In embodiments of the present invention, steps S201 to S205 handle text error correction for mixed Chinese, English and pinyin input well, improving the coverage and applicability of text error correction.
Fig. 3 is a schematic diagram of the main steps of a text error correction method according to another embodiment of the present invention. As shown in Fig. 3, the text error correction method of this embodiment includes:
Step S301, clean the source data to obtain the training sample words.
Illustratively, the source data may include user search log data, commodity title data, and the like. In an optional embodiment, the user search log data can be cleaned as follows:
1) Calculate the confidence of each search term, and filter out the search terms whose confidence is less than a preset threshold.
Illustratively, the pv (number of searches), ctr (click volume) and gmv (gross turnover) of a search term can first be counted, and the confidence of the search term can then be calculated from these three indicators with the following formula:
Confidence = a*pv + b*ctr + c*gmv
Wherein, Confidence denotes the confidence of the search term; a, b and c are preset constant coefficients; pv denotes the number of searches; ctr denotes the click volume; and gmv denotes the gross turnover.
Further, in this example, different preset thresholds can be set for search terms that contain no Chinese characters and for search terms that contain Chinese characters. For example, the preset threshold can be set to 500 for search terms without Chinese characters and to 10 for search terms with Chinese characters.
2) Segment the search terms in the user search log data into words; retain pure Chinese words whose length is less than or equal to a first length threshold (for example, 5), and retain non-pure-Chinese words whose length lies between a second length threshold (for example, 2) and a third length threshold (for example, 10). The first, second and third length thresholds can be set flexibly according to requirements, provided this does not affect the implementation of the present invention.
3) Filter out search terms that include pinyin, based on a dictionary.
4) Filter out search terms consisting purely of digits, and filter out search terms that include special characters.
Through the above steps, misspelled words, long-tail words and the like entered by users can be filtered out as far as possible, reducing the noise in the training sample words.
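The four cleaning rules above can be sketched as a single filter. This is a simplified illustration under stated assumptions: the coefficients `A`, `B`, `C` are placeholders, each search term is treated as one already-segmented word, the length bounds are taken as inclusive, the Unicode range used to detect Chinese characters covers only the basic CJK block, and `pinyin_terms` is a hypothetical stand-in for the dictionary-based pinyin check. The thresholds 500, 10, 5, 2 and 10 are the example values given in the text.

```python
import re

A, B, C = 1.0, 1.0, 1.0                    # hypothetical coefficients a, b, c
HAN = re.compile(r"[\u4e00-\u9fff]")       # basic CJK block only (assumption)
SPECIAL = re.compile(r"[^\w\s]")           # anything that is not a word char or space


def confidence(pv, ctr, gmv):
    """Confidence = a*pv + b*ctr + c*gmv."""
    return A * pv + B * ctr + C * gmv


def keep_term(term, pv, ctr, gmv, pinyin_terms=frozenset()):
    """Return True if the term survives rules 1) to 4)."""
    if not term:
        return False
    has_han = bool(HAN.search(term))
    threshold = 10 if has_han else 500     # rule 1: per-type confidence threshold
    if confidence(pv, ctr, gmv) < threshold:
        return False
    if term in pinyin_terms:               # rule 3: dictionary-based pinyin filter
        return False
    if term.isdigit() or SPECIAL.search(term):   # rule 4
        return False
    pure_han = all(HAN.match(ch) for ch in term)
    if pure_han:                           # rule 2: length limits
        return len(term) <= 5
    return 2 <= len(term) <= 10
```

For instance, `keep_term("iphone", 100, 0, 0)` is rejected by the confidence threshold of 500 for terms without Chinese characters, while the same term with `pv=600` is kept.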
Further, in this optional embodiment, the commodity title data can be cleaned as follows: mine new words from the commodity title data using a new word discovery algorithm based on left and right entropy, and filter the mined new words by setting some rules (for example, removing commodity titles consisting purely of digits).
Step S302, obtain the pinyin sequences of the training sample words, and construct the mixing lexicographic tree according to the pinyin sequences of the training sample words.
Specifically, this step includes: obtaining the pinyin sequence of each word in the cleaned data (i.e. each training sample word), and then placing the characters of the pinyin sequence, from top to bottom in order, into child nodes under the root node. In addition, all training sample words corresponding to the same pinyin sequence are placed into the child node that holds the last character of that pinyin sequence.
For example, assume the pinyin sequence of a training sample word is "hua wei", and all training sample words corresponding to this pinyin sequence are "Huawei" and "dividing into". The root node can then be set to empty, and "h", "u", "a", "w", "e" and "i" are placed, from top to bottom in order, into child nodes under the root node. In addition, "Huawei" and "dividing into" are placed into the child node that holds "i". In embodiments of the present invention, constructing the mixing lexicographic tree makes it possible to support query error correction for mixed Chinese, English and pinyin input.
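The construction described above corresponds to a character-level trie whose nodes may carry a word list. The following is a minimal sketch; the class and function names are illustrative, the sample entries other than "hua wei" are assumptions, and spaces between syllables are dropped so that each node holds exactly one character as in the example.

```python
class TrieNode:
    """One node of the mixing lexicographic tree: child characters plus stored words."""

    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.words = []      # words whose pinyin sequence ends at this node


def insert(root, pinyin, word):
    """Place each character of the pinyin sequence on a path, then store the word."""
    node = root
    for ch in pinyin.replace(" ", ""):
        node = node.children.setdefault(ch, TrieNode())
    if word not in node.words:
        node.words.append(word)


def lookup(root, pinyin):
    """Return the words stored at the end of the pinyin path, or [] if absent."""
    node = root
    for ch in pinyin.replace(" ", ""):
        node = node.children.get(ch)
        if node is None:
            return []
    return list(node.words)


root = TrieNode()   # the root node is empty
samples = [
    ("hua wei", "Huawei"),
    ("hua wei", "dividing into"),
    ("hua", "flower"),
    ("iphone", "iphone"),   # English words are keyed by their own spelling
]
for pinyin, word in samples:
    insert(root, pinyin, word)
```

Because Chinese words are keyed by pinyin and English words by their own letters, a single tree can answer lookups for Chinese, English and pinyin input.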
Step S303, obtain the pinyin sequence of the text to be corrected.
For how this step can be implemented, refer to the description of step S201 in the embodiment shown in Fig. 2.
Step S304, search the mixing lexicographic tree based on the forward maximum matching algorithm and the reverse maximum matching algorithm, and determine the candidate text set matching the pinyin sequence of the text to be corrected according to the forward maximum matching result and the reverse maximum matching result.
Wherein, the candidate text set is the set of all candidate texts. The forward maximum matching result or the reverse maximum matching result includes at least one candidate text fragment. Specifically, when a matching result includes only one candidate text fragment, that fragment is itself a candidate text matching the pinyin sequence of the text to be corrected. When a matching result includes multiple candidate text fragments, the fragments can be spliced to obtain candidate texts. For how this step can be implemented, refer to the description of step S202 in the embodiment shown in Fig. 2.
Further, in order to improve the accuracy and coverage of text error correction, the text error correction method of the embodiment of the present invention may also include the following steps: performing an edit operation on the pinyin sequence of the candidate text fragments obtained in step S304; searching the mixing lexicographic tree according to the edited pinyin sequence to obtain newly added candidate text fragments matching the edited pinyin sequence; and constructing the candidate text set matching the pinyin sequence of the text to be corrected according to the candidate text fragments and the newly added candidate text fragments.
For example, when the text to be corrected is "sport footwear when female", the candidate text fragments obtained from the forward maximum matching result are "Ms", "when female" and "sport footwear"; the candidate text fragments obtained from the reverse maximum matching result are "Ms's movement", "shoes" and "tool"; and the newly added candidate text fragment obtained through the edit operation is "Lv Shi". The following candidate texts can then be obtained: "Ms's sport footwear", "sport footwear when female", "Lv Shi sport footwear" and "Ms moves tool".
Specifically, performing an edit operation on the pinyin sequence of a candidate text fragment includes:
Step A, in the case where the candidate text fragment includes Chinese characters, perform a fuzzy-sound edit operation on the pinyin of the Chinese characters.
Wherein, the fuzzy-sound edit operations may include: front and back nasal edits, such as converting between an and ang, ian and iang, uan and uang, en and eng, uen and ueng, and in and ing; flat-tongue and retroflex edits, such as converting between z and zh, c and ch, and s and sh; and north-south sound conversions, such as converting between n and l, b and p, h and f, u and v, i and u, and i and v. For example, if the candidate text fragment is "Lv Shi", performing an edit operation on its pinyin "lv shi" yields the edited pinyin sequence "nv shi".
Step B, in the case where the candidate text fragment includes English words, perform insertion, replacement, exchange and/or deletion edit operations on the English words.
In embodiments of the present invention, steps A and B implement the edit operation on the pinyin sequence of the candidate text fragments. Obtaining newly added candidate text fragments that match the edited pinyin sequence, and constructing the candidate texts matching the pinyin sequence of the text to be corrected from both the candidate text fragments and the newly added candidate text fragments, increases the number of candidate texts and improves the coverage of text error correction.
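The fuzzy-sound edit of step A can be sketched as a table of interchangeable initials and finals. This is an illustrative sketch only: the pair tables are copied from the conversions listed in the text, while the syllable-splitting rule, the one-edit-per-sequence policy and all function names are assumptions. Some generated variants (such as "shv") are not valid pinyin; they would simply find no match when the mixing lexicographic tree is searched.

```python
INITIAL_PAIRS = [("z", "zh"), ("c", "ch"), ("s", "sh"),
                 ("n", "l"), ("b", "p"), ("h", "f")]
FINAL_PAIRS = [("an", "ang"), ("ian", "iang"), ("uan", "uang"),
               ("en", "eng"), ("uen", "ueng"), ("in", "ing"),
               ("u", "v"), ("i", "u"), ("i", "v")]
# Two-letter initials first so that "shi" splits as "sh" + "i".
INITIALS = ["zh", "ch", "sh"] + list("bpmfdtnlgkhjqxrzcsyw")


def _swap_table(pairs):
    """Build a symmetric lookup: each side of a pair maps to the other."""
    table = {}
    for left, right in pairs:
        table.setdefault(left, []).append(right)
        table.setdefault(right, []).append(left)
    return table


INITIAL_SWAPS = _swap_table(INITIAL_PAIRS)
FINAL_SWAPS = _swap_table(FINAL_PAIRS)


def split_syllable(syllable):
    """Split a syllable into (initial, final); the initial may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def fuzzy_variants(syllable):
    """All syllables reachable by swapping the initial or the final once."""
    initial, final = split_syllable(syllable)
    variants = {alt + final for alt in INITIAL_SWAPS.get(initial, [])}
    variants |= {initial + alt for alt in FINAL_SWAPS.get(final, [])}
    variants.discard(syllable)
    return variants


def fuzzy_sequences(syllables):
    """Edited pinyin sequences that differ from the input in one syllable."""
    edited = set()
    for i, syllable in enumerate(syllables):
        for alt in fuzzy_variants(syllable):
            edited.add(tuple(syllables[:i] + [alt] + syllables[i + 1:]))
    return edited
```

Calling `fuzzy_sequences(["lv", "shi"])` includes `("nv", "shi")`, the edited sequence used in the example above.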
Step S305, calculate the noisy channel probability of the candidate text based on the noisy channel error correction model, and take it as the first evaluation factor of the candidate text.
Specifically, the noisy channel probability of the candidate text can be calculated according to the following formula:
P = P(q|c) * P(c);
Wherein, P is the noisy channel probability of the candidate text, q denotes the text to be corrected, c denotes the candidate text, P(q|c) denotes the transition probability between the candidate text and the text to be corrected, and P(c) denotes the prior probability of the candidate text.
Further, P(q|c) and P(c) can be calculated according to the following formulas:
Wherein, freq(c) denotes the number of occurrences of the candidate text c in the training corpus, freq(q, c) denotes the number of times the text to be corrected and the candidate text occur together in the training corpus, and |C| denotes the total number of words in the training corpus.
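As an illustration of the noisy channel probability, the sketch below estimates both terms by relative frequency, taking the transition probability as freq(q, c) / freq(c) and the prior as freq(c) / |C|. These two estimates are an assumption chosen to be consistent with the definitions of freq(c), freq(q, c) and |C| above, not necessarily the exact formulas of the patent; the counts and names are invented for the example.

```python
from collections import Counter


def noisy_channel_probability(q, c, freq, pair_freq, total_words):
    """P = P(q|c) * P(c), with both terms estimated by relative frequency.

    freq[c]           -- occurrences of candidate c in the training corpus
    pair_freq[(q, c)] -- co-occurrences of q and c in the training corpus
    total_words       -- |C|, total number of words in the training corpus
    """
    if freq[c] == 0 or total_words == 0:
        return 0.0
    p_q_given_c = pair_freq[(q, c)] / freq[c]   # assumed estimate of P(q|c)
    p_c = freq[c] / total_words                 # assumed estimate of P(c)
    return p_q_given_c * p_c


# Hypothetical corpus statistics.
freq = Counter({"iphone": 80, "ipad": 20})
pair_freq = Counter({("iphoe", "iphone"): 8, ("ipd", "ipad"): 5})
total_words = 1000

p = noisy_channel_probability("iphoe", "iphone", freq, pair_freq, total_words)
```

Under these estimates the product simplifies to freq(q, c) / |C|, so a candidate scores higher the more often it has been observed together with the text to be corrected.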
Step S306, calculate the edit distance of the candidate text based on the edit distance error correction model, and determine the second evaluation factor of the candidate text according to the edit distance.
Specifically, the edit distance of a candidate text is the minimum number of edit operations needed to turn the text to be corrected into the candidate text. An edit operation can be an insertion, replacement, exchange or deletion. For example, if the text to be corrected is "by machine" and the candidate text is "mobile phone", the edit distance of the candidate text is 1. Likewise, if the text to be corrected is "iphoe" and the candidate text is "iphone", the edit distance of the candidate text is 1.
Optionally, the second evaluation factor of the candidate text satisfies:
Wherein, μ_edit denotes the second evaluation factor of the candidate text, d_edit denotes the edit distance of the candidate text, max{L1, L2} denotes the larger of the lengths of the text to be corrected and the candidate text, L1 denotes the length of the text to be corrected, and L2 denotes the length of the candidate text.
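An edit distance that counts insertion, replacement, exchange of adjacent characters and deletion can be computed with the optimal string alignment variant of the Damerau-Levenshtein distance, sketched below. The normalisation in `edit_factor`, namely 1 - d_edit / max{L1, L2}, is an assumption built from the symbols defined above and may differ from the formula intended by the patent.

```python
def edit_distance(q, c):
    """Optimal string alignment distance between q and c."""
    rows, cols = len(q) + 1, len(c) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i                       # delete all of q[:i]
    for j in range(cols):
        d[0][j] = j                       # insert all of c[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if q[i - 1] == c[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # replacement (or match)
            )
            if (i > 1 and j > 1
                    and q[i - 1] == c[j - 2] and q[i - 2] == c[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)   # exchange
    return d[-1][-1]


def edit_factor(q, c):
    """Assumed normalisation: 1 - d_edit / max{L1, L2}."""
    longest = max(len(q), len(c))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(q, c) / longest
```

With this normalisation a candidate identical to the text to be corrected gets a factor of 1, and the factor falls towards 0 as more edits are needed.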
Step S307, calculate the pinyin distance of the candidate text based on the pinyin distance error correction model, and determine the third evaluation factor of the candidate text according to the pinyin distance.
Specifically, the pinyin distance of the candidate text can be calculated as follows: for the characters in the text to be corrected and in the candidate text, compare position by position whether their pinyin letters are identical and whether their tones are identical; determine the pinyin distance of each character from the comparison result; and take the sum of the per-character pinyin distances as the pinyin distance of the candidate text. Wherein, when the text to be corrected and the candidate text contain non-Chinese parts (such as English words or digits), identical characters at the same position in the non-Chinese parts are regarded as having identical pinyin letters and identical tones, and different characters at the same position are regarded as having different pinyin letters and different tones.
For example, the text to be corrected is "by machine" and the candidate text is "mobile phone"; the two are compared character by character. The pinyin letters of "by" and "hand" are both "shou", but their tones differ, so the pinyin distance of the first character is 1 (same pinyin letters) + 0 (different tones) = 1. The pinyin letters of "machine" and "machine" are both "ji", and their tones are the same, so the pinyin distance of the second character is 1 (same pinyin letters) + 1 (same tone) = 2. Therefore, the pinyin distance of the candidate text "mobile phone" is 3.
As another example, the text to be corrected is "ipd" and the candidate text is "ipad"; the two are compared character by character. The first character "i" is the same in both texts, so its pinyin letters and tone are regarded as identical and the pinyin distance of the first character is 2. The second character "p" is likewise the same in both texts, so the pinyin distance of the second character is 2. The third character of the text to be corrected, "d", and the third character of the candidate text, "a", differ in both pinyin letters and tone, so the pinyin distance of the third character is 0. The text to be corrected has no fourth character, while the fourth character of the candidate text is "d"; the pinyin letters and tones differ, so the pinyin distance of the fourth character is 0. Therefore, the pinyin distance of the candidate text "ipad" is 4.
Optionally, the third evaluation factor of the candidate text satisfies:
Wherein, ν_pinyin denotes the third evaluation factor of the candidate text, d_pinyin denotes the pinyin distance of the candidate text, max{L1, L2} denotes the larger of the lengths of the text to be corrected and the candidate text, L1 denotes the length of the text to be corrected, and L2 denotes the length of the candidate text.
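The per-character comparison described above can be sketched as follows. Each character is represented as a `(letters, tone)` pair, with `tone` set to `None` for non-Chinese characters; this representation, the tone numbers used for the first example (4 and 3 for the two "shou" characters) and the function names are assumptions for illustration. The normalisation in `pinyin_factor`, d_pinyin / (2 * max{L1, L2}), is likewise an assumed form and may differ from the formula intended by the patent.

```python
from itertools import zip_longest


def char_pinyin_distance(x, y):
    """1 point for identical pinyin letters plus 1 point for identical tone."""
    if x is None or y is None:            # one text has no character here
        return 0
    (letters_x, tone_x), (letters_y, tone_y) = x, y
    same_letters = letters_x == letters_y
    if tone_x is None or tone_y is None:
        # Non-Chinese characters: the tone is treated as matching only
        # when the characters themselves match.
        same_tone = same_letters
    else:
        same_tone = tone_x == tone_y
    return int(same_letters) + int(same_tone)


def pinyin_distance(q, c):
    """Sum of per-position distances between two character lists."""
    return sum(char_pinyin_distance(x, y) for x, y in zip_longest(q, c))


def pinyin_factor(q, c):
    """Assumed normalisation: d_pinyin / (2 * max{L1, L2})."""
    longest = max(len(q), len(c))
    return 0.0 if longest == 0 else pinyin_distance(q, c) / (2 * longest)


# "by machine" vs "mobile phone": same letters, first tones differ (assumed 4 vs 3).
q1 = [("shou", 4), ("ji", 1)]
c1 = [("shou", 3), ("ji", 1)]
# "ipd" vs "ipad": non-Chinese characters compared position by position.
q2 = [(ch, None) for ch in "ipd"]
c2 = [(ch, None) for ch in "ipad"]
```

Note that, as defined in the text, a larger pinyin distance means the two texts sound more alike, so the quantity behaves as a similarity score.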
Step S308, merge the first, second and third evaluation factors to obtain the assessed value of the candidate text.
Optionally, the first, second and third evaluation factors can be merged according to the following formula:
Score = a1*P + b1*μ_edit + c1*ν_pinyin;
Wherein, Score denotes the assessed value of the candidate text; a1, b1 and c1 are preset constant coefficients; P is the first evaluation factor; μ_edit is the second evaluation factor; and ν_pinyin is the third evaluation factor.
Step S309, take the candidate text with the largest assessed value as the error correction result of the text to be corrected.
In embodiments of the present invention, calculating the first, second and third evaluation factors with the noisy channel error correction model, the edit distance error correction model and the pinyin distance error correction model respectively, and merging the three evaluation factors to obtain the assessed value of the candidate text, can further improve the accuracy of text error correction.
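Steps S308 and S309 amount to a weighted sum followed by an arg-max. The sketch below shows this under stated assumptions: the default coefficients of 1.0 and the factor values in `factors` are invented for illustration, and the function names are not from the source.

```python
def assessed_value(p, mu_edit, nu_pinyin, a1=1.0, b1=1.0, c1=1.0):
    """Score = a1*P + b1*mu_edit + c1*nu_pinyin."""
    return a1 * p + b1 * mu_edit + c1 * nu_pinyin


def best_candidate(factors, **coefficients):
    """Return the candidate text with the largest assessed value.

    factors maps each candidate text to its (P, mu_edit, nu_pinyin) triple.
    """
    return max(
        factors,
        key=lambda text: assessed_value(*factors[text], **coefficients),
    )


# Hypothetical evaluation factors for two candidates of the query "iphoe".
factors = {
    "iphone": (0.008, 0.83, 0.80),
    "ipad": (0.001, 0.40, 0.30),
}
result = best_candidate(factors)
```

Because the three factors live on different scales, the coefficients a1, b1 and c1 control how much each model contributes to the final ranking.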
The structure of the mixing lexicographic tree of the embodiment of the present invention is described schematically below with reference to Fig. 4. As shown in Fig. 4, the mixing lexicographic tree includes the correspondence between pinyin and Chinese words and English words. Specifically, the mixing lexicographic tree includes multiple paths, and each path includes the root node and child nodes under the root node. The root node is empty, and each child node under the root node holds one character. In addition, certain child nodes also store the Chinese words or English words corresponding to a specific pinyin (the pinyin formed by the characters on the path from the root node to that child node).
For example, one path in Fig. 4 includes, from top to bottom: the root node, the child node holding "h", the child node holding "u", the child node holding "a", the child node holding "w", the child node holding "e" and the child node holding "i". In addition, the child node holding "u" also stores the words corresponding to the pinyin sequence "hu", such as "Hu" and "tiger"; the child node holding "a" also stores the words corresponding to the pinyin sequence "hua", such as "China" and "flower"; and the child node holding "i" also stores the words corresponding to the pinyin sequence "hua wei", such as "Huawei" and "dividing into".
In addition, the present invention also provides a search method. The search method of the embodiment of the present invention includes:
Step 1: receive an input text.
Step 2: in the case where the input text is determined to be a text to be corrected, obtain the pinyin sequence of the input text.
In this step, all input texts can be treated as texts to be corrected, or only some input texts can be treated as texts to be corrected. For example, a list of common erroneous texts can be preset, and when the input text entered by the user appears in that list, the input text is determined to be a text to be corrected.
Step 3: search the mixing lexicographic tree to obtain the candidate text set matching the pinyin sequence of the input text. The mixing lexicographic tree includes the correspondence between pinyin and Chinese words and English words.
Step 4: determine the error correction result of the input text according to the error correction model and the candidate text set.
Step 5: obtain search results according to the error correction result of the input text, and send the search results.
In a specific implementation, the search results can be sent to the user terminal, and the user terminal displays the search results.
In embodiments of the present invention, the above steps support error correction of Chinese search terms, English search terms, pinyin search terms, and search terms that arbitrarily mix Chinese, English and pinyin, which improves the coverage and applicability of search term error correction, helps to better understand the user's search intent, and improves the user experience.
In addition, the present invention also provides a search error correction method. The search error correction method of the embodiment of the present invention includes:
Step 1: receive an input text.
Step 2: in the case where the input text is determined to be a text to be corrected, obtain the pinyin sequence of the input text.
In this step, all input texts can be treated as texts to be corrected, or only some input texts can be treated as texts to be corrected. For example, a list of common erroneous texts can be preset, and when the input text entered by the user appears in that list, the input text is determined to be a text to be corrected.
Step 3: search the mixing lexicographic tree to obtain the candidate text set matching the pinyin sequence of the input text. The mixing lexicographic tree includes the correspondence between pinyin and Chinese words and English words.
Step 4: determine the error correction result of the input text according to the error correction model and the candidate text set.
Step 5: sort the error correction results of the input text, and send the sorted error correction results.
In a specific implementation, when multiple error correction results are obtained, the error correction results can be sorted according to the assessed values obtained in the text error correction method, and the sorted error correction results are sent to the user terminal. After receiving the sorted error correction results, the user terminal can show them to the user in the form of prompts.
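The gating in Step 2 and the sorting in Step 5 can be sketched as below. The contents of `COMMON_ERRORS`, the optional `threshold` argument and the function names are hypothetical; the source only states that a preset list may be used and that results are ordered by assessed value.

```python
COMMON_ERRORS = {"iphoe", "ipd"}   # hypothetical preset list of common erroneous texts


def needs_correction(text, common_errors=COMMON_ERRORS):
    """Step 2 gate: treat the input as a text to be corrected only if listed."""
    return text in common_errors


def rank_corrections(assessed, threshold=None):
    """Step 5: order error correction results by assessed value, highest first.

    assessed maps each error correction result to its assessed value; results
    at or below the optional threshold are dropped before sorting.
    """
    items = [
        (text, score) for text, score in assessed.items()
        if threshold is None or score > threshold
    ]
    items.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in items]
```

The resulting ordered list is what would be sent to the user terminal and displayed as prompts.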
In embodiments of the present invention, the above steps support error correction of Chinese search terms, English search terms, pinyin search terms, and search terms that arbitrarily mix Chinese, English and pinyin, which improves the coverage and applicability of search term error correction, helps to better understand the user's search intent, and improves the user experience.
Fig. 5 is a schematic diagram of the main modules of a text error correction device according to an embodiment of the present invention. As shown in Fig. 5, the text error correction device 500 of this embodiment includes: an obtaining module 501, a searching module 502 and a determining module 503.
The obtaining module 501 is configured to obtain the pinyin sequence of the text to be corrected.
Specifically, the obtaining module 501 obtains the pinyin sequence of the text to be corrected as follows: if the text to be corrected consists of Chinese characters, the obtaining module 501 takes the pinyin of the Chinese characters as the pinyin sequence of the text to be corrected; if the text to be corrected consists of non-Chinese characters, the obtaining module 501 takes the non-Chinese characters themselves as the pinyin sequence of the text to be corrected; if the text to be corrected consists of both Chinese characters and non-Chinese characters, the obtaining module 501 takes the whole formed by the pinyin of the Chinese characters and the non-Chinese characters themselves as the pinyin sequence of the text to be corrected. Wherein, the non-Chinese characters include digits, English words and/or pinyin.
The searching module 502 is configured to search the mixing lexicographic tree to obtain the candidate text set matching the pinyin sequence of the text to be corrected.
Wherein, the mixing lexicographic tree includes the correspondence between pinyin and Chinese words and English words. In the mixing lexicographic tree, each node holds one character. In addition, the node holding the last character of a pinyin sequence also stores all words corresponding to that pinyin sequence. The corresponding words can be Chinese words or English words.
Illustratively, assume the text to be corrected is "dividing mobile phone into" and its pinyin sequence is "hua wei shou ji". The candidate text set obtained by the searching module 502 then includes the following candidate texts: "Huawei's mobile phone", "dividing mobile phone into", "Huawei's collection" and "dividing collection into".
The determining module 503 is configured to determine the error correction result of the text to be corrected according to the error correction model and the candidate text set.
In embodiments of the present invention, by constructing the mixing lexicographic tree in advance, obtaining the pinyin sequence of the text to be corrected through the obtaining module, searching the mixing lexicographic tree through the searching module to obtain the candidate text set matching the pinyin sequence of the text to be corrected, and determining the error correction result of the text to be corrected through the determining module according to the error correction model and the candidate text set, text error correction for mixed Chinese, English and pinyin input can be handled well, improving the coverage and applicability of text error correction.
Fig. 6 is a schematic diagram of the main modules of a text error correction device according to another embodiment of the present invention. As shown in Fig. 6, the text error correction device 600 of this embodiment includes: a cleaning module 601, a building module 602, an obtaining module 603, a searching module 604 and a determining module 605.
The cleaning module 601 is configured to clean the source data to obtain the training sample words.
Illustratively, the source data may include user search log data, commodity title data, and the like. For how the cleaning module 601 cleans the source data, refer to the description of data cleaning in the embodiment shown in Fig. 3.
The building module 602 is configured to obtain the pinyin sequences of the training sample words and to construct the mixing lexicographic tree according to the pinyin sequences of the training sample words. Specifically, the building module 602 obtains the pinyin sequence of each word in the cleaned data (i.e. each training sample word), and then places the characters of the pinyin sequence, from top to bottom in order, into child nodes under the root node. In addition, the building module 602 places all training sample words corresponding to the same pinyin sequence into the child node that holds the last character of that pinyin sequence.
For example, assume the pinyin sequence of a training sample word is "hua wei", and all training sample words corresponding to this pinyin sequence are "Huawei" and "dividing into". The root node can then be set to empty, and "h", "u", "a", "w", "e" and "i" are placed, from top to bottom in order, into child nodes under the root node. In addition, "Huawei" and "dividing into" are placed into the child node that holds "i". In embodiments of the present invention, constructing the mixing lexicographic tree through the building module 602 makes it possible to support query error correction for mixed Chinese, English and pinyin input.
The obtaining module 603 is configured to obtain the pinyin sequence of the text to be corrected.
For how the obtaining module 603 obtains the pinyin sequence of the text to be corrected, refer to the description of the obtaining module 501 in the embodiment shown in Fig. 5.
The searching module 604 is configured to search the mixing lexicographic tree based on the forward maximum matching algorithm and the reverse maximum matching algorithm, and to determine the candidate text set matching the pinyin sequence of the text to be corrected according to the forward maximum matching result and the reverse maximum matching result.
Specifically, in both the forward maximum matching algorithm and the reverse maximum matching algorithm, the searching module 604 first segments the pinyin sequence of the text to be corrected, and then searches the mixing lexicographic tree with the segmented pinyin sequence fragments to obtain the forward maximum matching result and the reverse maximum matching result. The searching module 604 then determines the candidate texts matching the pinyin sequence according to the forward maximum matching result and the reverse maximum matching result. Wherein, the forward maximum matching result or the reverse maximum matching result includes at least one candidate text fragment. When a matching result includes only one candidate text fragment, that fragment is itself a candidate text matching the pinyin sequence of the text to be corrected. When a matching result includes multiple candidate text fragments, the fragments can be spliced to obtain candidate texts.
For example, when the text to be corrected is "sport footwear when female", the candidate text fragments obtained from the forward maximum matching result are "Ms", "when female" and "sport footwear", and the candidate text fragments obtained from the reverse maximum matching result are "Ms's movement", "shoes" and "tool". Splicing then yields the following candidate texts: "Ms's sport footwear", "sport footwear when female" and "Ms moves tool".
In embodiments of the present invention, the searching module 604 segments and matches the pinyin sequence of the text to be corrected with both the forward maximum matching algorithm and the reverse maximum matching algorithm, which not only speeds up error correction of the text to be corrected (especially long-tail words) and so ensures timeliness, but also improves the accuracy and coverage of text error correction.
Further, in order to improve the accuracy and coverage of text error correction, the text error correction device 600 may also include an editing module. The editing module is configured to perform an edit operation on the pinyin sequence of the candidate text fragments. In this optional embodiment, the searching module 604 is also configured to search the mixing lexicographic tree according to the edited pinyin sequence to obtain newly added candidate text fragments matching the edited pinyin sequence, and to construct the candidate text set matching the pinyin sequence of the text to be corrected according to the candidate text fragments and the newly added candidate text fragments.
For example, when the text to be corrected is "sport footwear when female", the candidate text fragments obtained from the forward maximum matching result are "Ms", "when female" and "sport footwear"; the candidate text fragments obtained from the reverse maximum matching result are "Ms's movement", "shoes" and "tool"; and the newly added candidate text fragment obtained through the edit operation is "Lv Shi". The resulting candidate text set then includes the following candidate texts: "Ms's sport footwear", "sport footwear when female", "Lv Shi sport footwear" and "Ms moves tool".
Specifically, the edit operation performed by the editing module on the pinyin sequence of a candidate text fragment may include: in the case where the candidate text fragment includes Chinese characters, the editing module performs a fuzzy-sound edit operation on the pinyin of the Chinese characters; in the case where the candidate text fragment includes English words, the editing module performs insertion, replacement, exchange and/or deletion edit operations on the English words.
In embodiments of the present invention, providing the editing module increases the number of candidate texts and improves the coverage of text error correction.
The determining module 605 is configured to calculate the assessed value of the candidate text and to determine the error correction result of the text to be corrected according to the assessed value. Specifically, the determining module 605 calculates the evaluation factors of the candidate text separately based on multiple error correction models; the determining module 605 merges the multiple evaluation factors to obtain the assessed value of the candidate text; and the determining module 605 determines the error correction result of the text to be corrected according to the assessed value.
Wherein, the multiple error correction models may include at least two of the following: a noisy channel error correction model, an edit distance error correction model, and a pinyin distance error correction model.
In an optional example, the multiple error correction models include the noisy channel error correction model, the edit distance error correction model and the pinyin distance error correction model. In this example, the calculation of the evaluation factors of the candidate text by the determining module 605 based on the multiple error correction models includes operation one, operation two and operation three.
Operation one, determining module 605 calculate the noisy communication channel probability of the candidate text based on noisy communication channel error correcting model,
And as the first evaluation factor of the candidate text.
Specifically, the determining module 605 may calculate the noisy channel probability of the candidate text according to the following formula:
$P = P(q \mid c) \cdot P(c)$;
where $P$ is the noisy channel probability of the candidate text, $q$ denotes the text to be corrected, $c$ denotes the candidate text, $P(q \mid c)$ denotes the transition probability between the candidate text and the text to be corrected, and $P(c)$ denotes the prior probability of the candidate text.
Further, $P(q \mid c)$ and $P(c)$ may be calculated according to a formula in which $freq(c)$ denotes the number of occurrences of the candidate text $c$ in the training corpus, $freq(q, c)$ denotes the number of times the text to be corrected and the candidate text occur together in the training corpus, and $|C|$ denotes the total number of words in the training corpus.
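As a rough illustration, the noisy channel probability can be computed from corpus counts as below. The sketch assumes the usual relative-frequency estimates, namely P(q|c) = freq(q, c) / freq(c) and P(c) = freq(c) / |C|; these estimates are inferred from the symbol definitions and are not confirmed by the text. The function and parameter names are hypothetical.

```python
def noisy_channel_probability(freq_qc, freq_c, total_words):
    """Return P = P(q|c) * P(c) under assumed relative-frequency estimates.

    freq_qc:     co-occurrence count of the text to be corrected and the candidate
    freq_c:      occurrence count of the candidate in the training corpus
    total_words: total number of words in the training corpus, |C|
    """
    if freq_c == 0 or total_words == 0:
        return 0.0  # an unseen candidate gets no probability mass in this sketch
    transition = freq_qc / freq_c   # assumed estimate of P(q|c)
    prior = freq_c / total_words    # assumed estimate of P(c)
    return transition * prior


# Toy counts: the candidate occurs 4 times among 16 words and co-occurs once with the query.
p = noisy_channel_probability(freq_qc=1, freq_c=4, total_words=16)
```

Under these assumed estimates the product reduces to freq(q, c) / |C|, so a practical system would likely add smoothing for unseen pairs.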
Operation two: the determining module 605 calculates the edit distance of the candidate text based on the edit distance error correction model, and determines the second evaluation factor of the candidate text according to the edit distance.
Here, the edit distance of a candidate text is the minimum number of edit operations needed to turn the text to be corrected into the candidate text, where an edit operation may be an insertion, replacement, transposition or deletion. For example, if the text to be corrected is "by machine" and the candidate text is "mobile phone" (two-character Chinese strings that differ in one character), the edit distance of the candidate text is 1. Likewise, if the text to be corrected is "iphoe" and the candidate text is "iphone", the edit distance of the candidate text is 1.
Optionally, the determining module 605 may calculate the second evaluation factor of the candidate text according to a formula in which $\mu_{edit}$ denotes the second evaluation factor of the candidate text, $d_{edit}$ denotes the edit distance of the candidate text, and $\max\{L_1, L_2\}$ denotes the larger of the word lengths of the text to be corrected and the candidate text, $L_1$ being the word length of the text to be corrected and $L_2$ the word length of the candidate text.
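A minimal sketch of operation two follows. The distance uses the optimal string alignment variant so that insertion, replacement, transposition and deletion each cost 1; choosing this variant is an assumption. The normalisation 1 - d_edit / max{L1, L2} is likewise only one plausible reading of the symbol definitions above, not the formula from the source, and both function names are hypothetical.

```python
def edit_distance(source, target):
    """Minimum number of insert, replace, transpose and delete operations (OSA variant)."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i leading characters
    for j in range(cols):
        dist[0][j] = j  # insert all j leading characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # replacement or match
            if (i > 1 and j > 1 and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)  # transposition
    return dist[-1][-1]


def second_evaluation_factor(query, candidate):
    """Assumed normalisation: 1 - d_edit / max(L1, L2); higher means more similar."""
    longest = max(len(query), len(candidate))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - edit_distance(query, candidate) / longest
```

With this normalisation, "ipd" against "ipad" gives a distance of 1 over a maximum length of 4, so the factor is 0.75.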
Operation three: the determining module 605 calculates the pinyin distance of the candidate text based on the pinyin distance error correction model, and determines the third evaluation factor of the candidate text according to the pinyin distance.
Specifically, the operation in which the determining module 605 calculates the pinyin distance of the candidate text may include: for the characters in the text to be corrected and the candidate text, the determining module 605 compares, position by position, whether their pinyin letters are identical and whether their tones are identical; the determining module 605 then determines the pinyin distance of each character according to the comparison result, and takes the sum of the pinyin distances of all characters as the pinyin distance of the candidate text.
When the text to be corrected and the candidate text contain non-Chinese parts (such as English words or digits), identical characters at the same position in the non-Chinese parts are regarded as having identical pinyin letters and identical tones, and different characters at the same position are regarded as having different pinyin letters and different tones.
For example, if the text to be corrected is "by machine" and the candidate text is "mobile phone", the pinyin distance of the first character is 1 (same pinyin letters) + 0 (different tone) = 1, and the pinyin distance of the second character is 1 (same pinyin letters) + 1 (same tone) = 2. The pinyin distance of the candidate text "mobile phone" is therefore 3.
As another example, suppose the text to be corrected is "ipd" and the candidate text is "ipad". The first character "i" is the same in both texts, so its pinyin letters and tone are regarded as identical and the pinyin distance of the first character is 2. The second character "p" is also the same in both texts, so the pinyin distance of the second character is 2. The third character "d" in the text to be corrected differs from the third character "a" in the candidate text in both pinyin letters and tone, so the pinyin distance of the third character is 0. The fourth character of the text to be corrected is empty while the fourth character of the candidate text is "d"; their pinyin letters and tones both differ, so the pinyin distance of the fourth character is 0. The pinyin distance of the candidate text "ipad" is therefore 4.
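The position-by-position comparison in these two examples can be sketched as follows. Each character is represented as a (letters, tone) pair; for non-Chinese characters the character itself stands in for both parts, so identical characters match on both and different characters on neither. The function names and the concrete pinyin and tone values in the Chinese example are hypothetical. Note that, as defined in the text, this "distance" grows with similarity.

```python
from itertools import zip_longest


def latin_units(text):
    """Represent each non-Chinese character as a (letters, tone) pair equal to itself."""
    return [(ch, ch) for ch in text]


def pinyin_distance(query_units, candidate_units):
    """Sum, over aligned positions, 1 for equal pinyin letters plus 1 for equal tone."""
    total = 0
    for query, candidate in zip_longest(query_units, candidate_units):
        if query is None or candidate is None:
            continue  # an empty position matches nothing and contributes 0
        total += int(query[0] == candidate[0]) + int(query[1] == candidate[1])
    return total


# Non-Chinese example from the text: "ipd" against "ipad".
d_latin = pinyin_distance(latin_units("ipd"), latin_units("ipad"))

# Chinese-style example with assumed values: same letters but different tone in the
# first character, identical second character.
d_hanzi = pinyin_distance([("shou", 4), ("ji", 1)], [("shou", 3), ("ji", 1)])
```

Both results agree with the worked examples above: 4 for "ipad" and 3 for the two-character Chinese candidate.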
Optionally, the determining module 605 may calculate the third evaluation factor of the candidate text according to a formula in which $\nu_{pinyin}$ denotes the third evaluation factor of the candidate text, $d_{pinyin}$ denotes the pinyin distance of the candidate text, and $\max\{L_1, L_2\}$ denotes the larger of the word lengths of the text to be corrected and the candidate text, $L_1$ being the word length of the text to be corrected and $L_2$ the word length of the candidate text.
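One way to turn the pinyin distance into a factor between 0 and 1 is sketched below. Because each position contributes at most 2 (letters plus tone), the sketch divides by 2 * max{L1, L2}; this normalisation is an assumption based on the symbol definitions, not the formula from the source, and the function name is hypothetical.

```python
def third_evaluation_factor(d_pinyin, len_query, len_candidate):
    """Assumed normalisation: d_pinyin / (2 * max(L1, L2)); 1.0 means a full match."""
    longest = max(len_query, len_candidate)
    if longest == 0:
        return 0.0  # nothing to compare
    return d_pinyin / (2 * longest)


# "ipd" against "ipad": pinyin distance 4, lengths 3 and 4.
nu_ipad = third_evaluation_factor(4, 3, 4)
```

For the "ipad" example this yields 0.5, and for the two-character Chinese example with pinyin distance 3 it yields 0.75.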
Further, in this example, after obtaining the first, second and third evaluation factors, the determining module 605 may merge them to obtain the evaluation value of the candidate text. After obtaining the evaluation value, the determining module 605 may take the candidate text with the largest evaluation value as the error correction result of the text to be corrected. Alternatively, the determining module 605 may take one or more candidate texts whose evaluation value exceeds a preset threshold as the error correction result of the text to be corrected.
Optionally, the determining module 605 may merge the first, second and third evaluation factors according to the following formula:
$Score = a_1 \cdot P + b_1 \cdot \mu_{edit} + c_1 \cdot \nu_{pinyin}$;
where $Score$ denotes the evaluation value of the candidate text, $a_1$, $b_1$ and $c_1$ are preset constant coefficients, $P$ is the first evaluation factor, $\mu_{edit}$ is the second evaluation factor, and $\nu_{pinyin}$ is the third evaluation factor.
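The weighted merge and the two selection rules can be sketched as follows. The coefficient defaults, the factor values in the toy example and the names `fuse` and `pick_results` are all hypothetical; the text only states that the coefficients are preset constants.

```python
def fuse(p, mu_edit, nu_pinyin, a1=1.0, b1=1.0, c1=1.0):
    """Score = a1 * P + b1 * mu_edit + c1 * nu_pinyin."""
    return a1 * p + b1 * mu_edit + c1 * nu_pinyin


def pick_results(factors_by_candidate, threshold=None):
    """Return the best candidate, or every candidate scoring above threshold.

    factors_by_candidate maps candidate text -> (P, mu_edit, nu_pinyin).
    """
    scores = {text: fuse(*factors) for text, factors in factors_by_candidate.items()}
    if threshold is None:
        # Rule 1: the single candidate with the largest evaluation value.
        return [max(scores, key=scores.get)]
    # Rule 2: all candidates above the preset threshold, best first.
    above = [text for text, score in scores.items() if score > threshold]
    return sorted(above, key=scores.get, reverse=True)


# Toy factor values (made up for illustration only).
candidates = {"iphone": (0.5, 0.75, 0.5), "ipad": (0.25, 0.5, 0.25)}
best = pick_results(candidates)
above_half = pick_results(candidates, threshold=0.5)
```

Because the three factors live on different scales (a corpus probability versus normalised similarities), the coefficients would in practice need tuning on held-out data.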
In embodiments of the present invention, the determining module 605 calculates the first, second and third evaluation factors based on the noisy channel error correction model, the edit distance error correction model and the pinyin distance error correction model respectively, and merges the three evaluation factors to obtain the evaluation value of the candidate text, which further improves the accuracy of text error correction. The device of the embodiments of the present invention handles error correction of text that mixes Chinese, English and pinyin well, improving the coverage and applicability of text error correction.
Fig. 7 shows an exemplary system architecture 700 to which the text error correction method or text error correction device of the embodiments of the present invention can be applied.
As shown in Fig. 7, the system architecture 700 may include terminal devices 701, 702 and 703, a network 704 and a server 705. The network 704 serves as the medium providing communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 701, 702, 703 to interact with the server 705 through the network 704 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 701, 702, 703, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients and social platform software.
The terminal devices 701, 702, 703 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.
The server 705 may be a server providing various services, for example a back-end management server that supports a shopping website browsed by users with the terminal devices 701, 702, 703. The back-end management server may perform processing such as query error correction on received data such as search terms, and feed the error correction result back to the terminal device.
It should be noted that the text error correction method provided by the embodiments of the present invention is generally executed by the server 705; accordingly, the text error correction device is generally provided in the server 705.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 7 are merely illustrative. There may be any number of terminal devices, networks and servers, as required by the implementation.
Further, the present invention also provides an electronic device, comprising: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the text error correction method of the present invention.
Fig. 8 shows a schematic structural diagram of a computer system 800 suitable for implementing the electronic device of the present invention. The computer system shown in Fig. 8 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.
As shown in Fig. 8, the computer system 800 includes a central processing unit (CPU) 801, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores the various programs and data required for the operation of the system 800. The CPU 801, the ROM 802 and the RAM 803 are connected to one another by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse and the like; an output section 807 including a cathode ray tube (CRT) or liquid crystal display (LCD) and a loudspeaker and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read from it can be installed into the storage section 808 as needed.
In particular, according to the disclosed embodiments of the present invention, the process described above with reference to the flow chart may be implemented as a computer software program. For example, the disclosed embodiments of the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 809, and/or installed from the removable medium 811. When the computer program is executed by the central processing unit (CPU) 801, the above functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of a computer-readable storage medium include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in connection with an instruction execution system, apparatus or device. Also in the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flow charts and block diagrams in the drawings illustrate the possible architecture, functions and operations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each box in a flow chart or block diagram may represent a module, a program segment or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in a block diagram or flow chart, and any combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, this may be described as: a processor comprising an obtaining module, a searching module and a determining module. The names of these modules do not, under certain conditions, limit the modules themselves; for example, the obtaining module may also be described as "a module for obtaining the pinyin sequence of the text to be corrected".
In another aspect, the present invention also provides a computer-readable medium, which may be included in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to perform the following process: obtaining the pinyin sequence of the text to be corrected; searching a mixed dictionary tree to obtain a candidate text set matching the pinyin sequence of the text to be corrected, the mixed dictionary tree including correspondences between pinyin and Chinese words and English words; and determining the error correction result of the text to be corrected according to an error correction model and the candidate text set.
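The lookup step in this process can be pictured with a small character-level trie whose keys are pinyin strings (for Chinese words) or the English spelling itself (for English words). This is only a sketch under those assumptions: the class name `MixedTrie`, its methods and the per-character key layout are hypothetical and are not taken from the source.

```python
class MixedTrie:
    """Minimal trie mapping a key (pinyin or English spelling) to the words stored under it."""

    def __init__(self):
        self.children = {}  # next character -> child node
        self.words = []     # words whose key ends exactly at this node

    def insert(self, key, word):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, MixedTrie())
        node.words.append(word)

    def lookup(self, key):
        """Return the words stored under key, or an empty list if the key is absent."""
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.words)


trie = MixedTrie()
trie.insert("shouji", "手机")    # a Chinese word stored under its pinyin
trie.insert("iphone", "iphone")  # an English word stored under its own spelling

chinese_hits = trie.lookup("shouji")
english_hits = trie.lookup("iphone")
prefix_only = trie.lookup("shou")
```

Storing both kinds of entry in one structure is what lets a single pass over a mixed pinyin sequence retrieve Chinese and English candidates together.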
The specific embodiments above do not limit the scope of protection of the present invention. Those skilled in the art should understand that, depending on design requirements and other factors, various modifications, combinations, sub-combinations and substitutions can occur. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (16)
1. A text error correction method, characterized in that the method comprises:
obtaining the pinyin sequence of a text to be corrected;
searching a mixed dictionary tree to obtain a candidate text set matching the pinyin sequence of the text to be corrected, the mixed dictionary tree including correspondences between pinyin and Chinese words and English words;
determining the error correction result of the text to be corrected according to an error correction model and the candidate text set.
2. The method according to claim 1, characterized in that the step of obtaining the pinyin sequence of the text to be corrected comprises:
if the text to be corrected consists of Chinese characters, taking the pinyin of the Chinese characters as the pinyin sequence of the text to be corrected; if the text to be corrected consists of non-Chinese characters, taking the non-Chinese characters themselves as the pinyin sequence of the text to be corrected; if the text to be corrected consists of Chinese characters and non-Chinese characters, taking the whole formed by the pinyin of the Chinese characters and the non-Chinese characters themselves as the pinyin sequence of the text to be corrected; wherein the non-Chinese characters include digits, English words and/or pinyin.
3. The method according to claim 1, characterized in that the step of searching the mixed dictionary tree to obtain the candidate text set matching the pinyin sequence of the text to be corrected comprises:
searching the mixed dictionary tree based on a forward maximum matching algorithm and a reverse maximum matching algorithm, and determining the candidate text set matching the pinyin sequence according to the forward maximum matching result and the reverse maximum matching result.
4. The method according to claim 1, characterized in that the step of determining the error correction result of the text to be corrected according to the error correction model and the candidate text set comprises:
calculating evaluation factors of each candidate text in the candidate text set based on multiple error correction models; merging the multiple evaluation factors to obtain the evaluation value of the candidate text; and determining the error correction result of the text to be corrected according to the evaluation value.
5. The method according to claim 4, characterized in that the multiple error correction models include at least two of the following: a noisy channel error correction model, an edit distance error correction model, and a pinyin distance error correction model.
6. The method according to claim 5, characterized in that, in the case where the multiple error correction models include the noisy channel error correction model, the edit distance error correction model and the pinyin distance error correction model, the step of calculating the evaluation factors of each candidate text in the candidate text set based on the multiple error correction models comprises:
calculating the noisy channel probability of the candidate text based on the noisy channel error correction model, and using it as the first evaluation factor of the candidate text; calculating the edit distance of the candidate text based on the edit distance error correction model, and determining the second evaluation factor of the candidate text according to the edit distance; calculating the pinyin distance of the candidate text based on the pinyin distance error correction model, and determining the third evaluation factor of the candidate text according to the pinyin distance.
7. The method according to claim 6, characterized in that the step of calculating the pinyin distance of the candidate text based on the pinyin distance error correction model comprises:
for the characters in the text to be corrected and the candidate text, comparing position by position whether their pinyin letters are identical and whether their tones are identical; determining the pinyin distance of each character according to the comparison result, and taking the sum of the pinyin distances of all characters as the pinyin distance of the candidate text.
8. The method according to claim 3, characterized in that the forward maximum matching result and the reverse maximum matching result each include at least one candidate text fragment;
the method further comprises: performing an edit operation on the pinyin sequence of a candidate text fragment; searching the mixed dictionary tree according to the edited pinyin sequence to obtain newly added candidate text fragments matching the edited pinyin sequence; and constructing the candidate text set matching the pinyin sequence of the text to be corrected from the candidate text fragments and the newly added candidate text fragments.
9. The method according to claim 8, characterized in that the step of performing the edit operation on the pinyin sequence of the candidate text fragment comprises:
in the case where the candidate text fragment contains Chinese characters, applying fuzzy-pinyin edit operations to the pinyin of the Chinese characters;
in the case where the candidate text fragment contains English words, applying insertion, replacement, transposition and/or deletion edit operations to the English words.
10. The method according to claim 1, characterized in that the method further comprises:
obtaining the pinyin sequences of training sample words, and constructing the mixed dictionary tree according to the pinyin sequences of the training sample words.
11. The method according to claim 10, characterized in that the method further comprises:
before the step of obtaining the pinyin sequences of the training sample words and constructing the mixed dictionary tree according to the pinyin sequences of the training sample words, cleaning source data to obtain the training sample words.
12. A search method, characterized in that the method comprises:
receiving an input text;
in the case where the input text is determined to be a text to be corrected, obtaining the pinyin sequence of the input text;
searching a mixed dictionary tree to obtain a candidate text set matching the pinyin sequence of the input text, the mixed dictionary tree including correspondences between pinyin and Chinese words and English words;
determining the error correction result of the input text according to an error correction model and the candidate text set;
obtaining a search result according to the error correction result of the input text, and sending the search result.
13. A search error correction method, characterized in that the method comprises:
receiving an input text;
in the case where the input text is determined to be a text to be corrected, obtaining the pinyin sequence of the input text;
searching a mixed dictionary tree to obtain a candidate text set matching the pinyin sequence of the input text, the mixed dictionary tree including correspondences between pinyin and Chinese words and English words;
determining the error correction result of the input text according to an error correction model and the candidate text set;
ranking the error correction results of the input text, and sending the ranked error correction results.
14. A text error correction device, characterized in that the device comprises:
an obtaining module for obtaining the pinyin sequence of a text to be corrected;
a searching module for searching a mixed dictionary tree to obtain a candidate text set matching the pinyin sequence of the text to be corrected, the mixed dictionary tree including correspondences between pinyin and Chinese words and English words;
a determining module for determining the error correction result of the text to be corrected according to an error correction model and the candidate text set.
15. An electronic device, characterized by comprising:
one or more processors;
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 11.
16. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810030108.3A CN110032722A (en) | 2018-01-12 | 2018-01-12 | Text error correction method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110032722A true CN110032722A (en) | 2019-07-19 |
Family
ID=67234834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810030108.3A Pending CN110032722A (en) | 2018-01-12 | 2018-01-12 | Text error correction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110032722A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||