Method, device, terminal and storage medium for correcting search text
Technical field
Embodiments of the present invention relate to information processing technology, and in particular to a method, device, terminal and storage medium for correcting search text.
Background
With the rapid development of information technology, users can query information by searching in order to obtain the information they need. Typically, a user enters a search text according to his or her needs, and a search tool finds the search results corresponding to that text in an information collection. For example, on a live-streaming platform, a user can enter a streamer's name in the search box and quickly find the live video he or she wants to watch.
However, the search text entered by a user often contains errors, such as misspellings or reversed word order, which prevent the user from finding the desired search results. The search text entered by the user therefore needs to be corrected.
In the existing correction procedure, the search text is segmented into words, each search term is then corrected directly, and the context of each corrected word must also be taken into account. However, word segmentation of search text in the prior art is often inaccurate, and when the search terms are only weakly related to one another, still taking the context of the corrected words into account can introduce ambiguity. The genuinely correct text often cannot be determined, which lowers the accuracy of correction and degrades the user's search experience.
Summary of the invention
Embodiments of the present invention provide a method, device, terminal and storage medium for correcting search text, so as to solve the prior-art problem of low correction accuracy caused by weak relevance between search terms, thereby improving correction accuracy and, in turn, the user's search experience.
In a first aspect, an embodiment of the present invention provides a method for correcting search text, comprising:
obtaining a target search text, and performing word segmentation on the target search text to determine the search word sequence corresponding to the target search text;
determining, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search terms in the search word sequence correspond one-to-one with the candidate accurate words in the candidate accurate word sequence;
segmenting the search word sequence to determine the search term segments of each segmentation mode and the candidate accurate word segments corresponding to each search term segment, wherein a segmentation mode comprises the number of search term segments and the number of search terms in each search term segment;
determining the candidate correction probability of each candidate accurate word sequence under each segmentation mode according to the search term segments of that segmentation mode, the candidate accurate word segments corresponding to those search term segments, the search corpus and the total number of search terms;
determining, according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and determining the target accurate text corresponding to the target search text according to the target accurate word sequence.
In a second aspect, an embodiment of the present invention further provides a device for correcting search text, comprising:
a search word sequence determining module, configured to obtain a target search text and perform word segmentation on it to determine the search word sequence corresponding to the target search text;
a candidate accurate word sequence determining module, configured to determine, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search terms in the search word sequence correspond one-to-one with the candidate accurate words in the candidate accurate word sequence;
a search word sequence segmenting module, configured to segment the search word sequence and determine the search term segments of each segmentation mode and the candidate accurate word segments corresponding to each search term segment, wherein a segmentation mode comprises the number of search term segments and the number of search terms in each search term segment;
a candidate correction probability determining module, configured to determine the candidate correction probability of each candidate accurate word sequence under each segmentation mode according to the search term segments of that segmentation mode, the candidate accurate word segments corresponding to those search term segments, the search corpus and the total number of search terms;
a target accurate text determining module, configured to determine the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, and to determine the target accurate text corresponding to the target search text according to the target accurate word sequence.
In a third aspect, an embodiment of the present invention further provides a terminal, comprising:
one or more processors; and
a memory configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for correcting search text according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for correcting search text according to any embodiment of the present invention.
In the embodiments of the present invention, the search word sequence is segmented to determine the search term segments of each segmentation mode and the candidate accurate word segments corresponding to each search term segment. The candidate correction probability of each candidate accurate word sequence under each segmentation mode is then determined according to the search term segments of that mode, the corresponding candidate accurate word segments, the search corpus and the total number of search terms. From these candidate correction probabilities, the target accurate word sequence corresponding to the target search text is determined, and the target accurate text is determined from the target accurate word sequence. Because the search word sequence is corrected segment by segment, the relevance between search terms under each segmentation mode is taken into account, so the best candidate accurate word sequence under the best segmentation mode can be found. This avoids an inaccurate target accurate word sequence caused by weak relevance between search terms, improves correction accuracy, and in turn improves the user's search experience.
Description of the drawings
Fig. 1 is a flowchart of a method for correcting search text according to Embodiment one of the present invention;
Fig. 2 is a flowchart of a method for correcting search text according to Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of a device for correcting search text according to Embodiment three of the present invention;
Fig. 4 is a schematic structural diagram of a terminal according to Embodiment four of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment one
Fig. 1 is a flowchart of a method for correcting search text according to Embodiment one of the present invention. This embodiment is applicable to spelling correction of search text, and in particular to correcting search text on a live-streaming platform. The method can be performed by a device for correcting search text, which can be implemented in software and/or hardware and integrated into a terminal with a search function, such as a smartphone, tablet computer or desktop computer. The method specifically includes the following steps:
S110: Obtain the target search text, and perform word segmentation on it to determine the search word sequence corresponding to the target search text.
Here, the target search text is the search text currently entered by the user; for example, the text in the current search box can be taken as the target search text. Word segmentation divides the target search text into multiple search terms according to a segmentation dictionary or other segmentation rules. The search word sequence is the sequence of search terms obtained by segmenting the target search text, and its search terms appear in the same order as in the target search text. For example, if the target search text is "not soul-stirring", matching it against the segmentation dictionary may yield the search word sequence "not, soul-stirring".
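The text leaves the segmentation algorithm open ("a segmentation dictionary or other segmentation rules"). As one possible illustration, the Python sketch below uses forward maximum matching, a common dictionary-based technique swapped in here as an assumption rather than the method prescribed by this embodiment. The function name `forward_max_match`, the toy dictionary and the English example string are hypothetical; characters not covered by the dictionary are kept as single-character terms.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Split text into search terms by greedy longest match against a dictionary.

    Hypothetical helper: forward maximum matching is an assumed choice of
    segmentation rule. Characters not covered by any entry become one-character terms.
    """
    terms = []
    i = 0
    while i < len(text):
        # Try the longest window first, shrinking until a dictionary hit or one char.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                terms.append(piece)
                i += size
                break
    return terms


# Toy dictionary and input, made up for illustration.
segmentation_dict = {"live", "stream", "livestream", "host", "name"}
print(forward_max_match("livestreamhostname", segmentation_dict, max_len=10))
# ['livestream', 'host', 'name']
```

The order of the returned list matches the order of the terms in the input text, which is the property the search word sequence relies on.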
S120: Determine, according to the search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search terms in the search word sequence correspond one-to-one with the candidate accurate words in the candidate accurate word sequence.
The search corpus can be built in advance from the search behavior logs of a large number of users. It can contain a large number of historical search keywords together with the corrected (accurate) keyword for each of them, where the accurate keyword can be determined from users' click operations. A candidate accurate word sequence is any possible correct sequence for the target search text, and this embodiment selects the best accurate word sequence from all candidate accurate word sequences. In this embodiment, each candidate accurate word sequence corresponds to a candidate accurate text in which the candidate accurate words appear in the same order as in the candidate accurate word sequence. The search terms in the search word sequence correspond one-to-one with the candidate accurate words in a candidate accurate word sequence; that is, each search term corresponds to exactly one candidate accurate word. For example, if the search word sequence corresponding to the target search text is $q_1, q_2, \ldots, q_N$ and a candidate accurate word sequence is $c_1, c_2, \ldots, c_N$, then search term $q_i$ corresponds to candidate accurate word $c_i$.
Specifically, this embodiment can determine, according to the search corpus, at least one candidate accurate word for each search term in the search word sequence, and combine these candidate accurate words to obtain the candidate accurate word sequences corresponding to the search word sequence. For example, suppose the search word sequence is "not, soul-stirring", and the search corpus gives "not", "step by step" and "portion-portion" as candidate accurate words for "not", and "soul-stirring" and "meticulous" for "soul-stirring". The candidate accurate word sequences then include "not, meticulous", "step by step, soul-stirring", "step by step, meticulous", "portion-portion, soul-stirring" and "portion-portion, meticulous". Note that a candidate accurate word can be the search term itself, because a search term in the target search text may already be spelled correctly.
S130: Segment the search word sequence, and determine the search term segments of each segmentation mode and the candidate accurate word segments corresponding to each search term segment.
A search term segment contains at least one search term and can be part of the search word sequence or the whole of it, depending on the segmentation mode. A segmentation mode comprises the number of search term segments and the number of search terms in each search term segment. The number of search term segments in a segmentation mode ranges from 1 up to the total number of search terms in the search word sequence. In this embodiment, a segmentation mode specifies how many search term segments the search word sequence is divided into and how many search terms each segment contains, and the number of possible segmentation modes can be determined from the total number of search terms in the search word sequence. For each candidate accurate word sequence, its candidate accurate word segments correspond one-to-one with the search term segments: there are as many candidate accurate word segments as search term segments, and each candidate accurate word segment contains as many candidate accurate words as its corresponding search term segment contains search terms.
For example, suppose the search word sequence is $q_1, q_2, q_3$ and a candidate accurate word sequence is $c_1, c_2, c_3$. There are then four segmentation modes. In the first, the whole search word sequence forms a single search term segment $q_1, q_2, q_3$, whose candidate accurate word segment is $c_1, c_2, c_3$. In the second, the sequence is divided into two search term segments, the first containing two search terms and the second one, namely $q_1, q_2$ and $q_3$, with candidate accurate word segments $c_1, c_2$ and $c_3$. In the third, it is divided into two search term segments, the first containing one search term and the second two, namely $q_1$ and $q_2, q_3$, with candidate accurate word segments $c_1$ and $c_2, c_3$. In the fourth, it is divided into three search term segments, one per search term, namely $q_1$, $q_2$ and $q_3$, with candidate accurate word segments $c_1$, $c_2$ and $c_3$.
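The segmentation modes can be enumerated by choosing which of the $N-1$ gaps between adjacent search terms to cut, so a sequence of $N$ search terms has $2^{N-1}$ segmentation modes (4 for $N = 3$, matching the example above). The Python sketch below illustrates this under that reading; `segmentations` and `split` are hypothetical helper names, and each mode is represented as a tuple of per-segment search term counts, so `len(mode)` is the number of search term segments.

```python
from itertools import combinations


def segmentations(n):
    """Enumerate every segmentation mode of a sequence of n search terms.

    Each mode is a tuple of segment lengths: len(mode) is the number of
    search term segments and each entry is that segment's search term count.
    Any subset of the n - 1 gaps can serve as cut points, giving 2 ** (n - 1) modes.
    """
    modes = []
    for k in range(n):  # number of cuts, from 0 up to n - 1
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            modes.append(tuple(b - a for a, b in zip(bounds, bounds[1:])))
    return modes


def split(seq, mode):
    """Split seq into consecutive segments whose lengths are given by mode."""
    out, start = [], 0
    for length in mode:
        out.append(seq[start:start + length])
        start += length
    return out


# Search term segments and candidate accurate word segments line up position by position.
for mode in segmentations(3):
    print(mode, split(["q1", "q2", "q3"], mode), split(["c1", "c2", "c3"], mode))
```

Applying the same `mode` to the search word sequence and to a candidate accurate word sequence is what keeps their segments in one-to-one correspondence.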
In short, each segmentation mode corresponds to at least one search term segment, and each search term segment corresponds to one candidate accurate word segment in each candidate accurate word sequence.
S140: Determine the candidate correction probability of each candidate accurate word sequence under each segmentation mode according to the search term segments of that segmentation mode, the candidate accurate word segments corresponding to those search term segments, the search corpus and the total number of search terms.
The total number of search terms is the number of search terms that make up the search word sequence. The candidate correction probability of a candidate accurate word sequence is the probability that the search word sequence is corrected to that candidate accurate word sequence.
Specifically, because the segmentation modes of the search word sequence differ, the candidate correction probability computed for a candidate accurate word sequence also differs from one segmentation mode to another. For a given candidate accurate word sequence under a given segmentation mode, its candidate correction probability is determined from the search corpus, the search term segments of that mode, the candidate accurate word segments in the sequence that correspond to those search term segments, and the total number of search terms. By computing the candidate correction probability of every candidate accurate word sequence under every segmentation mode, this embodiment obtains candidate correction probabilities for all cases, from those that take the context of at least one candidate accurate word into account to the one that takes no context into account at all. This addresses the low correction accuracy caused by weak relevance between search terms or by scarce historical data in the search corpus, and improves correction accuracy.
For example, if the segmentation mode treats all search terms as one search term segment, i.e., the search word sequence contains a single search term segment and is effectively not split, then the candidate correction probability of each candidate accurate word sequence under this mode takes into account the context of every candidate accurate word. If instead each search term forms its own search term segment, i.e., the number of search term segments is at its maximum, then the candidate correction probabilities under this mode do not take the context of any candidate accurate word into account.
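How the segmentation mode controls context can be made concrete by listing which adjacent candidate-word pairs stay inside one segment. In this Python sketch (the helper `context_pairs` is hypothetical, with modes written as tuples of segment lengths), the single-segment mode keeps every adjacent pair, the one-term-per-segment mode keeps none, and an intermediate mode keeps only the pairs that do not straddle a segment boundary.

```python
def context_pairs(candidate_seq, mode):
    """List the adjacent candidate-word pairs whose context a segmentation mode keeps.

    mode is a tuple of segment lengths. Only neighbours inside the same
    segment are linked; a segment boundary drops the context across it.
    """
    pairs, start = [], 0
    for length in mode:
        segment = candidate_seq[start:start + length]
        pairs.extend(zip(segment, segment[1:]))
        start += length
    return pairs


seq = ["c1", "c2", "c3"]
print(context_pairs(seq, (3,)))       # [('c1', 'c2'), ('c2', 'c3')]
print(context_pairs(seq, (2, 1)))     # [('c1', 'c2')]
print(context_pairs(seq, (1, 1, 1)))  # []
```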
S150: Determine the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, and determine the target accurate text corresponding to the target search text according to the target accurate word sequence.
The target accurate word sequence can be the best candidate accurate word sequence under the best segmentation mode. For each segmentation mode, a candidate correction probability is obtained for every candidate accurate word sequence; equivalently, for each candidate accurate word sequence, a candidate correction probability is obtained under every segmentation mode. In this embodiment, the target accurate word sequence corresponding to the target search text can be determined in the following two ways, among others.
Optionally, determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode comprises: determining, from those candidate correction probabilities, a provisional accurate word sequence for each segmentation mode together with its provisional correction probability; and taking the provisional accurate word sequence with the largest provisional correction probability as the target accurate word sequence corresponding to the target search text. Specifically, for a given segmentation mode, the candidate correction probabilities of all candidate accurate word sequences under that mode are compared; the candidate accurate word sequence with the largest candidate correction probability becomes the provisional accurate word sequence of that mode, and that largest probability becomes its provisional correction probability. Once every segmentation mode has its provisional accurate word sequence, their provisional correction probabilities are compared, and the provisional accurate word sequence with the largest provisional correction probability is taken as the target accurate word sequence corresponding to the target search text, so that a more accurate word sequence can be determined.
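The first option amounts to a maximum within each segmentation mode followed by a maximum across modes. The Python sketch below assumes the candidate correction probabilities have already been computed and stored as a nested dict keyed first by segmentation mode (a tuple of segment lengths) and then by candidate accurate word sequence (a tuple of words). The function name and the probability values are made up for illustration.

```python
def best_by_mode_first(probs):
    """Option one: best candidate within each segmentation mode, then the best mode.

    probs maps a segmentation mode to {candidate accurate word sequence:
    candidate correction probability}. Returns the target accurate word
    sequence, its probability, and the winning segmentation mode.
    """
    per_mode = {}
    for mode, table in probs.items():
        # Provisional accurate word sequence of this mode and its provisional probability.
        seq = max(table, key=table.get)
        per_mode[mode] = (seq, table[seq])
    mode = max(per_mode, key=lambda m: per_mode[m][1])
    seq, p = per_mode[mode]
    return seq, p, mode


# Made-up candidate correction probabilities for the two modes of a two-term sequence.
probs = {
    (2,): {("step by step", "soul-stirring"): 0.30, ("not", "soul-stirring"): 0.05},
    (1, 1): {("step by step", "soul-stirring"): 0.12, ("step by step", "meticulous"): 0.20},
}
print(best_by_mode_first(probs))
# (('step by step', 'soul-stirring'), 0.3, (2,))
```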
Optionally, determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode comprises: determining, from those candidate correction probabilities, the target correction probability of each candidate accurate word sequence; and taking the candidate accurate word sequence with the largest target correction probability as the target accurate word sequence corresponding to the target search text. Specifically, for a given candidate accurate word sequence, the candidate correction probabilities it obtains under the different segmentation modes are compared, and the largest one becomes the target correction probability of that candidate accurate word sequence. Once every candidate accurate word sequence has a target correction probability, these are compared, and the candidate accurate word sequence with the largest target correction probability is taken as the target accurate word sequence corresponding to the target search text, so that a more accurate word sequence can be determined.
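The second option swaps the order of the two maximisations. This Python sketch uses the same hypothetical nested-dict layout (segmentation mode, then candidate accurate word sequence) and made-up probabilities; the function name is illustrative only.

```python
def best_by_candidate_first(probs):
    """Option two: each candidate's best probability over all modes, then the best candidate.

    probs maps a segmentation mode to {candidate accurate word sequence:
    candidate correction probability}. Returns the target accurate word
    sequence and its target correction probability.
    """
    target = {}
    for table in probs.values():
        for seq, p in table.items():
            # Target correction probability = maximum over segmentation modes.
            if seq not in target or p > target[seq]:
                target[seq] = p
    best = max(target, key=target.get)
    return best, target[best]


# Made-up candidate correction probabilities for the two modes of a two-term sequence.
probs = {
    (2,): {("step by step", "soul-stirring"): 0.30, ("not", "soul-stirring"): 0.05},
    (1, 1): {("step by step", "soul-stirring"): 0.12, ("step by step", "meticulous"): 0.20},
}
print(best_by_candidate_first(probs))
# (('step by step', 'soul-stirring'), 0.3)
```

In this sketch both options end up picking the largest entry of the whole probability table, so they can return different sequences only when several entries tie for the maximum.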
After the target accurate word sequence is determined, this embodiment can splice its accurate words together directly in the order of the target accurate word sequence, so that the order of the accurate words in the target accurate text matches their order in the target accurate word sequence. This yields the target accurate text corresponding to the target search text, i.e., the best accurate text. This embodiment can thus correct the target search text to the target accurate text and automatically search with the target accurate text to obtain the accurate search results the user needs. This improves search accuracy without requiring the user to manually re-enter the correct search text, which improves the user's search experience.
In this embodiment, the search word sequence is segmented to obtain the search term segments of each segmentation mode and their corresponding candidate accurate word segments; the candidate correction probability of each candidate accurate word sequence under each segmentation mode is determined from these segments, the search corpus and the total number of search terms; and the target accurate word sequence, and from it the target accurate text corresponding to the target search text, is determined from these candidate correction probabilities. Correcting the search word sequence segment by segment takes into account the relevance between search terms under every segmentation mode, so the best candidate accurate word sequence under the best segmentation mode can be found. This avoids an inaccurate target accurate word sequence caused by weak relevance between search terms, improves correction accuracy, and in turn improves the user's search experience.
Embodiment two
Fig. 2 is a flowchart of a method for correcting search text according to Embodiment two of the present invention. Building on Embodiment one, this embodiment further refines the step of "determining the candidate correction probability of each candidate accurate word sequence under each segmentation mode according to the search term segments of each segmentation mode, the candidate accurate word segments corresponding to those search term segments, the search corpus, the Wilson confidence interval formula and the total number of search terms". Explanations of terms that are the same as or correspond to those in Embodiment one are not repeated here.
Referring to Fig. 2, the method for correcting search text provided in this embodiment specifically includes the following steps:
S210: Obtain the target search text, and perform word segmentation on it to determine the search word sequence corresponding to the target search text.
S220: Determine, according to the search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search terms in the search word sequence correspond one-to-one with the candidate accurate words in the candidate accurate word sequence.
S230: Segment the search word sequence, and determine the search term segments of each segmentation mode and the candidate accurate word segments corresponding to each search term segment.
S240: Take each segmentation mode in turn as the first segmentation mode, and take each candidate accurate word sequence under the first segmentation mode in turn as the first accurate word sequence.
Specifically, this embodiment takes each segmentation mode in turn as the first segmentation mode, and each candidate accurate word sequence under the first segmentation mode in turn as the first accurate word sequence. Each candidate accurate word segment of that candidate accurate word sequence then becomes a first accurate word segment of the first accurate word sequence, and each candidate accurate word in a candidate accurate word segment becomes a first accurate word of the corresponding first accurate word segment. In this way, the candidate correction probability of every accurate word sequence under every segmentation mode is determined one by one using the same procedure.
S250: Determine, according to the search corpus, the segment correction probability of each segment under the first segmentation mode.
The first segmentation mode corresponds to at least one search term segment. The segment correction probability is the probability that a search term segment is corrected to its corresponding first accurate word segment. This embodiment can determine, from the search corpus and the number of search terms in each search term segment, the probability that each search term segment is corrected to its corresponding first accurate word segment, i.e., the segment correction probability of each search term segment.
Optionally, S250 comprises: taking each search term segment of the first segmentation mode in turn as the target search term segment; if the target search term segment contains only one search term, determining its first correction probability according to the search corpus and using that first correction probability as the segment correction probability of the target search term segment; and if the target search term segment contains at least two search terms, determining its second correction probabilities and third correction probabilities according to the search corpus, and determining the segment correction probability of the target search term segment from these second and third correction probabilities.
Here, a second correction probability is the probability that the current search term in the target search term segment is corrected to its corresponding current first accurate word. A third correction probability is the probability that the next first accurate word appears after the current first accurate word. When the target search term segment contains at least two search terms, each search term in it is taken in turn as the current search term. The current first accurate word is the first accurate word in the first accurate word segment that corresponds to the current search term, and the next first accurate word is the first accurate word that immediately follows the current first accurate word in the order of the first accurate word sequence.
Specifically, the search term segments of the first segmentation mode are taken one by one as the target search term segment, so that the segment correction probability of every search term segment can be determined by the same procedure. If the target search term segment contains only one search term, its first accurate word segment also contains only one first accurate word, and the first correction probability is used directly as the segment correction probability of the target search term segment. If the target search term segment contains at least two search terms, its first accurate word segment also contains at least two first accurate words, and the segment correction probability of the target search term segment is determined from the second and third correction probabilities. Note that when the target search term segment contains only one search term, the context of its first accurate word need not be considered, whereas when it contains at least two search terms, the context of each first accurate word in the first accurate word segment, i.e., the third correction probability, must be considered.
For example, if the target search term segment contains only the search term $q_1$, its first accurate word segment contains only the first accurate word $c_1$. The first correction probability that $q_1$ is corrected to $c_1$, determined from the search corpus, is $p(c_1 \mid q_1)$, and $p(c_1 \mid q_1)$ can be used directly as the segment correction probability of the target search term segment. If the target search term segment contains two search terms $q_1$ and $q_2$, its first accurate word segment also contains two first accurate words $c_1$ and $c_2$. The probability that each search term is corrected to its first accurate word is determined from the search corpus, giving two second correction probabilities $p(c_1 \mid q_1)$ and $p(c_2 \mid q_2)$, together with the probability that the next first accurate word appears after the current one, giving one third correction probability $p(c_2 \mid c_1)$. The segment correction probability of the target search term segment is then determined from $p(c_1 \mid q_1)$, $p(c_2 \mid q_2)$ and $p(c_2 \mid c_1)$.
Optionally, determining the second correction probabilities and the third correction probabilities of the target search term segment from the search corpus includes: determining from the search corpus the historical search count of the current search term in the target search term segment, the historical correction count with which the current search term was corrected to its first accurate word, the first occurrence count of the current first accurate word corresponding to the current search term, and the second occurrence count of the next first accurate word after the current first accurate word; determining each second correction probability from the historical search count and the historical correction count; and determining each third correction probability from the first occurrence count and the second occurrence count.
Here, the historical search count of the current search term is the number of times the current search term appears in the historical search data of the search corpus. The historical correction count is the number of times, in the historical search data of the search corpus, that the current search term was inaccurate and was corrected to the first accurate word. The first occurrence count is the number of times the current first accurate word corresponding to the current search term appears in the search corpus. The second occurrence count is the number of times the next first accurate word after the current first accurate word appears in the search corpus.
Specifically, when the target search term segment contains at least two search terms, this embodiment may take the ratio of the historical correction count to the historical search count as the second correction probability of the current search term, and the ratio of the second occurrence count to the first occurrence count as the third correction probability.
When the target search term segment contains only one search term, this embodiment may likewise divide the historical correction count of that search term in the search corpus by its historical search count and take the result as the first correction probability, i.e. the probability that the search term is corrected to its first accurate word.
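The two ratios above can be sketched in Python as follows, assuming the counts have already been extracted from the search corpus. The function and parameter names are hypothetical, and returning 0.0 when a denominator is zero is an assumed policy that the text does not specify.

```python
def ratio_or_zero(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 for a zero denominator (assumed policy)."""
    return numerator / denominator if denominator > 0 else 0.0


def second_correction_probability(historical_corrections: int, historical_searches: int) -> float:
    """Historical correction count divided by historical search count.

    The same ratio serves as the first correction probability when the
    target search term segment holds a single search term.
    """
    return ratio_or_zero(historical_corrections, historical_searches)


def third_correction_probability(second_occurrences: int, first_occurrences: int) -> float:
    """Second occurrence count (next first accurate word) divided by the
    first occurrence count (current first accurate word)."""
    return ratio_or_zero(second_occurrences, first_occurrences)


# Hypothetical counts: a term searched 40 times and corrected 30 times.
print(second_correction_probability(30, 40))  # 0.75
print(third_correction_probability(25, 100))  # 0.25
```

Keeping the counting separate from the ratio makes it easy to swap in smoothing later without touching the probability model.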
Optionally, this embodiment may assume that the segment correction probability with which a search term segment is corrected to its first accurate word segment follows a hidden Markov model, which makes the segment correction probability easier to determine. For example, the segment correction probability simplifies to:

$$p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$$

where $q_{n_i+1},\dots,q_{n_{i+1}}$ is the target search term segment; $c_{n_i+1},\dots,c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search term segment; $n_i+1$ is the subscript of the first search term in the target search term segment (equivalently, of the first word in the first accurate word segment); $n_{i+1}$ is the subscript of the last search term in the target search term segment (equivalently, of the last word in the first accurate word segment); $p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}})$ is the segment correction probability of the target search term segment; $p(c_j \mid q_j)$ is the j-th second correction probability, i.e. the probability that the j-th search term in the target search term segment is corrected to its first accurate word; and $p(c_{j+1} \mid c_j)$ is the (j+1)-th third correction probability, i.e. the probability that the (j+1)-th first accurate word occurs after the j-th first accurate word in the first accurate word segment.
Specifically, if the target search term segment contains at least two search terms, all of its second correction probabilities and third correction probabilities are multiplied together, and the product is taken as the segment correction probability of the target search term segment.
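The hidden-Markov-style product above can be sketched in Python as follows. The lookup tables `emit` (holding $p(c_j \mid q_j)$) and `transition` (holding $p(c_{j+1} \mid c_j)$) and their sample values are hypothetical stand-ins for statistics derived from the search corpus; scoring an unseen pair as 0.0 is an assumption.

```python
from typing import Dict, Sequence, Tuple


def segment_correction_probability(
    search_segment: Sequence[str],
    accurate_segment: Sequence[str],
    emit: Dict[Tuple[str, str], float],        # (c_j, q_j) -> p(c_j | q_j)
    transition: Dict[Tuple[str, str], float],  # (c_j, c_{j+1}) -> p(c_{j+1} | c_j)
) -> float:
    """Score one search term segment against its first accurate word segment."""
    if not search_segment or len(search_segment) != len(accurate_segment):
        raise ValueError("segments must be non-empty and aligned one-to-one")
    prob = 1.0
    # Second correction probabilities (the first correction probability
    # when the segment holds a single search term).
    for q, c in zip(search_segment, accurate_segment):
        prob *= emit.get((c, q), 0.0)
    # Third correction probabilities link consecutive accurate words inside
    # this segment only; a one-term segment contributes no such factor.
    for c, c_next in zip(accurate_segment, accurate_segment[1:]):
        prob *= transition.get((c, c_next), 0.0)
    return prob


emit = {("c1", "q1"): 0.5, ("c2", "q2"): 0.4}
transition = {("c1", "c2"): 0.5}
print(segment_correction_probability(["q1", "q2"], ["c1", "c2"], emit, transition))  # 0.1
print(segment_correction_probability(["q1"], ["c1"], emit, transition))  # 0.5
```

Because a single-term segment skips the transition loop, the same function covers both the one-term and the multi-term case described in the text.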
S260: determine the candidate correction probability of the first accurate word sequence under the first segmented mode from the segment correction probabilities, the number of search term segments, and the total number of search terms.
Optionally, the candidate correction probability of the first accurate word sequence is determined by:

$$p(c_1,c_2,\dots,c_N \mid q_1,q_2,\dots,q_N) = \prod_{i=1}^{k} p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}}), \quad \text{s.t. } 0 = n_1 < n_2 < \dots < n_{k+1} = N$$

where $p(c_1,c_2,\dots,c_N \mid q_1,q_2,\dots,q_N)$ is the candidate correction probability of the first accurate word sequence; $c_1,c_2,\dots,c_N$ is the first accurate word sequence; $q_1,q_2,\dots,q_N$ is the search word sequence of the target search text; $N$ is the total number of search terms; $k$ is the number of search term segments under the first segmented mode; $n_1 = 0$ and, for $i \ge 2$, $n_i$ is the subscript of the last search term in the $(i-1)$-th search term segment; $q_{n_i+1},\dots,q_{n_{i+1}}$ is the i-th search term segment; $c_{n_i+1},\dots,c_{n_{i+1}}$ is the first accurate word segment corresponding to the i-th search term segment; and $p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}})$ is the segment correction probability of the i-th search term segment.
For example, suppose the total number of search terms is 3 (N = 3), the first accurate word sequence is $c_1,c_2,c_3$, the search word sequence of the target search text is $q_1,q_2,q_3$, and the first segmented mode divides the search word sequence into two search term segments, the first containing two search terms and the second containing one. The two search term segments of the first segmented mode are then $q_1,q_2$ and $q_3$, with first accurate word segments $c_1,c_2$ and $c_3$; the number of search term segments is k = 2; the subscript of the last search term in the first search term segment is $n_2 = 2$; and the subscript of the last search term in the second search term segment is $n_3 = 3$. The candidate correction probability of the first accurate word sequence $c_1,c_2,c_3$ is then:

$$p(c_1,c_2,c_3 \mid q_1,q_2,q_3) = p(c_1,c_2 \mid q_1,q_2)\, p(c_3 \mid q_3) = p(c_1 \mid q_1)\, p(c_2 \mid q_2)\, p(c_2 \mid c_1)\, p(c_3 \mid q_3)$$
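The formula and the N = 3 example above can be sketched in Python as a product over the segments of one segmented mode. In this hypothetical sketch a mode is given by its boundary list $[n_1, \dots, n_{k+1}]$, and `segment_prob` is any function that scores one aligned pair of segments (for instance, the hidden-Markov-style segment product); the toy scorer and its values are illustrative only.

```python
from typing import Callable, Sequence

SegmentScorer = Callable[[Sequence[str], Sequence[str]], float]


def candidate_correction_probability(
    search_words: Sequence[str],
    accurate_words: Sequence[str],
    boundaries: Sequence[int],
    segment_prob: SegmentScorer,
) -> float:
    """Multiply the segment correction probabilities of one segmented mode.

    boundaries = [n_1, ..., n_{k+1}] with 0 = n_1 < ... < n_{k+1} = N.
    Segment i holds terms n_i+1 .. n_{i+1} (1-based), which is the
    0-based slice [n_i, n_{i+1}).
    """
    n = len(search_words)
    if len(accurate_words) != n:
        raise ValueError("sequences must be aligned one-to-one")
    if (
        len(boundaries) < 2
        or boundaries[0] != 0
        or boundaries[-1] != n
        or any(a >= b for a, b in zip(boundaries, boundaries[1:]))
    ):
        raise ValueError("boundaries must satisfy 0 = n_1 < ... < n_{k+1} = N")
    prob = 1.0
    for start, end in zip(boundaries, boundaries[1:]):
        prob *= segment_prob(search_words[start:end], accurate_words[start:end])
    return prob


def toy_scorer(qs: Sequence[str], cs: Sequence[str]) -> float:
    """Hypothetical scores: 0.25 for a two-term segment, 0.5 otherwise."""
    return 0.25 if len(qs) == 2 else 0.5


# N = 3 example: segments (q1, q2) and (q3), so k = 2 and boundaries [0, 2, 3].
print(candidate_correction_probability(
    ["q1", "q2", "q3"], ["c1", "c2", "c3"], [0, 2, 3], toy_scorer))  # 0.125
```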
In this embodiment, repeating steps S240 to S260 yields the candidate correction probability of every candidate accurate word sequence under every segmented mode.
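The text repeats steps S240 to S260 for every segmented mode but does not say how the modes are enumerated. One plausible reading, assumed here, is that every way of cutting the N-term search word sequence into contiguous, non-empty segments is a mode, which gives $2^{N-1}$ modes. `segmented_modes` is a hypothetical helper that produces their boundary lists.

```python
from itertools import combinations
from typing import Iterator, List


def segmented_modes(n: int) -> Iterator[List[int]]:
    """Yield boundary lists [n_1, ..., n_{k+1}] (n_1 = 0, n_{k+1} = n)
    for every split of n search terms into contiguous segments.

    Each mode is fixed by choosing which of the n - 1 gaps between adjacent
    search terms to cut; c cuts produce c + 1 search term segments.
    """
    if n < 1:
        raise ValueError("need at least one search term")
    for cut_count in range(n):
        for cuts in combinations(range(1, n), cut_count):
            yield [0, *cuts, n]


for mode in segmented_modes(3):
    print(mode)
# [0, 3]
# [0, 1, 3]
# [0, 2, 3]
# [0, 1, 2, 3]
```

Since the number of modes doubles with each extra search term, a practical system might cap the segment length; that cap is a design choice, not something the text specifies.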
S270: determine the target accurate word sequence of the target search text from the candidate correction probabilities of the candidate accurate word sequences under each segmented mode, and determine the target accurate text of the target search text from the target accurate word sequence.
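Step S270 can be sketched in Python as a two-stage maximum, following the description of the target accurate text determining module 350 in the device embodiment: shortlist one candidate accurate word sequence per segmented mode (assumed here to be the highest-scoring one), then keep the shortlisted sequence with the largest probability. The score-table layout is hypothetical, and joining words with a separator to form the target accurate text is an assumption; for Chinese text an empty separator would typically be used.

```python
from typing import Dict, Tuple

Mode = Tuple[int, ...]     # boundary list of a segmented mode
WordSeq = Tuple[str, ...]  # a candidate accurate word sequence


def select_target_sequence(
    scores: Dict[Mode, Dict[WordSeq, float]],
    separator: str = " ",
) -> Tuple[WordSeq, float, str]:
    """scores[mode][candidate] is a candidate correction probability.

    Returns the target accurate word sequence, its probability, and the
    target accurate text built by joining the words.
    """
    if not scores:
        raise ValueError("no segmented modes to choose from")
    shortlist = []
    for by_candidate in scores.values():
        # Stage 1: the best candidate under this segmented mode.
        shortlist.append(max(by_candidate.items(), key=lambda item: item[1]))
    # Stage 2: the shortlisted sequence with the largest probability wins.
    best_words, best_prob = max(shortlist, key=lambda item: item[1])
    return best_words, best_prob, separator.join(best_words)


scores = {
    (0, 2, 3): {("a", "b", "c"): 0.1, ("a", "x", "c"): 0.3},
    (0, 3): {("a", "b", "c"): 0.2},
}
print(select_target_sequence(scores))  # (('a', 'x', 'c'), 0.3, 'a x c')
```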
In the technical solution of this embodiment, the candidate correction probability of each candidate accurate word sequence under each segmented mode is determined precisely from the segment correction probabilities of the search term segments under that mode, so the corrected accurate word sequence can be determined more precisely and correction accuracy is improved.
The following is an embodiment of the correction device for search text provided by the embodiments of the present invention. The device belongs to the same inventive concept as the correction method for search text of the above embodiments; for details not described in the device embodiment, refer to the method embodiments above.
Embodiment three
Fig. 3 is a schematic structural diagram of a correction device for search text provided by Embodiment three of the present invention. This embodiment is applicable to spelling error correction of search text. The device includes: a search word sequence determining module 310, a candidate accurate word sequence determining module 320, a search word sequence segmenting module 330, a candidate correction probability determining module 340, and a target accurate text determining module 350.
The search word sequence determining module 310 is configured to obtain the target search text and perform word segmentation on it to determine the search word sequence of the target search text. The candidate accurate word sequence determining module 320 is configured to determine, from the search corpus, the candidate accurate word sequences corresponding to the search word sequence, where the search terms in the search word sequence correspond one-to-one to the candidate accurate words in each candidate accurate word sequence. The search word sequence segmenting module 330 is configured to segment the search word sequence and determine the search term segments of each segmented mode and the candidate accurate word segment corresponding to each search term segment, where a segmented mode specifies the number of search term segments and the number of search terms in each search term segment. The candidate correction probability determining module 340 is configured to determine the candidate correction probability of each candidate accurate word sequence under each segmented mode from the search term segments of each segmented mode, the candidate accurate word segments corresponding to the search term segments, the search corpus, and the total number of search terms. The target accurate text determining module 350 is configured to determine the target accurate word sequence of the target search text from the candidate correction probabilities of the candidate accurate word sequences under each segmented mode, and to determine the target accurate text of the target search text from the target accurate word sequence.
By segmenting the search word sequence for error correction and accounting for the association between search terms under each segmented mode, the embodiment of the present invention can determine the optimal candidate accurate word sequence under the optimal segmented mode. This avoids an inaccurate target accurate word sequence caused by weak association between search terms, improves correction accuracy, and thereby improves the user's search experience.
Optionally, the candidate correction probability determining module 340 includes:
a first accurate word sequence determining unit, configured to take each segmented mode in turn as the first segmented mode, and to take each candidate accurate word sequence under the first segmented mode in turn as the first accurate word sequence;
a segment correction probability determining unit, configured to determine, from the search corpus, the segment correction probability of each search term segment of the first segmented mode, where a segment correction probability is the probability that a search term segment is corrected to its first accurate word segment; and
a candidate correction probability determining unit, configured to determine the candidate correction probability of the first accurate word sequence under the first segmented mode from the segment correction probabilities, the number of search term segments, and the total number of search terms.
Optionally, the segment correction probability determining unit includes:
a target search term segment determining subunit, configured to take each search term segment of the first segmented mode in turn as the target search term segment;
a first segment correction probability determining subunit, configured to, if the target search term segment contains only one search term, determine from the search corpus the first correction probability of that search term and take it as the segment correction probability of the target search term segment, where the first correction probability is the probability that the search term is corrected to its first accurate word; and
a second segment correction probability determining subunit, configured to, if the target search term segment contains at least two search terms, determine from the search corpus the second correction probabilities and the third correction probabilities of the target search term segment, and determine the segment correction probability of the target search term segment from them, where a second correction probability is the probability that the current search term in the target search term segment is corrected to its current first accurate word, and a third correction probability is the probability that the next first accurate word occurs after the current first accurate word.
Optionally, the second segment correction probability determining subunit is further configured to: determine from the search corpus the historical search count of the current search term in the target search term segment, the historical correction count with which the current search term was corrected to its first accurate word, the first occurrence count of the current first accurate word corresponding to the current search term, and the second occurrence count of the next first accurate word after the current first accurate word; determine each second correction probability from the historical search count and the historical correction count; and determine each third correction probability from the first occurrence count and the second occurrence count.
Optionally, the segment correction probability of the target search term segment is determined by:

$$p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$$

where $q_{n_i+1},\dots,q_{n_{i+1}}$ is the target search term segment; $c_{n_i+1},\dots,c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search term segment; $n_i+1$ is the subscript of the first search term in the target search term segment (equivalently, of the first word in the first accurate word segment); $n_{i+1}$ is the subscript of the last search term in the target search term segment (equivalently, of the last word in the first accurate word segment); $p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}})$ is the segment correction probability of the target search term segment; $p(c_j \mid q_j)$ is the j-th second correction probability, i.e. the probability that the j-th search term in the target search term segment is corrected to its first accurate word; and $p(c_{j+1} \mid c_j)$ is the (j+1)-th third correction probability, i.e. the probability that the (j+1)-th first accurate word occurs after the j-th first accurate word in the first accurate word segment.
Optionally, the candidate correction probability of the first accurate word sequence is determined by:

$$p(c_1,c_2,\dots,c_N \mid q_1,q_2,\dots,q_N) = \prod_{i=1}^{k} p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}}), \quad \text{s.t. } 0 = n_1 < n_2 < \dots < n_{k+1} = N$$

where $p(c_1,c_2,\dots,c_N \mid q_1,q_2,\dots,q_N)$ is the candidate correction probability of the first accurate word sequence; $c_1,c_2,\dots,c_N$ is the first accurate word sequence; $q_1,q_2,\dots,q_N$ is the search word sequence of the target search text; $N$ is the total number of search terms; $k$ is the number of search term segments under the first segmented mode; $n_1 = 0$ and, for $i \ge 2$, $n_i$ is the subscript of the last search term in the $(i-1)$-th search term segment; $q_{n_i+1},\dots,q_{n_{i+1}}$ is the i-th search term segment; $c_{n_i+1},\dots,c_{n_{i+1}}$ is the first accurate word segment corresponding to the i-th search term segment; and $p(c_{n_i+1},\dots,c_{n_{i+1}} \mid q_{n_i+1},\dots,q_{n_{i+1}})$ is the segment correction probability of the i-th search term segment.
Optionally, the target accurate text determining module 350 is further configured to: determine, from the candidate correction probabilities of the candidate accurate word sequences under each segmented mode, a shortlisted accurate word sequence for each segmented mode and the shortlisted correction probability of each shortlisted accurate word sequence; and take the shortlisted accurate word sequence with the largest shortlisted correction probability as the target accurate word sequence of the target search text.
The correction device for search text provided by the embodiments of the present invention can execute the correction method for search text provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to that method.
It is worth noting that, in the above embodiment of the correction device for search text, the units and modules are divided only according to functional logic; the division is not limited to the above, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Embodiment four
Fig. 4 is a schematic structural diagram of a terminal provided by Embodiment four of the present invention. Referring to Fig. 4, the terminal includes:
one or more processors 410; and
a memory 420, configured to store one or more programs;
when the one or more programs are executed by the one or more processors 410, the one or more processors 410 implement the correction method for search text proposed in any of the above embodiments, the method including:
obtaining a target search text, and performing word segmentation on the target search text to determine the search word sequence of the target search text;
determining, from a search corpus, the candidate accurate word sequences corresponding to the search word sequence, where the search terms in the search word sequence correspond one-to-one to the candidate accurate words in each candidate accurate word sequence;
segmenting the search word sequence to determine the search term segments of each segmented mode and the candidate accurate word segment corresponding to each search term segment, where a segmented mode specifies the number of search term segments and the number of search terms in each search term segment;
determining the candidate correction probability of each candidate accurate word sequence under each segmented mode from the search term segments of each segmented mode, the candidate accurate word segments corresponding to the search term segments, the search corpus, and the total number of search terms; and
determining the target accurate word sequence of the target search text from the candidate correction probabilities of the candidate accurate word sequences under each segmented mode, and determining the target accurate text of the target search text from the target accurate word sequence.
Fig. 4 takes one processor 410 as an example. The processor 410 and the memory 420 in the terminal may be connected by a bus or in other ways; Fig. 4 takes a bus connection as an example.
As a computer-readable storage medium, the memory 420 can store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the correction method for search text in the embodiments of the present invention (for example, the search word sequence determining module 310, the candidate accurate word sequence determining module 320, the search word sequence segmenting module 330, the candidate correction probability determining module 340 and the target accurate text determining module 350 in the correction device for search text). By running the software programs, instructions and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the terminal, thereby implementing the above correction method for search text.
The memory 420 mainly includes a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the terminal, and so on. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410; such remote memory can be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The terminal proposed in this embodiment belongs to the same inventive concept as the correction method for search text proposed in the above embodiments. For technical details not described in this embodiment, refer to the above embodiments; this embodiment has the same beneficial effects as executing the correction method for search text.
Embodiment five
Embodiment five provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the correction method for search text described in any embodiment of the present invention, the method including:
obtaining a target search text, and performing word segmentation on the target search text to determine the search word sequence of the target search text;
determining, from a search corpus, the candidate accurate word sequences corresponding to the search word sequence, where the search terms in the search word sequence correspond one-to-one to the candidate accurate words in each candidate accurate word sequence;
segmenting the search word sequence to determine the search term segments of each segmented mode and the candidate accurate word segment corresponding to each search term segment, where a segmented mode specifies the number of search term segments and the number of search terms in each search term segment;
determining the candidate correction probability of each candidate accurate word sequence under each segmented mode from the search term segments of each segmented mode, the candidate accurate word segments corresponding to the search term segments, the search corpus, and the total number of search terms; and
determining the target accurate word sequence of the target search text from the candidate correction probabilities of the candidate accurate word sequences under each segmented mode, and determining the target accurate text of the target search text from the target accurate word sequence.
The computer storage medium of the embodiments of the present invention may be any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Those skilled in the art should understand that the modules or steps of the present invention described above may be implemented by general-purpose computing devices. They may be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; alternatively, they may each be made into an individual integrated circuit module, or several of the modules or steps may be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.