CN110019684A - Method, device, terminal and storage medium for correcting search text - Google Patents

Publication number
CN110019684A
Authority
CN
China
Prior art keywords
search
word
segment
accurate
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810941106.XA
Other languages
Chinese (zh)
Other versions
CN110019684B (en)
Inventor
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Working at Bide Digital Technology (Guangzhou) Co.,Ltd.
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201810941106.XA
Publication of CN110019684A
Application granted
Publication of CN110019684B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention discloses a method, device, terminal and storage medium for correcting search text. The method comprises: obtaining a target search text and performing word segmentation on it to determine the corresponding search word sequence; determining, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence; segmenting the search word sequence to determine the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment; determining, according to the search corpus and the total number of search words, the candidate correction probability of each candidate accurate word sequence under each segmentation mode; and determining, according to those candidate correction probabilities, the target accurate text corresponding to the target search text. By correcting the text segment by segment, the technical solution of the embodiment addresses the prior-art problem of low correction accuracy caused by poor relevance among search words.

Description

Method, device, terminal and storage medium for correcting search text
Technical field
Embodiments of the present invention relate to information processing technology, and in particular to a method, device, terminal and storage medium for correcting search text.
Background
With the rapid development of information technology, users can query information by searching in order to obtain what they need. Typically, a user enters a search text according to their needs, and a search tool finds the search results corresponding to that text in an information collection. For example, on a live-streaming platform a user can enter a streamer's name in the search box and quickly find the live video they want to watch.
Search texts entered by users often contain errors, such as misspellings or transposed words, so that the user cannot find the desired search results. The search text entered by the user therefore needs to be corrected.
The existing correction process is as follows: after word segmentation is performed on the search text, each search word is corrected directly, and the context of each corrected word must be taken into account. However, word segmentation of search text in the prior art is often inaccurate, and when the search words are only weakly related, still considering the context of the corrected words may introduce ambiguity, so the truly accurate text often cannot be determined. This reduces correction accuracy and degrades the user's search experience.
Summary of the invention
Embodiments of the present invention provide a method, device, terminal and storage medium for correcting search text, to solve the prior-art problem of low correction accuracy caused by poor relevance among search words, thereby improving correction accuracy and, in turn, the user's search experience.
In a first aspect, an embodiment of the present invention provides a method for correcting search text, comprising:
obtaining a target search text, and performing word segmentation on the target search text to determine the search word sequence corresponding to the target search text;
determining, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search words in the search word sequence correspond one-to-one to the candidate accurate words in the candidate accurate word sequence;
segmenting the search word sequence, and determining the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment, wherein a segmentation mode comprises the number of search word segments and the number of search words in each search word segment;
determining, according to the search word segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search word segments, the search corpus and the total number of search words, the candidate correction probability of each candidate accurate word sequence under each segmentation mode; and
determining, according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and determining the target accurate text corresponding to the target search text according to the target accurate word sequence.
In a second aspect, an embodiment of the present invention further provides a device for correcting search text, comprising:
a search word sequence determining module, configured to obtain a target search text and perform word segmentation on the target search text to determine the search word sequence corresponding to the target search text;
a candidate accurate word sequence determining module, configured to determine, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search words in the search word sequence correspond one-to-one to the candidate accurate words in the candidate accurate word sequence;
a search word sequence segmenting module, configured to segment the search word sequence and determine the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment, wherein a segmentation mode comprises the number of search word segments and the number of search words in each search word segment;
a candidate correction probability determining module, configured to determine, according to the search word segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search word segments, the search corpus and the total number of search words, the candidate correction probability of each candidate accurate word sequence under each segmentation mode; and
a target accurate text determining module, configured to determine, according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and to determine the target accurate text corresponding to the target search text according to the target accurate word sequence.
In a third aspect, an embodiment of the present invention further provides a terminal, comprising:
one or more processors; and
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for correcting search text according to any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for correcting search text according to any embodiment of the present invention.
By segmenting the search word sequence, embodiments of the present invention determine the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment; determine, according to the search word segments of each segmentation mode, the corresponding candidate accurate word segments, the search corpus and the total number of search words, the candidate correction probability of each candidate accurate word sequence under each segmentation mode; and determine, according to these candidate correction probabilities, the target accurate word sequence corresponding to the target search text and, from it, the target accurate text. Because the search word sequence is corrected segment by segment, the relevance among search words under each segmentation mode is taken into account, so the optimal candidate accurate word sequence under the optimal segmentation mode can be determined. This avoids an inaccurate target accurate word sequence caused by weakly related search words, improves correction accuracy and thereby improves the user's search experience.
Brief description of the drawings
Fig. 1 is a flowchart of a method for correcting search text provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a method for correcting search text provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of a device for correcting search text provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic structural diagram of a terminal provided by Embodiment 4 of the present invention.
Detailed description
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.
Embodiment 1
Fig. 1 is a flowchart of a method for correcting search text provided by Embodiment 1 of the present invention. This embodiment is applicable to correcting spelling errors in search text, and in particular to correcting search text on a live-streaming platform. The method may be performed by a device for correcting search text, which may be implemented in software and/or hardware and integrated in a terminal with a search function, such as a smartphone, tablet computer or desktop computer. The method specifically comprises the following steps:
S110: Obtain a target search text, and perform word segmentation on the target search text to determine the search word sequence corresponding to the target search text.
Here, the target search text is the search text currently entered by the user. For example, the search text in the current search box may be taken as the target search text. Word segmentation may mean dividing the target search text into multiple search words according to a segmentation dictionary or other segmentation rules. The search word sequence is the sequence of search words obtained by performing word segmentation on the target search text; the search words appear in it in the same order as in the target search text. For example, if the target search text is "not soul-stirring", matching words against the segmentation dictionary may yield the search word sequence "not, soul-stirring".
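The dictionary-based word segmentation in S110 can be illustrated with a minimal sketch. The text above does not say which segmentation algorithm is used, so forward maximum matching is assumed here purely for illustration; the function name `segment`, the `max_len` parameter and the toy dictionary are all hypothetical.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching (an assumed technique, not taken from the patent).

    At each position, take the longest substring found in `dictionary`;
    fall back to a single character when nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Letters stand in for the characters of a search text.
print(segment("abcde", {"ab", "cde"}))
```

Whatever algorithm is chosen, the search words come out in the same order as in the input text, which is the only property the later steps rely on.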
S120: Determine, according to the search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search words in the search word sequence correspond one-to-one to the candidate accurate words in the candidate accurate word sequence.
Here, the search corpus may be determined in advance from the search behavior logs of a large number of users. It may include a large number of historical search keywords together with the accurate keyword into which each historical search keyword was corrected, where the accurate keyword may be determined from the user's click operations. A candidate accurate word sequence is any possible correct sequence corresponding to the target search text; this embodiment determines the optimal accurate word sequence from among all candidate accurate word sequences. In this embodiment, each candidate accurate word sequence corresponds to one candidate accurate text, in which the candidate accurate words appear in the same order as in the candidate accurate word sequence. The search words in the search word sequence correspond one-to-one to the candidate accurate words in a candidate accurate word sequence, that is, each search word corresponds to exactly one candidate accurate word in that sequence. For example, if the search word sequence of the target search text is q1, q2, ..., qN and a candidate accurate word sequence is c1, c2, ..., cN, then search word qi corresponds to candidate accurate word ci.
Specifically, this embodiment may determine, according to the search corpus, at least one candidate accurate word for each search word in the search word sequence, and combine the candidate accurate words to obtain each candidate accurate word sequence corresponding to the search word sequence. For example, suppose the search word sequence is "not, soul-stirring", and the search corpus gives the candidate accurate words "not", "step by step" and "portion, portion" for the search word "not", and "soul-stirring" and "meticulous" for the search word "soul-stirring". The candidate accurate word sequences are then determined as: ["not", "meticulous"], ["step by step", "soul-stirring"], ["step by step", "meticulous"], ["portion, portion", "soul-stirring"] and ["portion, portion", "meticulous"]. Note that a candidate accurate word may be the search word itself, because a search word in the target search text may already be spelled correctly.
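Combining per-word candidates into candidate accurate word sequences amounts to a Cartesian product. The sketch below assumes the corpus lookup has already produced a mapping from each search word to its candidate accurate words; the function name `candidate_sequences` and the mapping `candidates_by_word` are hypothetical, and the English glosses are reused from the example above.

```python
from itertools import product


def candidate_sequences(search_words, candidates_by_word):
    """Build every candidate accurate word sequence for a search word sequence.

    A word with no entry in `candidates_by_word` keeps itself as its only candidate.
    """
    per_word = [candidates_by_word.get(word, [word]) for word in search_words]
    # One candidate per search word, so every sequence has the same length
    # as the search word sequence (one-to-one correspondence).
    return [list(combo) for combo in product(*per_word)]


candidates_by_word = {
    "not": ["not", "step by step", "portion, portion"],
    "soul-stirring": ["soul-stirring", "meticulous"],
}
for seq in candidate_sequences(["not", "soul-stirring"], candidates_by_word):
    print(seq)
```

With three candidates for the first word and two for the second, the full product contains six sequences, including the one that leaves every search word unchanged.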
S130: Segment the search word sequence, and determine the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment.
Here, a search word segment includes at least one search word. A search word segment may be part of the search word sequence or the entire sequence, depending on the segmentation mode. A segmentation mode may comprise the number of search word segments and the number of search words in each search word segment. The number of search word segments in a segmentation mode is at least 1 and at most the total number of search words in the search word sequence. In this embodiment, a segmentation mode characterizes how many search word segments the search word sequence is divided into and how many search words each segment contains. The number of segmentation modes can be determined from the total number of search words in the search word sequence. For each candidate accurate word sequence, its candidate accurate word segments correspond one-to-one to the search word segments: the number of candidate accurate word segments equals the number of search word segments, and each candidate accurate word segment contains as many candidate accurate words as the corresponding search word segment contains search words.
For example, suppose the search word sequence is q1, q2, q3 and a candidate accurate word sequence is c1, c2, c3. There are then four segmentation modes. In the first, the search word sequence forms a single search word segment, (q1, q2, q3), and the corresponding candidate accurate word segment is (c1, c2, c3). In the second, the sequence is divided into two search word segments, the first containing two search words and the second containing one, i.e. (q1, q2) and (q3), with corresponding candidate accurate word segments (c1, c2) and (c3). In the third, the sequence is divided into two search word segments, the first containing one search word and the second containing two, i.e. (q1) and (q2, q3), with corresponding candidate accurate word segments (c1) and (c2, c3). In the fourth, the sequence is divided into three search word segments, each search word forming its own segment, i.e. (q1), (q2) and (q3), with corresponding candidate accurate word segments (c1), (c2) and (c3).
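The enumeration of segmentation modes in the example above can be sketched as follows. Each mode is represented as a tuple of segment lengths, which captures both the number of segments and the number of search words per segment; the function names `segmentation_modes` and `split_by_mode` are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def segmentation_modes(n):
    """Yield every way to cut n words into contiguous segments, as segment lengths."""
    for k in range(n):  # k = number of cut points between adjacent words
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def split_by_mode(words, mode):
    """Split a word sequence into segments whose lengths are given by `mode`."""
    segments, start = [], 0
    for size in mode:
        segments.append(words[start:start + size])
        start += size
    return segments


for mode in segmentation_modes(3):
    print(mode, split_by_mode(["q1", "q2", "q3"], mode),
          split_by_mode(["c1", "c2", "c3"], mode))
```

For three search words this yields the four modes (3,), (1, 2), (2, 1) and (1, 1, 1). Since each of the N-1 gaps between adjacent words is either cut or not, a sequence of N search words has 2^(N-1) modes under this representation. Applying the same mode to the search word sequence and to a candidate accurate word sequence keeps their segments in one-to-one correspondence.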
Specifically, each segmentation mode corresponds to at least one search word segment, and each search word segment corresponds to one candidate accurate word segment in each candidate accurate word sequence.
S140: Determine, according to the search word segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search word segments, the search corpus and the total number of search words, the candidate correction probability of each candidate accurate word sequence under each segmentation mode.
Here, the total number of search words is the number of search words that make up the search word sequence. The candidate correction probability of a candidate accurate word sequence is the probability that the search word sequence is corrected to that candidate accurate word sequence.
Specifically, because the segmentation modes of the search word sequence differ, the candidate correction probability calculated for a candidate accurate word sequence also differs from one segmentation mode to another. For a given candidate accurate word sequence under a given segmentation mode, its candidate correction probability is determined according to the search corpus, the search word segments of that segmentation mode, the candidate accurate word segments in the candidate accurate word sequence that correspond to those search word segments, and the total number of search words. By calculating the candidate correction probability of each candidate accurate word sequence under each segmentation mode, this embodiment obtains the candidate correction probabilities for all cases, from those in which the context of at least one candidate accurate word is considered to the one in which no context is considered at all. This addresses the problem of low correction accuracy caused by weakly related search words or sparse historical data in the search corpus, and improves correction accuracy.
For example, if the segmentation mode treats all search words as a single search word segment, that is, the search word sequence contains only one segment and is effectively not segmented, then the candidate correction probability determined for each candidate accurate word sequence under this mode takes the context of every candidate accurate word into account. If the segmentation mode treats each search word as its own search word segment, that is, the number of search word segments is at its maximum, then the candidate correction probability determined for each candidate accurate word sequence under this mode does not take the context of the candidate accurate words into account.
S150: Determine, according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and determine the target accurate text corresponding to the target search text according to the target accurate word sequence.
Here, the target accurate word sequence may be the optimal candidate accurate word sequence under the optimal segmentation mode. For each segmentation mode, the candidate correction probability of each candidate accurate word sequence can be obtained. Equivalently, for each candidate accurate word sequence, its candidate correction probability under each segmentation mode can be obtained. In this embodiment, the ways of determining the target accurate word sequence corresponding to the target search text include, but are not limited to, the following two.
Optionally, determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode comprises: determining, according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, the to-be-selected accurate word sequence of each segmentation mode and its to-be-selected correction probability; and determining the to-be-selected accurate word sequence with the largest to-be-selected correction probability as the target accurate word sequence corresponding to the target search text. Specifically, for a given segmentation mode, the candidate correction probabilities of the candidate accurate word sequences under that mode are compared; the candidate accurate word sequence with the largest candidate correction probability is determined as the to-be-selected accurate word sequence of that mode, and that largest candidate correction probability as its to-be-selected correction probability. This yields one to-be-selected accurate word sequence per segmentation mode. The to-be-selected correction probabilities are then compared, and the to-be-selected accurate word sequence with the largest one is determined as the target accurate word sequence corresponding to the target search text, so that a more accurate word sequence can be determined.
Optionally, determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode comprises: determining, according to the candidate correction probability of each candidate accurate word sequence under each segmentation mode, the target correction probability of each candidate accurate word sequence; and determining the candidate accurate word sequence with the largest target correction probability as the target accurate word sequence corresponding to the target search text. Specifically, for a given candidate accurate word sequence, its candidate correction probabilities under the individual segmentation modes are compared, and the largest one is determined as the target correction probability of that candidate accurate word sequence. This yields one target correction probability per candidate accurate word sequence. The target correction probabilities are then compared, and the candidate accurate word sequence with the largest one is determined as the target accurate word sequence corresponding to the target search text, so that a more accurate word sequence can be determined.
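The two optional selection strategies described above can be sketched side by side. The sketch assumes the candidate correction probabilities have already been computed and stored in a nested mapping from segmentation mode to candidate accurate word sequence to probability; the function names and all probability values are hypothetical and used only for illustration.

```python
def select_by_mode_first(probs):
    """Strategy 1: pick the best candidate within each mode, then the best across modes.

    probs: {segmentation_mode: {candidate_sequence: candidate_correction_probability}}
    """
    best_per_mode = [max(cands.items(), key=lambda kv: kv[1]) for cands in probs.values()]
    return max(best_per_mode, key=lambda kv: kv[1])


def select_by_candidate_first(probs):
    """Strategy 2: take each candidate's best probability over all modes, then the best candidate."""
    target_prob = {}
    for cands in probs.values():
        for seq, p in cands.items():
            target_prob[seq] = max(p, target_prob.get(seq, 0.0))
    return max(target_prob.items(), key=lambda kv: kv[1])


# Hypothetical probabilities for two candidate sequences under two segmentation modes.
probs = {
    (2,): {("c1", "c2"): 0.30, ("d1", "c2"): 0.10},
    (1, 1): {("c1", "c2"): 0.20, ("d1", "c2"): 0.45},
}
print(select_by_mode_first(probs))
print(select_by_candidate_first(probs))
```

Both strategies take a maximum over the same table of probabilities, only in a different order, so in this sketch they select the same sequence whenever the largest probability is unique.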
After determining the target accurate word sequence, this embodiment may directly concatenate the accurate words in the target accurate word sequence in sequence order, so that the accurate words appear in the target accurate text in the same order as in the target accurate word sequence. This yields the target accurate text corresponding to the target search text, that is, the optimal accurate text. This embodiment can correct the target search text to the target accurate text and automatically search with the target accurate text to obtain the accurate search results the user needs. This improves search accuracy and spares the user from manually re-entering a correct search text, thereby improving the user's search experience.
By segmenting the search word sequence, embodiments of the present invention determine the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment; determine, according to the search word segments of each segmentation mode, the corresponding candidate accurate word segments, the search corpus and the total number of search words, the candidate correction probability of each candidate accurate word sequence under each segmentation mode; and determine, according to these candidate correction probabilities, the target accurate word sequence corresponding to the target search text and, from it, the target accurate text. Because the search word sequence is corrected segment by segment, the relevance among search words under each segmentation mode is taken into account, so the optimal candidate accurate word sequence under the optimal segmentation mode can be determined. This avoids an inaccurate target accurate word sequence caused by weakly related search words, improves correction accuracy and thereby improves the user's search experience.
Embodiment two
Fig. 2 is a kind of flow chart of correcting method for searching for text provided by Embodiment 2 of the present invention, and the present embodiment is upper On the basis of stating embodiment one, to " according to the corresponding each search term segment of each segmented mode, corresponding with search term segment each Candidate accurate word segment, search corpus, Wilson's confidence interval formula and search term sum, determine under each segmented mode The corresponding candidate corrections probability of each candidate accurate word sequence " is advanced optimized.It is wherein identical as above-described embodiment one or Details are not described herein for the explanation of corresponding term.
Referring to fig. 2, it is provided in this embodiment search text correcting method specifically includes the following steps:
S210, target search text is obtained, and word segmentation processing is carried out to target search text and determines target search text pair The search word sequence answered.
S220, the corresponding each accurate word sequence of candidate of search word sequence is determined according to search corpus, wherein search word order The candidate accurate word in search term and candidate accurate word sequence in column corresponds.
S230, search word sequence is segmented, determines the corresponding each search term segment of each segmented mode and respectively searches The corresponding each candidate accurate word segment of rope word segment.
S240, each segmented mode is determined as to the first segmented mode one by one, and will be each candidate quasi- under the first segmented mode True word sequence is determined as the first accurate word sequence one by one.
Specifically, the present embodiment is by being determined as the first segmented mode for each segmented mode one by one, and by the first segmentation Candidate accurate word sequence is determined as the first accurate word sequence one by one each of under mode, each candidate quasi- in candidate accurate word sequence True word segment is determined as each candidate in each first accurate word segment in the first accurate word sequence, and candidate accurate word segment Accurate word is determined as each first accurate word in the first accurate word segment, to determine one by one each according to identical determination process The corresponding candidate corrections probability of each accurate word sequence under segmented mode.
S250: Determine, according to the search corpus, each segment correction probability corresponding to the first segmentation mode.
The first segmentation mode corresponds to at least one search term segment. A segment correction probability is the probability that a search term segment is corrected to the corresponding first accurate word segment. This embodiment can determine, according to the search corpus and the number of search terms in each search term segment, the probability that each search term segment is corrected to the corresponding first accurate word segment, that is, the segment correction probability corresponding to each search term segment.
Optionally, S250 includes: taking each search term segment corresponding to the first segmentation mode in turn as the target search term segment; if the target search term segment contains only one search term, determining the first correction probability corresponding to the target search term segment according to the search corpus, and taking the first correction probability as the segment correction probability corresponding to the target search term segment; if the target search term segment contains at least two search terms, determining each second correction probability and each third correction probability corresponding to the target search term segment according to the search corpus, and determining the segment correction probability corresponding to the target search term segment according to the second correction probabilities and the third correction probabilities.
The second correction probability is the probability that the current search term in the target search term segment is corrected to the corresponding current first accurate word. The third correction probability is the probability that the next first accurate word appears after the current first accurate word. When the target search term segment contains at least two search terms, each search term in the target search term segment is taken in turn as the current search term. The current first accurate word is the first accurate word in the first accurate word segment that corresponds to the current search term. The next first accurate word is the first accurate word that, in the order of the first accurate word sequence, immediately follows the current first accurate word.
Specifically, each search term segment corresponding to the first segmentation mode is taken in turn as the target search term segment, so that the segment correction probability corresponding to each search term segment can be determined one by one through the same procedure. If the target search term segment contains only one search term, so that the corresponding first accurate word segment also contains only one first accurate word, the first correction probability is taken directly as the segment correction probability corresponding to the target search term segment. If the target search term segment contains at least two search terms, so that the corresponding first accurate word segment also contains at least two first accurate words, the segment correction probability corresponding to the target search term segment is determined according to the second correction probabilities and the third correction probabilities. It should be noted that when the target search term segment contains only one search term, the context of the first accurate word corresponding to that search term need not be considered, whereas when the target search term segment contains at least two search terms, the context of each first accurate word in the first accurate word segment, that is, the third correction probability, must be considered.
Illustratively, if the target search term segment contains only one search term $q_1$, and the corresponding first accurate word segment likewise contains only one first accurate word $c_1$, then the first correction probability that $q_1$ is corrected to $c_1$ is determined according to the search corpus as $p(c_1 \mid q_1)$, and $p(c_1 \mid q_1)$ can be taken directly as the segment correction probability corresponding to the target search term segment. If the target search term segment contains two search terms $q_1$ and $q_2$, and the corresponding first accurate word segment likewise contains two first accurate words $c_1$ and $c_2$, then the search corpus is used to determine the probability that each search term is corrected to the corresponding first accurate word, that is, the two second correction probabilities $p(c_1 \mid q_1)$ and $p(c_2 \mid q_2)$, as well as the probability that the next first accurate word appears after the current first accurate word, that is, the single third correction probability $p(c_2 \mid c_1)$. The segment correction probability corresponding to the target search term segment is then determined according to $p(c_1 \mid q_1)$, $p(c_2 \mid q_2)$, and $p(c_2 \mid c_1)$.
Optionally, determining each second correction probability and each third correction probability corresponding to the target search term segment according to the search corpus includes: determining, according to the search corpus, the historical search count corresponding to the current search term in the target search term segment, the historical correction count with which the current search term was corrected to the corresponding first accurate word, the first occurrence count of the current first accurate word corresponding to the current search term, and the second occurrence count of the next first accurate word of the current first accurate word; determining each second correction probability according to the historical search count and the historical correction count; and determining each third correction probability according to the first occurrence count and the second occurrence count.
The historical search count corresponding to the current search term is the number of times the current search term appears in the historical search data of the search corpus. The historical correction count with which the current search term was corrected to the corresponding first accurate word is the number of times, in the historical search data of the search corpus, that the current search term was inaccurate and was corrected to the first accurate word. The first occurrence count of the current first accurate word corresponding to the current search term is the number of times the current first accurate word appears in the search corpus. The second occurrence count of the next first accurate word of the current first accurate word is the number of times the next first accurate word of the current first accurate word appears in the search corpus.
Specifically, when the target search term segment contains at least two search terms, this embodiment can take the ratio of the historical correction count to the historical search count as the second correction probability corresponding to the current search term, and take the ratio of the second occurrence count to the first occurrence count as the third correction probability.
When the target search term segment contains only one search term, this embodiment can likewise divide the historical correction count of that search term in the search corpus by its historical search count and take the result as the first correction probability, that is, the probability that the search term is corrected to the corresponding first accurate word.
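The two ratios just described can be written as a short sketch. The function names and the sample counts are hypothetical, and returning 0 for a zero denominator is an assumption made here for illustration; the specification does not say how unseen terms are handled. The sketch also assumes that the second occurrence count is counted for the next first accurate word as the successor of the current one, so that the ratio stays within [0, 1].

```python
def second_correction_probability(historical_corrections, historical_searches):
    """p(c_j | q_j): historical correction count / historical search count."""
    if historical_searches == 0:
        return 0.0  # assumption: an unseen search term gets probability 0
    return historical_corrections / historical_searches


def third_correction_probability(second_occurrences, first_occurrences):
    """p(c_{j+1} | c_j): second occurrence count / first occurrence count."""
    if first_occurrences == 0:
        return 0.0  # assumption: an unseen accurate word gets probability 0
    return second_occurrences / first_occurrences


# Hypothetical counts, for illustration only.
p_c1_given_q1 = second_correction_probability(30, 120)
p_c2_given_c1 = third_correction_probability(45, 90)
```

The first correction probability of a single-term segment uses the same ratio as `second_correction_probability`.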
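Optionally, in this embodiment, it can be assumed that the segment correction probability with which a search term segment is corrected to the corresponding first accurate word segment follows a hidden Markov model, so that the segment correction probability can be determined more easily. Illustratively, the calculation formula of the segment correction probability $p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}})$ can be simplified as follows:
$$p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}}) = p(c_{n_i+1} \mid q_{n_i+1}) \, p(c_{n_i+2} \mid c_{n_i+1}) \, p(c_{n_i+2} \mid q_{n_i+2}) \cdots p(c_{n_{i+1}} \mid c_{n_{i+1}-1}) \, p(c_{n_{i+1}} \mid q_{n_{i+1}})$$
That is:
$$p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$$
where $q_{n_i+1}, \dots, q_{n_{i+1}}$ is the target search term segment; $c_{n_i+1}, \dots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search term segment; $n_i+1$ is the subscript of the first search term in the target search term segment, or of the first first accurate word in the first accurate word segment; $n_{i+1}$ is the subscript of the last search term in the target search term segment, or of the last first accurate word in the first accurate word segment; $p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}})$ is the segment correction probability corresponding to the target search term segment; $p(c_j \mid q_j)$ is the $j$-th second correction probability, that is, the probability that the $j$-th search term in the target search term segment is corrected to the corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$-th third correction probability, that is, the probability that the $(j+1)$-th first accurate word appears after the $j$-th first accurate word in the first accurate word segment.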
Specifically, if the target search term segment contains at least two search terms, all the second correction probabilities and all the third correction probabilities can be multiplied together, and the product is taken as the segment correction probability corresponding to the target search term segment.
S260: Determine, according to the segment correction probabilities, the number of search term segments, and the total number of search terms, the candidate correction probability corresponding to the first accurate word sequence under the first segmentation mode.
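The multiplication of the second and third correction probabilities within one segment can be sketched as follows. The function name `segment_correction_probability` and the sample values are hypothetical. A segment of m search terms is assumed to supply m second correction probabilities and m - 1 third correction probabilities, so a single-term segment reduces to its one correction probability.

```python
import math


def segment_correction_probability(second_probs, third_probs):
    """Multiply all p(c_j | q_j) and all p(c_{j+1} | c_j) of one segment.

    second_probs: one value per search term in the segment.
    third_probs: one value per adjacent pair of first accurate words,
                 so it must be one element shorter than second_probs.
    """
    if len(third_probs) != len(second_probs) - 1:
        raise ValueError("expected len(third_probs) == len(second_probs) - 1")
    # math.prod of an empty list is 1, which covers the single-term case.
    return math.prod(second_probs) * math.prod(third_probs)


# Hypothetical values: p(c1|q1) = p(c2|q2) = 0.5 and p(c2|c1) = 0.5.
two_term = segment_correction_probability([0.5, 0.5], [0.5])
# A single-term segment has no third correction probability.
one_term = segment_correction_probability([0.25], [])
```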
Optionally, the candidate correction probability corresponding to the first accurate word sequence is determined according to the following formula:
s.t. $0 = n_1 < n_2 < \dots < n_k = N$
where $p(c_1, c_2, \dots, c_N \mid q_1, q_2, \dots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \dots, c_N$ is the first accurate word sequence; $q_1, q_2, \dots, q_N$ is the search term sequence corresponding to the target search text; $N$ is the total number of search terms; $k$ is the number of search term segments corresponding to the first segmentation mode; $n_i$ is the subscript of the last search term in the $(i-1)$-th search term segment; $q_{n_i+1}, \dots, q_{n_{i+1}}$ is the $i$-th search term segment; $c_{n_i+1}, \dots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the $i$-th search term segment; and $p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}})$ is the segment correction probability corresponding to the $i$-th search term segment.
Illustratively, assume that the total number of search terms is 3, that is, $N = 3$, that the first accurate word sequence is $c_1, c_2, c_3$, that the search term sequence corresponding to the target search text is $q_1, q_2, q_3$, and that the first segmentation mode divides the search term sequence into two search term segments, the first containing two search terms and the second containing one search term. The first segmentation mode then corresponds to the two search term segments $q_1, q_2$ and $q_3$, whose corresponding first accurate word segments are $c_1, c_2$ and $c_3$ respectively. The number of search term segments $k$ corresponding to the first segmentation mode is 2, the subscript of the last search term in the first search term segment is $n_2 = 2$, and the subscript of the last search term in the second search term segment is $n_3 = 3$. The candidate correction probability corresponding to the first accurate word sequence $c_1, c_2, c_3$ is then:
In this embodiment, by repeating steps S240 to S260, the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode can be determined.
S270: Determine, according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and determine the target accurate text corresponding to the target search text according to the target accurate word sequence.
In the technical solution of this embodiment, the segment correction probabilities corresponding to the search term segments under each segmentation mode make it possible to determine accurately the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode. The corrected accurate word sequence can therefore be determined more accurately, which improves the accuracy of correction.
The following is an embodiment of a device for correcting search text provided by an embodiment of the present invention. The device belongs to the same inventive concept as the method for correcting search text of the above embodiments; for details not described in the device embodiment, reference may be made to the above embodiments of the method for correcting search text.
Embodiment 3
Fig. 3 is a schematic structural diagram of a device for correcting search text provided by Embodiment 3 of the present invention. This embodiment is applicable to spelling correction of search text. The device specifically includes: a search term sequence determining module 310, a candidate accurate word sequence determining module 320, a search term sequence segmentation module 330, a candidate correction probability determining module 340, and a target accurate text determining module 350.
The search term sequence determining module 310 is configured to obtain the target search text and perform word segmentation on the target search text to determine the search term sequence corresponding to the target search text. The candidate accurate word sequence determining module 320 is configured to determine, according to the search corpus, each candidate accurate word sequence corresponding to the search term sequence, where the search terms in the search term sequence correspond one-to-one to the candidate accurate words in each candidate accurate word sequence. The search term sequence segmentation module 330 is configured to segment the search term sequence and determine the search term segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search term segment, where a segmentation mode includes the number of search term segments and the number of search terms in each search term segment. The candidate correction probability determining module 340 is configured to determine, according to the search term segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search term segments, the search corpus, and the total number of search terms, the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode. The target accurate text determining module 350 is configured to determine, according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and to determine the target accurate text corresponding to the target search text according to the target accurate word sequence.
By performing segmented error correction on the search term sequence, the embodiment of the present invention takes into account the relevance of the search terms under each segmentation mode, so that the optimal candidate accurate word sequence corresponding to the optimal segmentation mode can be determined. This avoids the situation in which low relevance between search terms makes the calculated target accurate word sequence inaccurate, improves the accuracy of correction, and in turn improves the user's search experience.
Optionally, the candidate correction probability determining module 340 includes:
a first accurate word sequence determining unit, configured to take each segmentation mode in turn as the first segmentation mode, and to take each candidate accurate word sequence under the first segmentation mode in turn as the first accurate word sequence;
a segment correction probability determining unit, configured to determine, according to the search corpus, each segment correction probability corresponding to the first segmentation mode, where a segment correction probability is the probability that a search term segment is corrected to the corresponding first accurate word segment;
a candidate correction probability determining unit, configured to determine, according to the segment correction probabilities, the number of search term segments, and the total number of search terms, the candidate correction probability corresponding to the first accurate word sequence under the first segmentation mode.
Optionally, the segment correction probability determining unit includes:
a target search term segment determining subunit, configured to take each search term segment corresponding to the first segmentation mode in turn as the target search term segment;
a first segment correction probability determining subunit, configured to, if the target search term segment contains only one search term, determine according to the search corpus the first correction probability corresponding to that search term in the target search term segment, and take the first correction probability as the segment correction probability corresponding to the target search term segment, where the first correction probability is the probability that the search term is corrected to the corresponding first accurate word;
a second segment correction probability determining subunit, configured to, if the target search term segment contains at least two search terms, determine according to the search corpus each second correction probability and each third correction probability corresponding to the target search term segment, and determine the segment correction probability corresponding to the target search term segment according to the second correction probabilities and the third correction probabilities, where the second correction probability is the probability that the current search term in the target search term segment is corrected to the corresponding current first accurate word, and the third correction probability is the probability that the next first accurate word appears after the current first accurate word.
Optionally, the second segment correction probability determining subunit is further configured to: determine, according to the search corpus, the historical search count corresponding to the current search term in the target search term segment, the historical correction count with which the current search term was corrected to the corresponding first accurate word, the first occurrence count of the current first accurate word corresponding to the current search term, and the second occurrence count of the next first accurate word of the current first accurate word; determine each second correction probability according to the historical search count and the historical correction count; and determine each third correction probability according to the first occurrence count and the second occurrence count.
Optionally, the segment correction probability corresponding to the target search term segment is determined according to the following formula:
$$p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$$
where $q_{n_i+1}, \dots, q_{n_{i+1}}$ is the target search term segment; $c_{n_i+1}, \dots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search term segment; $n_i+1$ is the subscript of the first search term in the target search term segment, or of the first first accurate word in the first accurate word segment; $n_{i+1}$ is the subscript of the last search term in the target search term segment, or of the last first accurate word in the first accurate word segment; $p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}})$ is the segment correction probability corresponding to the target search term segment; $p(c_j \mid q_j)$ is the $j$-th second correction probability, that is, the probability that the $j$-th search term in the target search term segment is corrected to the corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$-th third correction probability, that is, the probability that the $(j+1)$-th first accurate word appears after the $j$-th first accurate word in the first accurate word segment.
Optionally, the candidate correction probability corresponding to the first accurate word sequence is determined according to the following formula:
s.t. $0 = n_1 < n_2 < \dots < n_k = N$
where $p(c_1, c_2, \dots, c_N \mid q_1, q_2, \dots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \dots, c_N$ is the first accurate word sequence; $q_1, q_2, \dots, q_N$ is the search term sequence corresponding to the target search text; $N$ is the total number of search terms; $k$ is the number of search term segments corresponding to the first segmentation mode; $n_i$ is the subscript of the last search term in the $(i-1)$-th search term segment; $q_{n_i+1}, \dots, q_{n_{i+1}}$ is the $i$-th search term segment; $c_{n_i+1}, \dots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the $i$-th search term segment; and $p(c_{n_i+1}, \dots, c_{n_{i+1}} \mid q_{n_i+1}, \dots, q_{n_{i+1}})$ is the segment correction probability corresponding to the $i$-th search term segment.
Optionally, the target accurate text determining module 350 is further configured to: determine, according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, the to-be-selected accurate word sequence corresponding to each segmentation mode and the to-be-selected correction probability corresponding to each to-be-selected accurate word sequence; and take the to-be-selected accurate word sequence with the largest to-be-selected correction probability as the target accurate word sequence corresponding to the target search text.
The device for correcting search text provided by the embodiment of the present invention can execute the method for correcting search text provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to executing that method.
It is worth noting that, in the above embodiment of the device for correcting search text, the units and modules are divided only according to functional logic; the division is not limited to the one described, as long as the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the protection scope of the present invention.
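The two-stage selection performed by the target accurate text determining module 350 can be sketched as follows. This is an illustration under a stated assumption: the to-be-selected accurate word sequence of a segmentation mode is taken to be the candidate with the largest candidate correction probability under that mode, which this passage does not state explicitly. The function name, the mode labels, and the probabilities are all hypothetical.

```python
def select_target_sequence(candidate_probs):
    """Pick the target accurate word sequence.

    candidate_probs maps a segmentation-mode label to a dict of
    {candidate accurate word sequence (tuple): candidate correction probability}.
    Returns (sequence, probability, mode), or None if there are no candidates.
    """
    to_be_selected = []
    for mode, candidates in candidate_probs.items():
        if not candidates:
            continue
        # Assumption: the to-be-selected sequence of a mode is its best candidate.
        best_seq = max(candidates, key=candidates.get)
        to_be_selected.append((candidates[best_seq], best_seq, mode))
    if not to_be_selected:
        return None
    # The largest to-be-selected correction probability wins across modes.
    prob, seq, mode = max(to_be_selected, key=lambda item: item[0])
    return seq, prob, mode


# Hypothetical candidate correction probabilities, for illustration only.
example = {
    "q1 q2 | q3": {("c1", "c2", "c3"): 0.12, ("c1", "d2", "c3"): 0.05},
    "q1 | q2 | q3": {("c1", "c2", "c3"): 0.08},
}
result = select_target_sequence(example)
```

The target accurate text would then be assembled from the words of the returned sequence.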
Embodiment 4
Fig. 4 is a schematic structural diagram of a terminal provided by Embodiment 4 of the present invention. Referring to Fig. 4, the terminal includes:
one or more processors 410;
a memory 420, configured to store one or more programs;
when the one or more programs are executed by the one or more processors 410, the one or more processors 410 implement the method for correcting search text proposed in any of the above embodiments, the method including:
obtaining the target search text, and performing word segmentation on the target search text to determine the search term sequence corresponding to the target search text;
determining, according to the search corpus, each candidate accurate word sequence corresponding to the search term sequence, where the search terms in the search term sequence correspond one-to-one to the candidate accurate words in each candidate accurate word sequence;
segmenting the search term sequence, and determining the search term segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search term segment, where a segmentation mode includes the number of search term segments and the number of search terms in each search term segment;
determining, according to the search term segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search term segments, the search corpus, and the total number of search terms, the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode;
determining, according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and determining the target accurate text corresponding to the target search text according to the target accurate word sequence.
Fig. 4 takes one processor 410 as an example. The processor 410 and the memory 420 in the terminal may be connected by a bus or in another manner; Fig. 4 takes connection by a bus as an example.
As a computer-readable storage medium, the memory 420 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for correcting search text in the embodiments of the present invention (for example, the search term sequence determining module 310, the candidate accurate word sequence determining module 320, the search term sequence segmentation module 330, the candidate correction probability determining module 340, and the target accurate text determining module 350 in the device for correcting search text). By running the software programs, instructions, and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the terminal, that is, implements the above method for correcting search text.
The memory 420 mainly includes a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the terminal, and the like. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, and such remote memory can be connected to the terminal through a network. Examples of such networks include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The terminal proposed in this embodiment belongs to the same inventive concept as the method for correcting search text proposed in the above embodiments. For technical details not described in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as executing the method for correcting search text.
Embodiment 5
Embodiment 5 provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the method for correcting search text described in any embodiment of the present invention, the method including:
obtaining the target search text, and performing word segmentation on the target search text to determine the search term sequence corresponding to the target search text;
determining, according to the search corpus, each candidate accurate word sequence corresponding to the search term sequence, where the search terms in the search term sequence correspond one-to-one to the candidate accurate words in each candidate accurate word sequence;
segmenting the search term sequence, and determining the search term segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search term segment, where a segmentation mode includes the number of search term segments and the number of search terms in each search term segment;
determining, according to the search term segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search term segments, the search corpus, and the total number of search terms, the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode;
determining, according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, the target accurate word sequence corresponding to the target search text, and determining the target accurate text corresponding to the target search text according to the target accurate word sequence.
The computer storage medium of the embodiment of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for performing the operations of the present invention may be written in one or more programming languages or a combination thereof. These programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Those of ordinary skill in the art will understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they can each be made into an individual integrated circuit module, or several of the modules or steps can be made into a single integrated circuit module. The present invention is thus not limited to any specific combination of hardware and software.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments, and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to those embodiments; without departing from the inventive concept, it may include other equivalent embodiments, and its scope is determined by the scope of the appended claims.

Claims (10)

1. A method for correcting search text, characterized by comprising:
obtaining a target search text, and performing word segmentation on the target search text to determine the search word sequence corresponding to the target search text;
determining, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search words in the search word sequence correspond one-to-one to the candidate accurate words in the candidate accurate word sequence;
segmenting the search word sequence to determine the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment, wherein a segmentation mode comprises the number of search word segments and the number of search words in each search word segment;
determining the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, according to the search word segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search word segments, the search corpus and the total number of search words;
determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, and determining the target accurate text corresponding to the target search text according to the target accurate word sequence.
2. The method according to claim 1, characterized in that determining the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, according to the search word segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search word segments, the search corpus, the Wilson confidence interval formula and the total number of search words, comprises:
taking each segmentation mode in turn as a first segmentation mode, and taking each candidate accurate word sequence under the first segmentation mode in turn as a first accurate word sequence;
determining, according to the search corpus, each segment correction probability corresponding to the first segmentation mode, wherein a segment correction probability is the probability that a search word segment is corrected to its corresponding first accurate word segment;
determining the candidate correction probability corresponding to the first accurate word sequence under the first segmentation mode according to each segment correction probability, the number of search word segments and the total number of search words.
3. The method according to claim 2, characterized in that determining, according to the search corpus, each segment correction probability corresponding to the first segmentation mode comprises:
taking each search word segment corresponding to the first segmentation mode in turn as a target search word segment;
if the target search word segment contains only one search word, determining, according to the search corpus, the first correction probability corresponding to the target search word segment, and taking the first correction probability as the segment correction probability corresponding to the target search word segment, wherein the first correction probability is the probability that the search word is corrected to its corresponding first accurate word;
if the target search word segment contains at least two search words, determining, according to the search corpus, each second correction probability and each third correction probability corresponding to the target search word segment, and determining the segment correction probability corresponding to the target search word segment according to each second correction probability and each third correction probability, wherein the second correction probability is the probability that the current search word in the target search word segment is corrected to its corresponding current first accurate word, and the third correction probability is the probability that the next first accurate word appears after the current first accurate word.
4. The method according to claim 3, characterized in that determining, according to the search corpus, each second correction probability and each third correction probability corresponding to the target search word segment comprises:
determining, according to the search corpus, the historical search count of the current search word in the target search word segment, the historical correction count with which the current search word was corrected to its corresponding first accurate word, the first occurrence count of the current first accurate word corresponding to the current search word, and the second occurrence count of the next first accurate word following the current first accurate word;
determining each second correction probability according to the historical search count and the historical correction count;
determining each third correction probability according to the first occurrence count and the second occurrence count.
5. The method according to claim 3, characterized in that the segment correction probability corresponding to the target search word segment is determined according to the following formula:
where $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the target search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search word segment; $n_i + 1$ is the subscript of the first search word in the target search word segment, or of the first first accurate word in the first accurate word segment; $n_{i+1}$ is the subscript of the last search word in the target search word segment, or of the last first accurate word in the first accurate word segment; $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the target search word segment; $p(c_j \mid q_j)$ is the $j$-th second correction probability, i.e. the probability that the $j$-th search word in the target search word segment is corrected to its corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$-th third correction probability, i.e. the probability that the $(j+1)$-th first accurate word appears after the $j$-th first accurate word in the first accurate word segment.
6. The method according to claim 2, characterized in that the candidate correction probability corresponding to the first accurate word sequence is determined according to the following formula:
where $p(c_1, c_2, \ldots, c_N \mid q_1, q_2, \ldots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \ldots, c_N$ is the first accurate word sequence; $q_1, q_2, \ldots, q_N$ is the search word sequence corresponding to the target search text; $N$ is the total number of search words; $K$ is the number of search word segments corresponding to the first segmentation mode; $n_i$ is the subscript of the last search word in the $(i-1)$-th search word segment; $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the $i$-th search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the $i$-th search word segment; and $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the $i$-th search word segment.
7. The method according to claim 1, characterized in that determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode comprises:
determining, according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, the to-be-selected accurate word sequence corresponding to each segmentation mode and the to-be-selected correction probability corresponding to each to-be-selected accurate word sequence;
taking the to-be-selected accurate word sequence with the largest to-be-selected correction probability as the target accurate word sequence corresponding to the target search text.
8. A device for correcting search text, characterized by comprising:
a search word sequence determining module, configured to obtain a target search text and perform word segmentation on the target search text to determine the search word sequence corresponding to the target search text;
a candidate accurate word sequence determining module, configured to determine, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence, wherein the search words in the search word sequence correspond one-to-one to the candidate accurate words in the candidate accurate word sequence;
a search word sequence segmenting module, configured to segment the search word sequence to determine the search word segments corresponding to each segmentation mode and the candidate accurate word segments corresponding to each search word segment, wherein a segmentation mode comprises the number of search word segments and the number of search words in each search word segment;
a candidate correction probability determining module, configured to determine the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, according to the search word segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search word segments, the search corpus and the total number of search words;
a target accurate text determining module, configured to determine the target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence under each segmentation mode, and to determine the target accurate text corresponding to the target search text according to the target accurate word sequence.
9. A terminal, characterized in that the terminal comprises:
one or more processors;
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for correcting search text according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method for correcting search text according to any one of claims 1-7.
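The method of claims 1, 3, 4 and 7 can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the `counts` dictionary layout and the toy data are hypothetical, and the sketch makes three assumptions. A second correction probability is taken as the historical correction count divided by the historical search count. A third correction probability is taken as the second occurrence count divided by the first occurrence count. A segment correction probability is taken as the plain product of those factors. Because the formulas of claims 5 and 6 are not reproduced in this text, the rule that turns segment probabilities, the segment count K and the word count N into a candidate correction probability is left to a caller-supplied `combine` function.

```python
from itertools import combinations
from math import prod


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def segmentations(n):
    """Yield every split of n consecutive words into contiguous segments.

    Each split is a list of (start, end) index pairs, end exclusive.
    """
    for k in range(n):  # k is the number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield list(zip(bounds, bounds[1:]))


def segment_probability(queries, candidates, counts):
    """Segment correction probability under the assumed product rule.

    counts is a hypothetical dict of four count tables:
    "searched"[q], "corrected"[(q, c)], "unigram"[c], "bigram"[(c, c_next)].
    """
    # Assumed second correction probabilities: corrections / searches.
    second = prod(
        ratio(counts["corrected"].get((q, c), 0), counts["searched"].get(q, 0))
        for q, c in zip(queries, candidates)
    )
    # Assumed third correction probabilities: bigram count / unigram count.
    # For a one-word segment this product is empty and equals 1.
    third = prod(
        ratio(counts["bigram"].get((a, b), 0), counts["unigram"].get(a, 0))
        for a, b in zip(candidates, candidates[1:])
    )
    return second * third


def best_candidate(queries, candidate_sequences, counts, combine):
    """Return (score, candidates, segments) with the highest combined score.

    combine(segment_probs, k, n) is supplied by the caller; it stands in
    for the candidate correction probability formula of claim 6.
    """
    n = len(queries)
    best = (0.0, None, None)
    for candidates in candidate_sequences:
        for segments in segmentations(n):
            probs = [
                segment_probability(queries[a:b], candidates[a:b], counts)
                for a, b in segments
            ]
            score = combine(probs, len(segments), n)
            if best[1] is None or score > best[0]:
                best = (score, candidates, segments)
    return best
```

A sequence of N search words has 2^(N-1) segmentation modes, so this exhaustive enumeration is only practical for short queries. With a plain product as `combine`, splitting into single words never scores lower than any coarser split, which suggests why the claims make the candidate correction probability depend on the number of segments and the total number of search words as well.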
CN201810941106.XA 2018-08-17 2018-08-17 Method, device, terminal and storage medium for correcting search text Active CN110019684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810941106.XA CN110019684B (en) 2018-08-17 2018-08-17 Method, device, terminal and storage medium for correcting search text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810941106.XA CN110019684B (en) 2018-08-17 2018-08-17 Method, device, terminal and storage medium for correcting search text

Publications (2)

Publication Number Publication Date
CN110019684A true CN110019684A (en) 2019-07-16
CN110019684B CN110019684B (en) 2021-06-15

Family

ID=67188380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810941106.XA Active CN110019684B (en) 2018-08-17 2018-08-17 Method, device, terminal and storage medium for correcting search text

Country Status (1)

Country Link
CN (1) CN110019684B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160154861A1 (en) * 2014-12-01 2016-06-02 Facebook, Inc. Social-Based Spelling Correction for Online Social Networks
CN105975625A (en) * 2016-05-26 2016-09-28 同方知网数字出版技术股份有限公司 Chinglish inquiring correcting method and system oriented to English search engine
CN106339404A (en) * 2016-06-30 2017-01-18 北京奇艺世纪科技有限公司 Search word recognition method and device
CN106598939A (en) * 2016-10-21 2017-04-26 北京三快在线科技有限公司 Method and device for text error correction, server and storage medium
CN106777073A (en) * 2016-12-13 2017-05-31 深圳爱拼信息科技有限公司 The automatic method for correcting of wrong word and server in a kind of search engine
CN107341181A (en) * 2017-05-27 2017-11-10 武汉斗鱼网络科技有限公司 Method, apparatus, computer-readable recording medium and computer equipment are recommended in search
CN107609098A (en) * 2017-09-11 2018-01-19 北京金堤科技有限公司 Searching method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hu Yi et al.: "An online Chinese query error-correction method for search engines", Journal of Chinese Information Processing *

Also Published As

Publication number Publication date
CN110019684B (en) 2021-06-15

Similar Documents

Publication Publication Date Title
WO2020182122A1 (en) Text matching model generation method and device
CN107992585B (en) Universal label mining method, device, server and medium
US10430405B2 (en) Apply corrections to an ingested corpus
CN111159546B (en) Event pushing method, event pushing device, computer readable storage medium and computer equipment
CN108256044B (en) Live broadcast room recommendation method and device and electronic equipment
CN107526846B (en) Method, device, server and medium for generating and sorting channel sorting model
EP3627498B1 (en) Method and system, for generating speech recognition training data
CN108681541B (en) Picture searching method and device and computer equipment
US20180107953A1 (en) Content delivery method, apparatus, and storage medium
KR20190000776A (en) Information inputting method
US20190095447A1 (en) Method, apparatus, device and storage medium for establishing error correction model based on error correction platform
CN108595679A (en) A kind of label determines method, apparatus, terminal and storage medium
US11630825B2 (en) Method and system for enhanced search term suggestion
KR101446468B1 (en) System and method for prividing automatically completed query
CN110164416B (en) Voice recognition method and device, equipment and storage medium thereof
CN111435406A (en) Method and device for correcting database statement spelling errors
CN111309872B (en) Search processing method, device and equipment
CN107885875B (en) Synonymy transformation method and device for search words and server
CN111832264A (en) PDF file based signature position determination method, device and equipment
US20160171900A1 (en) Determining the Correct Answer in a Forum Thread
CN109545223B (en) Voice recognition method applied to user terminal and terminal equipment
CN112395880B (en) Error correction method and device for structured triples, computer equipment and storage medium
CN111554295B (en) Text error correction method, related device and readable storage medium
CN102955770A (en) Method and system for automatic recognition of pinyin
WO2016101737A1 (en) Search query method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240410

Address after: Room 315, No.1 Yichuang Street, Huangpu District, Guangzhou City, Guangdong Province, 510000

Patentee after: Working at Bide Digital Technology (Guangzhou) Co.,Ltd.

Country or region after: China

Address before: 11 / F, building B1, phase 4.1, software industry, No.1, Software Park East Road, Wuhan East Lake Development Zone, Wuhan City, Hubei Province, 430070

Patentee before: WUHAN DOUYU NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China