Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for correcting a search text according to an embodiment of the present invention. The method is applicable to spelling correction of search text, and in particular to correcting search text on a live-streaming platform. The method can be executed by a search text correction apparatus, which can be implemented in software and/or hardware and integrated in a terminal with a search function, such as a smartphone, a tablet computer or a desktop computer. The method specifically comprises the following steps:
S110, obtaining a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text.
The target search text is the search text currently input by the user; for example, the search text in the current search entry may be determined as the target search text. Word segmentation divides the target search text into a plurality of search words according to a word segmentation dictionary or other word segmentation rules. The search word sequence is the sequence formed by the search words obtained from word segmentation of the target search text, arranged in the same order as they appear in the target search text. For example, if the target search text is "not surprised", the search word sequence obtained by matching against the word segmentation dictionary may be "not, surprise".
S120, determining each candidate accurate word sequence corresponding to the search word sequence according to the search corpus, wherein the search words in the search word sequence correspond one to one to the candidate accurate words in the candidate accurate word sequence.
The search corpus can be predetermined from the search behavior logs of a large number of users. It may include a plurality of historical search keywords and, for each historical search keyword, the accurate keyword it was corrected to, where the accurate keyword may be determined from the user's click operations. A candidate accurate word sequence is any possible correction sequence of the target search text, and this embodiment determines the optimal accurate word sequence from all candidate accurate word sequences. In this embodiment, each candidate accurate word sequence corresponds to one candidate accurate text, in which the candidate accurate words appear in the same order as in the candidate accurate word sequence. The search words in the search word sequence correspond one to one to the candidate accurate words in a candidate accurate word sequence, that is, each search word corresponds to exactly one candidate accurate word. Illustratively, if the search word sequence of the target search text is $q_1, q_2, \ldots, q_N$ and a candidate accurate word sequence is $c_1, c_2, \ldots, c_N$, then search word $q_i$ corresponds to candidate accurate word $c_i$.
Specifically, in this embodiment, at least one candidate accurate word corresponding to each search word in the search word sequence may be determined according to the search corpus, and the candidate accurate words of the different search words are combined to obtain each candidate accurate word sequence corresponding to the search word sequence. Illustratively, if the search word sequence is "not, surprise", the candidate accurate words determined from the search corpus for the search word "not" are "not", "step" and "part", and those for the search word "surprise" are "surprise" and "elaboration", then the determined candidate accurate word sequences are: "not, surprise", "not, elaboration", "step, surprise", "step, elaboration", "part, surprise" and "part, elaboration". It should be noted that a candidate accurate word may also be the search word itself, because a search word in the target search text may already be correctly spelled.
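Combining the per-word candidates in this way amounts to taking their Cartesian product. The following Python sketch is illustrative only: the function name `candidate_sequences` and the candidate lists are hypothetical stand-ins for whatever the search corpus actually returns for each search word.

```python
from itertools import product


def candidate_sequences(candidates_per_word):
    """Return every candidate accurate word sequence.

    candidates_per_word[i] lists the candidate accurate words for the i-th
    search word (the search word itself may be one of them).
    """
    return [list(seq) for seq in product(*candidates_per_word)]


# Hypothetical candidates mirroring the "not, surprise" example.
candidates = [["not", "step", "part"], ["surprise", "elaboration"]]
sequences = candidate_sequences(candidates)
for seq in sequences:
    print(", ".join(seq))
```

With three candidates for the first word and two for the second, this yields the six sequences listed above; the count grows multiplicatively with the number of candidates per word.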
S130, segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment.
A search word segment includes at least one search word. It may be part of the search word sequence or the whole search word sequence, depending on the segmentation mode. A segmentation mode specifies the number of search word segments and the number of search words in each search word segment; the number of search word segments ranges from 1 to the total number of search words in the search word sequence, inclusive. In other words, the segmentation mode characterizes how many search word segments the search word sequence is divided into and how many search words each segment contains, so the number of segmentation modes can be determined from the total number of search words in the search word sequence. For each candidate accurate word sequence, its candidate accurate word segments correspond one to one to the search word segments: there are as many candidate accurate word segments as search word segments, and each candidate accurate word segment contains as many candidate accurate words as its corresponding search word segment contains search words.
Illustratively, assume that the search word sequence is $q_1, q_2, q_3$ and a candidate accurate word sequence is $c_1, c_2, c_3$. There are four segmentation modes. In the first segmentation mode, the search word sequence forms a single search word segment, i.e. $q_1, q_2, q_3$, and the corresponding candidate accurate word segment is $c_1, c_2, c_3$. In the second segmentation mode, the search word sequence is divided into two search word segments, the first containing two search words and the second containing one, i.e. $q_1, q_2$ and $q_3$; the corresponding candidate accurate word segments are $c_1, c_2$ and $c_3$. In the third segmentation mode, the search word sequence is divided into two search word segments, the first containing one search word and the second containing two, i.e. $q_1$ and $q_2, q_3$; the corresponding candidate accurate word segments are $c_1$ and $c_2, c_3$. In the fourth segmentation mode, the search word sequence is divided into three search word segments, each search word forming its own segment, i.e. $q_1$, $q_2$ and $q_3$; the corresponding candidate accurate word segments are $c_1$, $c_2$ and $c_3$.
Specifically, each segmentation mode corresponds to at least one search word segment, and each search word segment corresponds to one candidate accurate word segment in each candidate accurate word sequence.
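Because each of the N - 1 gaps between adjacent search words either is or is not a segment boundary, a search word sequence of N words has 2^(N-1) segmentation modes, which gives the four modes listed above for N = 3. A minimal Python sketch of this enumeration, assuming a mode is represented as a list of (start, end) index pairs with the end index exclusive; the helper names are hypothetical.

```python
from itertools import combinations


def segmentation_modes(n):
    """Enumerate every way to split n words into contiguous, non-empty segments.

    A mode is a list of (start, end) index pairs with `end` exclusive. Each of
    the n - 1 gaps between adjacent words is either a cut or not, so there are
    2 ** (n - 1) modes.
    """
    modes = []
    for num_cuts in range(n):  # 0 cuts (one segment) up to n - 1 cuts (one word each)
        for cuts in combinations(range(1, n), num_cuts):
            bounds = [0, *cuts, n]
            modes.append([(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)])
    return modes


def apply_mode(mode, words):
    """Split a word sequence into the segments described by `mode`."""
    return [words[start:end] for start, end in mode]


search_words = ["q1", "q2", "q3"]
candidate = ["c1", "c2", "c3"]
for mode in segmentation_modes(len(search_words)):
    print(apply_mode(mode, search_words), apply_mode(mode, candidate))
```

Applying the same mode to the search word sequence and to a candidate accurate word sequence keeps their segments aligned one to one, as required above.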
S140, determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, a search corpus and the total number of search words.
The total number of search words is the number of search words in the search word sequence. The candidate correction probability of a candidate accurate word sequence is the probability that the search word sequence is corrected to that candidate accurate word sequence.
Specifically, because the segmentation modes of the search word sequence differ, the candidate correction probability calculated for the same candidate accurate word sequence also differs from one segmentation mode to another. For a given candidate accurate word sequence in a given segmentation mode, its candidate correction probability is determined from the search corpus, the search word segments of that segmentation mode, the candidate accurate word segments of the candidate accurate word sequence that correspond to those search word segments, and the total number of search words. By calculating the candidate correction probability of every candidate accurate word sequence in every segmentation mode, this embodiment covers every case, from those in which the context of at least one candidate accurate word is considered to the case in which the context of no candidate accurate word is considered. This avoids the low correction accuracy caused by weakly related search words or by sparse historical data in the search corpus, and thereby improves correction accuracy.
Illustratively, if the segmentation mode takes all search words as one search word segment, i.e. the search word sequence is not segmented, the candidate correction probability of each candidate accurate word sequence in this mode takes the context of every candidate accurate word into account. If the segmentation mode takes each search word as its own search word segment, i.e. the number of search word segments is maximal, the candidate correction probability of each candidate accurate word sequence in this mode cannot take the context of any candidate accurate word into account.
S150, determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
The target accurate word sequence may be the optimal candidate accurate word sequence under the optimal segmentation mode. For each segmentation mode, a candidate correction probability is obtained for every candidate accurate word sequence; equivalently, for each candidate accurate word sequence, a candidate correction probability is obtained in every segmentation mode. This embodiment may determine the target accurate word sequence corresponding to the target search text in either of the following two ways, without being limited to them.
Optionally, determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence in each segmentation mode includes: determining, according to these candidate correction probabilities, a to-be-selected accurate word sequence for each segmentation mode and the to-be-selected correction probability of each to-be-selected accurate word sequence; and determining the to-be-selected accurate word sequence with the maximum to-be-selected correction probability as the target accurate word sequence corresponding to the target search text. Specifically, for a given segmentation mode, the candidate correction probabilities of the candidate accurate word sequences in that mode are compared; the candidate accurate word sequence with the maximum candidate correction probability is taken as the to-be-selected accurate word sequence of that mode, and that maximum candidate correction probability is taken as its to-be-selected correction probability. Doing this for every segmentation mode yields one to-be-selected accurate word sequence per mode. The to-be-selected correction probabilities are then compared, and the to-be-selected accurate word sequence with the maximum to-be-selected correction probability is determined as the target accurate word sequence corresponding to the target search text, so that a more accurate word sequence can be determined.
Optionally, determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability of each candidate accurate word sequence in each segmentation mode includes: determining a target correction probability for each candidate accurate word sequence according to its candidate correction probabilities in the segmentation modes; and determining the candidate accurate word sequence with the maximum target correction probability as the target accurate word sequence corresponding to the target search text. Specifically, for a given candidate accurate word sequence, its candidate correction probabilities in the different segmentation modes are compared, and the maximum is taken as its target correction probability. Once the target correction probability of every candidate accurate word sequence has been obtained, these target correction probabilities are compared, and the candidate accurate word sequence with the maximum target correction probability is determined as the target accurate word sequence corresponding to the target search text, so that a more accurate word sequence can be determined.
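The two optional selection procedures can be sketched in Python as follows. Both reduce to taking the maximum over all (segmentation mode, candidate accurate word sequence) pairs, so apart from ties they select the same sequence; they differ only in the order in which the maxima are taken. The dictionary layout `probs[mode][sequence]`, the function names and the probability values are all hypothetical.

```python
def select_per_mode_first(probs):
    """Option 1: best sequence within each mode, then the best of those across modes."""
    best_per_mode = {
        mode: max(seq_probs.items(), key=lambda item: item[1])
        for mode, seq_probs in probs.items()
    }
    return max(best_per_mode.values(), key=lambda item: item[1])


def select_per_sequence_first(probs):
    """Option 2: target probability per sequence (max over modes), then the best sequence."""
    target = {}
    for seq_probs in probs.values():
        for seq, p in seq_probs.items():
            target[seq] = max(p, target.get(seq, 0.0))
    best_seq = max(target, key=target.get)
    return best_seq, target[best_seq]


# Hypothetical candidate correction probabilities: probs[mode][sequence].
probs = {
    "whole": {("not", "surprise"): 0.12, ("step", "surprise"): 0.03},
    "split": {("not", "surprise"): 0.20, ("step", "surprise"): 0.25},
}
print(select_per_mode_first(probs))
print(select_per_sequence_first(probs))
```

Either function returns the chosen sequence together with its probability; the accurate words can then be spliced in order to form the target accurate text.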
After the target accurate word sequence is determined, its accurate words can be spliced together directly in sequence order, so that the order of the accurate words in the resulting text matches their order in the target accurate word sequence; this gives the target accurate text corresponding to the target search text, i.e. the optimal accurate text. Because the target search text is corrected into the target accurate text, a search can be performed automatically on the target accurate text to return the results the user actually needs. This improves search accuracy and the user's search experience, since the user does not have to re-enter the correct search text manually.
In the technical solution of this embodiment, the search word sequence is segmented, and each search word segment of each segmentation mode and each candidate accurate word segment corresponding to each search word segment are determined; the candidate correction probability of each candidate accurate word sequence in each segmentation mode is determined according to the search word segments of each segmentation mode, the corresponding candidate accurate word segments, the search corpus and the total number of search words; and the target accurate word sequence corresponding to the target search text is determined according to these candidate correction probabilities, from which the target accurate text corresponding to the target search text is determined. By performing segmented error correction on the search word sequence and considering the relevance of the search words in each segmentation mode, the optimal candidate accurate word sequence under the optimal segmentation mode can be determined. This prevents the calculated target accurate word sequence from being inaccurate because the search words are weakly related, improves correction accuracy, and further improves the user's search experience.
Example two
Fig. 2 is a flowchart of a method for correcting a search text according to a second embodiment of the present invention. On the basis of the first embodiment, this embodiment further refines the step of "determining candidate correction probabilities corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, a search corpus, and a total number of search words". Explanations of terms that are the same as or correspond to those in the above embodiment are omitted.
Referring to fig. 2, the method for correcting a search text provided in this embodiment specifically includes the following steps:
S210, obtaining a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text.
S220, determining each candidate accurate word sequence corresponding to the search word sequence according to the search corpus, wherein the search words in the search word sequence correspond to the candidate accurate words in the candidate accurate word sequence one by one.
S230, segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment.
S240, determining each segmentation mode as a first segmentation mode one by one, and determining each candidate accurate word sequence under the first segmentation mode as a first accurate word sequence one by one.
Specifically, in this embodiment, each segmentation mode is taken in turn as the first segmentation mode, and each candidate accurate word sequence in the first segmentation mode is taken in turn as the first accurate word sequence; correspondingly, the candidate accurate word segments of that sequence are taken as the first accurate word segments, and the candidate accurate words in those segments are taken as the first accurate words. The candidate correction probability of every candidate accurate word sequence in every segmentation mode is thus determined one by one through the same process.
S250, determining, according to the search corpus, the segment correction probability of each search word segment corresponding to the first segmentation mode.
The first segmentation mode corresponds to at least one search word segment. The segment correction probability is the probability that a search word segment is corrected to its corresponding first accurate word segment. In this embodiment, this probability, i.e. the segment correction probability of each search word segment, can be determined according to the search corpus and the number of search words in the search word segment.
Optionally, S250 includes: determining the search word segments corresponding to the first segmentation mode one by one as target search word segments; if the target search word segment only comprises one search word, determining a first correction probability corresponding to the target search word segment according to the search corpus, and determining the first correction probability as a segment correction probability corresponding to the target search word segment; and if the target search word segment comprises at least two search words, determining second correction probabilities and third correction probabilities corresponding to the target search word segment according to the search corpus, and determining segment correction probabilities corresponding to the target search word segment according to the second correction probabilities and the third correction probabilities.
The second correction probability is the probability that the current search word in the target search word segment is corrected to its corresponding current first accurate word. The third correction probability is the probability that the next first accurate word occurs after the current first accurate word. When the target search word segment includes at least two search words, each search word in it is taken in turn as the current search word. The current first accurate word is the first accurate word in the first accurate word segment that corresponds to the current search word. The next first accurate word is the first accurate word that immediately follows the current first accurate word in the first accurate word sequence.
Specifically, each search word segment of the first segmentation mode is taken in turn as the target search word segment, so that the segment correction probability of every search word segment is determined through the same process. If the target search word segment includes only one search word, its first accurate word segment also includes only one first accurate word, and the first correction probability is directly determined as the segment correction probability of the target search word segment. If the target search word segment includes at least two search words, its first accurate word segment also includes at least two first accurate words, and the segment correction probability is determined from the second correction probabilities and the third correction probabilities. It should be noted that when the target search word segment includes only one search word, the context of the corresponding first accurate word need not be considered; when it includes at least two search words, the context of each first accurate word in the first accurate word segment, i.e. the third correction probabilities, must be considered.
Illustratively, if the target search word segment includes only one search word $q_1$, the corresponding first accurate word segment also includes only one first accurate word $c_1$. The first correction probability that search word $q_1$ is corrected to $c_1$ is determined from the search corpus as $p(c_1 \mid q_1)$, and $p(c_1 \mid q_1)$ can then be taken directly as the segment correction probability of the target search word segment. If the target search word segment includes two search words $q_1$ and $q_2$, the corresponding first accurate word segment also includes two first accurate words $c_1$ and $c_2$. The probability that each search word is corrected to its corresponding first accurate word is determined from the search corpus, giving two second correction probabilities $p(c_1 \mid q_1)$ and $p(c_2 \mid q_2)$, together with the probability that the next first accurate word occurs after the current one, giving one third correction probability $p(c_2 \mid c_1)$. The segment correction probability of the target search word segment is then determined from $p(c_1 \mid q_1)$, $p(c_2 \mid q_2)$ and $p(c_2 \mid c_1)$.
Optionally, determining the second correction probabilities and third correction probabilities of the target search word segment according to the search corpus includes: determining, according to the search corpus, the historical search count of the current search word in the target search word segment, the historical correction count of the current search word being corrected to its corresponding first accurate word, the first occurrence count of the current first accurate word corresponding to the current search word, and the second occurrence count of the next first accurate word after the current first accurate word; determining each second correction probability according to the historical search count and the historical correction count; and determining each third correction probability according to the first occurrence count and the second occurrence count.
The historical search count of the current search word is the number of times the current search word occurs in the historical search data of the search corpus. The historical correction count is the number of times, in the historical search data of the search corpus, that the current search word was corrected to the corresponding first accurate word. The first occurrence count is the number of times the current first accurate word corresponding to the current search word occurs in the search corpus. The second occurrence count is the number of times the first accurate word following the current first accurate word occurs in the search corpus.
Specifically, when the target search word segment includes at least two search words, this embodiment may take the ratio of the historical correction count to the historical search count as the second correction probability of the current search word, and the ratio of the second occurrence count to the first occurrence count as the third correction probability.
In this embodiment, when the target search word segment includes only one search word, the historical correction count of that search word in the search corpus may be divided by its historical search count, and the result is taken as the first correction probability, i.e. the probability that the search word is corrected to its corresponding first accurate word.
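A minimal Python sketch of estimating the first, second and third correction probabilities as count ratios, assuming the search corpus has already been reduced to count tables. The table names, function names and numbers are hypothetical, and the second occurrence count is read here as the number of times the next first accurate word appears directly after the current one (a bigram count); that reading is an assumption, not something the text fixes.

```python
from collections import Counter

# Hypothetical count tables distilled from a search corpus.
search_counts = Counter({"not": 100, "surprise": 50})               # historical search count of q
correction_counts = Counter({("not", "not"): 80, ("not", "step"): 15,
                             ("surprise", "surprise"): 40})        # times q was corrected to c
unigram_counts = Counter({"not": 400, "step": 60, "surprise": 200})  # first occurrence count
bigram_counts = Counter({("not", "surprise"): 120,
                         ("step", "surprise"): 6})                  # assumed: next word directly after current


def correction_prob(query_word, accurate_word):
    """First/second correction probability p(c | q) = correction count / search count."""
    searches = search_counts[query_word]
    return correction_counts[(query_word, accurate_word)] / searches if searches else 0.0


def transition_prob(current_word, next_word):
    """Third correction probability p(next | current) = second / first occurrence count."""
    occurrences = unigram_counts[current_word]
    return bigram_counts[(current_word, next_word)] / occurrences if occurrences else 0.0


print(correction_prob("not", "step"))
print(transition_prob("not", "surprise"))
```

Unseen words fall back to a probability of 0.0 here because `Counter` returns 0 for missing keys; a practical system would likely add smoothing, which this sketch omits.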
Optionally, in this embodiment, the correction of a search word segment into its corresponding first accurate word segment may be assumed to follow a hidden Markov process, which makes the segment correction probability easier to determine. Illustratively, the calculation formula of the segment correction probability can be simplified as follows:

$$p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$$

where $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the target search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search word segment; $n_i+1$ is the subscript of the first search word in the target search word segment (equivalently, of the first word in the first accurate word segment); $n_{i+1}$ is the subscript of the last search word in the target search word segment (equivalently, of the last word in the first accurate word segment); $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability of the target search word segment; $p(c_j \mid q_j)$ is the $j$-th second correction probability, i.e. the probability that the $j$-th search word in the target search word segment is corrected to its corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$-th third correction probability, i.e. the probability that the $(j+1)$-th first accurate word appears after the $j$-th first accurate word in the first accurate word segment.
Specifically, if the target search term segment includes at least two search terms, each second correction probability and each third correction probability may be multiplied, and the multiplication result is determined as the segment correction probability corresponding to the target search term segment.
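Under the hidden Markov assumption, the segment correction probability is a product of emission-like terms p(c_j | q_j) and transition-like terms p(c_{j+1} | c_j). The Python sketch below assumes these probabilities are supplied by lookup functions backed by hypothetical tables, and computes the segment correction probabilities for the mode that splits q1, q2, q3 into "q1, q2" and "q3". How these segment probabilities are combined with the number of segments and the total number of search words in step S260 is outside the scope of this sketch.

```python
from math import prod

# Hypothetical probability tables for the three-word example.
EMISSION = {("q1", "c1"): 0.9, ("q2", "c2"): 0.5, ("q3", "c3"): 0.8}  # p(c_j | q_j)
TRANSITION = {("c1", "c2"): 0.4, ("c2", "c3"): 0.1}                 # p(c_{j+1} | c_j)


def p_emit(query_word, accurate_word):
    return EMISSION.get((query_word, accurate_word), 0.0)


def p_trans(current_word, next_word):
    return TRANSITION.get((current_word, next_word), 0.0)


def segment_correction_prob(search_seg, accurate_seg):
    """Product of all second and third correction probabilities in one segment.

    For a one-word segment the transition product is empty (math.prod returns 1),
    so the result reduces to the first correction probability p(c | q).
    """
    emissions = prod(p_emit(q, c) for q, c in zip(search_seg, accurate_seg))
    transitions = prod(p_trans(a, b) for a, b in zip(accurate_seg, accurate_seg[1:]))
    return emissions * transitions


# Segmentation mode "q1, q2 | q3": a two-word segment followed by a one-word segment.
seg1 = segment_correction_prob(["q1", "q2"], ["c1", "c2"])
seg2 = segment_correction_prob(["q3"], ["c3"])
print(seg1, seg2)
```

Note that the transition $p(c_3 \mid c_2)$ is not used in this mode, because $c_2$ and $c_3$ fall in different segments; that is exactly how segmentation drops context across segment boundaries.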
S260, determining candidate correction probability corresponding to the first accurate word sequence in the first segmentation mode according to the correction probability of each segment, the number of the search word segments and the total number of the search words.
Optionally, the candidate correction probability corresponding to the first accurate word sequence is determined according to the following formula:

$$\text{s.t.}\quad 0 = n_1 < n_2 < \cdots < n_{k+1} = N$$

where $p(c_1, c_2, \ldots, c_N \mid q_1, q_2, \ldots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \ldots, c_N$ is the first accurate word sequence; $q_1, q_2, \ldots, q_N$ is the search word sequence corresponding to the target search text; $N$ is the total number of search words; $k$ is the number of search word segments in the first segmentation mode; $n_i$ is the subscript of the last search word in the $(i-1)$-th search word segment (with $n_1 = 0$); $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the $i$-th search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the $i$-th search word segment; and $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability of the $i$-th search word segment.
Illustratively, assume that the total number of search words is 3, i.e. $N = 3$, the first accurate word sequence is $c_1, c_2, c_3$, and the search word sequence corresponding to the target search text is $q_1, q_2, q_3$. The first segmentation mode divides the search word sequence into two search word segments, the first containing two search words and the second containing one, i.e. $q_1, q_2$ and $q_3$, with corresponding first accurate word segments $c_1, c_2$ and $c_3$. The number of search word segments in the first segmentation mode is then $k = 2$, the subscript of the last search word in the first search word segment is $n_2 = 2$, and the subscript of the last search word in the second search word segment is $n_3 = 3$. The candidate correction probability of the first accurate word sequence $c_1, c_2, c_3$ is obtained by substituting the segment correction probabilities $p(c_1, c_2 \mid q_1, q_2)$ and $p(c_3 \mid q_3)$, together with $k = 2$ and $N = 3$, into the above formula.
In this embodiment, by repeating steps S240 to S260, the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode can be determined.
S270, determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
According to the technical solution of this embodiment, the candidate correction probability of each candidate accurate word sequence in each segmentation mode is determined from the segment correction probabilities of the search word segments in that mode, so that the corrected accurate word sequence can be determined more accurately and correction accuracy is improved.
The following is an embodiment of a device for correcting a search text according to an embodiment of the present invention, which belongs to the same inventive concept as a method for correcting a search text according to the above embodiments, and reference may be made to the above embodiment of the method for correcting a search text for details that are not described in detail in the embodiment of the device for correcting a search text.
Example three
Fig. 3 is a schematic structural diagram of an apparatus for correcting a search text according to the third embodiment of the present invention. This embodiment is applicable to spelling correction of a search text. The apparatus specifically includes: a search word sequence determination module 310, a candidate accurate word sequence determination module 320, a search word sequence segmentation module 330, a candidate correction probability determination module 340, and a target accurate text determination module 350.
The search word sequence determination module 310 is configured to obtain a target search text and perform word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
the candidate accurate word sequence determination module 320 is configured to determine, according to a search corpus, each candidate accurate word sequence corresponding to the search word sequence, where the search words in the search word sequence correspond one to one to the candidate accurate words in the candidate accurate word sequence;
the search word sequence segmentation module 330 is configured to segment the search word sequence and determine each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, where a segmentation mode includes the number of search word segments and the number of search words in each search word segment;
the candidate correction probability determination module 340 is configured to determine the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus, and the total number of search words;
and the target accurate text determination module 350 is configured to determine a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and to determine a target accurate text corresponding to the target search text according to the target accurate word sequence.
In this embodiment of the invention, the search word sequence is corrected segment by segment and the relevance between search words under each segmentation mode is taken into account, so that the optimal candidate accurate word sequence under the optimal segmentation mode can be determined. This avoids an inaccurate target accurate word sequence caused by low relevance between search words, improves the correction accuracy, and in turn improves the user's search experience.
Optionally, the candidate correction probability determination module 340 includes:
a first accurate word sequence determination unit, configured to take each segmentation mode in turn as a first segmentation mode, and to take each candidate accurate word sequence under the first segmentation mode in turn as a first accurate word sequence;
a segment correction probability determination unit, configured to determine, according to the search corpus, each segment correction probability corresponding to the first segmentation mode, where a segment correction probability is the probability that a search word segment is corrected to the corresponding first accurate word segment;
and a candidate correction probability determination unit, configured to determine the candidate correction probability corresponding to the first accurate word sequence in the first segmentation mode according to each segment correction probability, the number of search word segments, and the total number of search words.
Optionally, the segment correction probability determination unit includes:
a target search word segment determination subunit, configured to take each search word segment corresponding to the first segmentation mode in turn as a target search word segment;
a first segment correction probability determination subunit, configured to, if the target search word segment includes only one search word, determine according to the search corpus a first correction probability corresponding to that search word and take the first correction probability as the segment correction probability of the target search word segment, where the first correction probability is the probability that the search word is corrected to the corresponding first accurate word;
and a second segment correction probability determination subunit, configured to, if the target search word segment includes at least two search words, determine according to the search corpus the second correction probabilities and third correction probabilities corresponding to the target search word segment, and determine the segment correction probability of the target search word segment according to the second and third correction probabilities, where a second correction probability is the probability that the current search word in the target search word segment is corrected to the corresponding current first accurate word, and a third correction probability is the probability that the next first accurate word appears after the current first accurate word.
Optionally, the second segment correction probability determination subunit is further configured to: determine, according to the search corpus, the number of historical searches for the current search word in the target search word segment, the number of historical corrections of the current search word to the corresponding first accurate word, the first occurrence count of the current first accurate word corresponding to the current search word, and the second occurrence count of the next first accurate word following the current first accurate word; determine each second correction probability according to the number of historical searches and the number of historical corrections; and determine each third correction probability according to the first occurrence count and the second occurrence count.
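The description does not spell out the estimators. One natural reading, assumed here rather than confirmed by the text, is that the second correction probability is the number of historical corrections divided by the number of historical searches, and the third correction probability is the second occurrence count (taken here to mean the count of the adjacent word pair) divided by the first occurrence count. The class `CorpusStats` and its two input formats are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


class CorpusStats:
    """Hypothetical count tables built from a search corpus.

    correction_log: one (search word, accurate word) pair per historical search,
                    where the accurate word equals the search word if it was not corrected.
    accurate_texts: word sequences of accurate text used for occurrence counts.
    """

    def __init__(self, correction_log: Iterable[Tuple[str, str]],
                 accurate_texts: Iterable[List[str]]) -> None:
        self.searches: Counter = Counter()     # historical searches per search word
        self.corrections: Counter = Counter()  # historical corrections per (q, c)
        for q, c in correction_log:
            self.searches[q] += 1
            self.corrections[(q, c)] += 1
        self.first_counts: Counter = Counter()  # occurrences of each accurate word
        self.pair_counts: Counter = Counter()   # occurrences of (word, next word)
        for words in accurate_texts:
            self.first_counts.update(words)
            self.pair_counts.update(zip(words, words[1:]))

    def second_prob(self, c: str, q: str) -> float:
        """Assumed estimate of p(c | q): historical corrections / historical searches."""
        total = self.searches[q]
        return self.corrections[(q, c)] / total if total else 0.0

    def third_prob(self, c_next: str, c_cur: str) -> float:
        """Assumed estimate of p(c_next | c_cur): pair count / count of c_cur."""
        total = self.first_counts[c_cur]
        return self.pair_counts[(c_cur, c_next)] / total if total else 0.0
```

For example, with a correction log of ("teh", "the") twice and ("teh", "ten") once, and accurate texts ["the", "cat"] and ["the", "dog"], `second_prob("the", "teh")` is 2/3 and `third_prob("cat", "the")` is 0.5. Unseen words fall back to 0.0 here; a real system would likely add smoothing.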
Optionally, the segment correction probability corresponding to the target search word segment is determined according to the following formula:
where $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the target search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search word segment; $n_i+1$ is the subscript of the first search word in the target search word segment, and equally of the first word in the first accurate word segment; $n_{i+1}$ is the subscript of the last search word in the target search word segment, and equally of the last word in the first accurate word segment; $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the target search word segment; $p(c_j \mid q_j)$ is the $j$th second correction probability, i.e., the probability that the $j$th search word in the target search word segment is corrected to the corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$th third correction probability, i.e., the probability that the $(j+1)$th first accurate word appears after the $j$th first accurate word in the first accurate word segment.
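The formula itself is not legible in this text. One reading consistent with the quantities defined above, assumed here rather than taken from the original, is $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$, which reduces to the first correction probability when the segment holds a single search word. The Python sketch implements only that assumed product; `segment_prob` and its callback signatures are hypothetical.

```python
from math import prod
from typing import Callable, Sequence

# p(c | q) and p(c_next | c_cur) supplied as callbacks; both signatures are hypothetical.
SecondProb = Callable[[str, str], float]
ThirdProb = Callable[[str, str], float]


def segment_prob(cs: Sequence[str], qs: Sequence[str],
                 second_prob: SecondProb, third_prob: ThirdProb) -> float:
    """Assumed segment correction probability for one aligned segment.

    Computes prod_j p(c_j | q_j) * prod_j p(c_{j+1} | c_j) over the segment.
    For a one-word segment the second product is empty (= 1), leaving p(c | q).
    """
    if not cs or len(cs) != len(qs):
        raise ValueError("segments must be non-empty and aligned one to one")
    p = prod(second_prob(c, q) for c, q in zip(cs, qs))
    # Transition factors only between consecutive accurate words inside this segment.
    p *= prod(third_prob(nxt, cur) for cur, nxt in zip(cs, cs[1:]))
    return p
```

For example, with $p(c_1 \mid q_1) = 0.9$, $p(c_2 \mid q_2) = 0.8$ and $p(c_2 \mid c_1) = 0.5$, the segment $q_1, q_2$ corrected to $c_1, c_2$ scores $0.9 \times 0.8 \times 0.5 = 0.36$ under this assumed form.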
Optionally, the candidate correction probability corresponding to the first accurate word sequence is determined according to the following formula:
$$\text{s.t.}\quad 0 = n_1 < n_2 < \cdots < n_{k+1} = N$$
where $p(c_1, c_2, \ldots, c_N \mid q_1, q_2, \ldots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \ldots, c_N$ is the first accurate word sequence; $q_1, q_2, \ldots, q_N$ is the search word sequence corresponding to the target search text; $N$ is the total number of search words; $k$ is the number of search word segments corresponding to the first segmentation mode; $n_i$ is the subscript of the last search word in the $(i-1)$th search word segment (with $n_1 = 0$); $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the $i$th search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the $i$th search word segment; and $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the $i$th search word segment.
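The exact way the original formula combines the segment correction probabilities with $k$ and $N$ cannot be read from this text, so the sketch below fixes only what the definitions state: the boundary constraint, the split into aligned segments, and one segment correction probability per segment. The final combination is a caller-supplied function `combine(probs, k, n)`; the default plain product is a hypothetical placeholder that ignores $k$ and $N$ and should not be taken as the claimed formula. All names are illustrative.

```python
from math import prod
from typing import Callable, List, Optional, Sequence

SegProb = Callable[[Sequence[str], Sequence[str]], float]  # (c-segment, q-segment) -> prob
Combine = Callable[[List[float], int, int], float]          # (segment probs, k, N) -> prob


def candidate_prob(cs: Sequence[str], qs: Sequence[str], bounds: List[int],
                   seg_prob: SegProb, combine: Optional[Combine] = None) -> float:
    """Candidate correction probability of cs for qs under one segmentation mode.

    bounds must satisfy 0 = n_1 < n_2 < ... < n_{k+1} = N.
    """
    n = len(qs)
    if len(cs) != n:
        raise ValueError("candidate and search word sequences must align one to one")
    strictly_increasing = all(a < b for a, b in zip(bounds, bounds[1:]))
    if not bounds or bounds[0] != 0 or bounds[-1] != n or not strictly_increasing:
        raise ValueError("bounds must satisfy 0 = n_1 < ... < n_{k+1} = N")
    # Segment i pairs c_{n_i+1..n_{i+1}} with q_{n_i+1..n_{i+1}} (0-based slice a:b).
    probs = [seg_prob(cs[a:b], qs[a:b]) for a, b in zip(bounds, bounds[1:])]
    k = len(probs)
    if combine is None:
        return prod(probs)  # placeholder only: ignores k and N
    return combine(probs, k, n)
```

For the example with $N = 3$ and boundary list [0, 2, 3], the function calls `seg_prob` once for $(c_1, c_2 \mid q_1, q_2)$ and once for $(c_3 \mid q_3)$, then passes both values with $k = 2$ and $N = 3$ to `combine`.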
Optionally, the target accurate text determination module 350 is further configured to: determine, according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, a to-be-selected accurate word sequence corresponding to each segmentation mode and the to-be-selected correction probability corresponding to each to-be-selected accurate word sequence; and determine the to-be-selected accurate word sequence with the maximum to-be-selected correction probability as the target accurate word sequence corresponding to the target search text.
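The two-stage selection can be sketched as follows. The sketch assumes that the to-be-selected accurate word sequence of a segmentation mode is the candidate with the highest candidate correction probability in that mode, and that ties are broken by first occurrence; `select_target` and its nested-dictionary input are hypothetical.

```python
from typing import Dict, Tuple

Mode = Tuple[int, ...]   # boundary list n_1, ..., n_{k+1}
Words = Tuple[str, ...]  # a candidate accurate word sequence


def select_target(scores: Dict[Mode, Dict[Words, float]]) -> Tuple[Words, float, Mode]:
    """scores[mode][candidate] holds the candidate correction probability."""
    if not scores:
        raise ValueError("no segmentation modes to choose from")
    # Stage 1: the to-be-selected sequence and probability of each segmentation mode.
    per_mode: Dict[Mode, Tuple[Words, float]] = {}
    for mode, cands in scores.items():
        best = max(cands, key=cands.__getitem__)
        per_mode[mode] = (best, cands[best])
    # Stage 2: the mode whose to-be-selected correction probability is largest.
    best_mode = max(per_mode, key=lambda m: per_mode[m][1])
    words, prob = per_mode[best_mode]
    return words, prob, best_mode
```

The target accurate text can then be formed from the returned word sequence, for example by joining its words in order.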
The apparatus for correcting a search text provided by this embodiment of the invention can execute the method for correcting a search text provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to executing that method.
It should be noted that, in the above embodiment of the apparatus for correcting a search text, the units and modules are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present invention.
Example four
Fig. 4 is a schematic structural diagram of a terminal according to the fourth embodiment of the present invention. Referring to Fig. 4, the terminal includes:
one or more processors 410;
a memory 420 for storing one or more programs;
when the one or more programs are executed by the one or more processors 410, the one or more processors 410 implement the method for correcting a search text according to any one of the above embodiments, the method comprising:
acquiring a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
determining each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein search words in the search word sequence correspond to candidate accurate words in the candidate accurate word sequence one to one;
segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of the search word segments and the number of the search words corresponding to each search word segment;
determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus and the total number of search words;
and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
Fig. 4 takes one processor 410 as an example; the processor 410 and the memory 420 in the terminal may be connected by a bus or in other ways, and Fig. 4 takes a bus connection as an example.
As a computer-readable storage medium, the memory 420 may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for correcting a search text in the embodiments of the present invention (for example, the search word sequence determination module 310, the candidate accurate word sequence determination module 320, the search word sequence segmentation module 330, the candidate correction probability determination module 340, and the target accurate text determination module 350 in the apparatus for correcting a search text). By running the software programs, instructions, and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the terminal, i.e., implements the above method for correcting a search text.
The memory 420 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the terminal, and the like. Further, the memory 420 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 420 may further include memories located remotely from the processor 410, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The terminal proposed in this embodiment belongs to the same inventive concept as the method for correcting a search text proposed in the above embodiments; for technical details not described in this embodiment, reference may be made to the above embodiments, and this embodiment has the same beneficial effects as executing the method for correcting a search text.
Example five
The fifth embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for correcting a search text according to any embodiment of the present invention, the method including:
acquiring a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
determining each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein search words in the search word sequence correspond to candidate accurate words in the candidate accurate word sequence one to one;
segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of the search word segments and the number of the search words corresponding to each search word segment;
determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus and the total number of search words;
and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; alternatively, they may be separately fabricated into individual integrated circuit modules, or several of the modules or steps may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.