CN110019684B - Method, device, terminal and storage medium for correcting search text - Google Patents

Method, device, terminal and storage medium for correcting search text

Info

Publication number
CN110019684B
Authority
CN
China
Prior art keywords
search
word
segment
accurate
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810941106.XA
Other languages
Chinese (zh)
Other versions
CN110019684A (en)
Inventor
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Working at Bide Digital Technology (Guangzhou) Co.,Ltd.
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201810941106.XA
Publication of CN110019684A
Application granted
Publication of CN110019684B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method, a device, a terminal and a storage medium for correcting a search text, wherein the method comprises the following steps: acquiring a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text; determining each candidate accurate word sequence corresponding to the search word sequence according to the search corpus; segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment; determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to the search corpus and the total number of the search words; and determining a target accurate text corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode. According to the technical scheme of the embodiment of the invention, the problem of low correction accuracy caused by poor relevance of the search terms in the prior art can be solved by utilizing a segmented error correction mode.

Description

Method, device, terminal and storage medium for correcting search text
Technical Field
The embodiment of the invention relates to an information processing technology, in particular to a method, a device, a terminal and a storage medium for correcting a search text.
Background
With the rapid development of information technology, users can obtain the information they need by searching. Generally, a user inputs a search text according to his or her own requirements, and a retrieval tool finds the search results corresponding to the search text in an information set. For example, on a live webcast platform, a user can enter an anchor's name in the search entry so that the live video the user wishes to watch can be found quickly.
In general, when a user inputs search text, errors such as misspellings, word inversion, and the like often occur, so that the user cannot find a desired search result, and therefore, a correction process needs to be performed on the search text input by the user.
The existing correction process is: after the search text is segmented, each search word is directly corrected, and the context of each corrected word needs to be considered. However, in the prior art, word segmentation of a search text is often inaccurate, and when relevance of a search word is poor, if context of a corrected word is still considered, ambiguous information may exist, and a truly accurate text cannot be determined, so that correction accuracy is reduced, and search experience of a user is affected.
Disclosure of Invention
The embodiment of the invention provides a method, a device, a terminal and a storage medium for correcting a search text, which aim to solve the problem of low correction accuracy caused by poor relevance of search terms in the prior art, thereby improving the correction accuracy and further improving the search experience of a user.
In a first aspect, an embodiment of the present invention provides a method for correcting a search text, including:
acquiring a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
determining each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein search words in the search word sequence correspond to candidate accurate words in the candidate accurate word sequence one to one;
segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of the search word segments and the number of the search words corresponding to each search word segment;
determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus and the total number of search words;
and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
In a second aspect, an embodiment of the present invention further provides a device for correcting a search text, including:
the search word sequence determining module is used for acquiring a target search text and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
the candidate accurate word sequence determining module is used for determining each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein the search words in the search word sequence correspond to the candidate accurate words in the candidate accurate word sequence one by one;
the search word sequence segmentation module is used for segmenting the search word sequence and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of the search word segments and the number of the search words corresponding to each search word segment;
a candidate correction probability determining module, configured to determine, according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus, and a total number of search words, a candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode;
and the target accurate text determining module is used for determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining the target accurate text corresponding to the target search text according to the target accurate word sequence.
In a third aspect, an embodiment of the present invention further provides a terminal, where the terminal includes:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for correcting a search text described in any embodiment of the invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for correcting a search text according to any embodiment of the present invention.
The method comprises the steps of segmenting a search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment; determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, a search corpus and the total number of search words; and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence. By carrying out segmentation error correction on the search word sequence and considering the relevance of the search word in each segmentation mode, the optimal candidate accurate word sequence corresponding to the optimal segmentation mode can be determined, the condition that the calculated target accurate word sequence is inaccurate due to low relevance of the search word is avoided, the correction accuracy is improved, and the search experience of a user is further improved.
Drawings
Fig. 1 is a flowchart of a method for correcting a search text according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a method for correcting a search text according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for correcting a search text according to a third embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a terminal according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for correcting a search text according to a first embodiment of the present invention. This embodiment is applicable to spelling correction of a search text, and is especially applicable to correcting search text on a live webcast platform. The method can be executed by a search text correction device, which can be implemented in software and/or hardware and integrated in a terminal with a search function, such as a smartphone, a tablet computer or a desktop computer. The method specifically comprises the following steps:
S110, obtaining a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text.
The target search text refers to the search text currently input by the user. For example, the search text in the current search entry may be determined as the target search text. The word segmentation process may refer to dividing the target search text into a plurality of search words according to a word segmentation dictionary or other word segmentation rules. The search word sequence refers to a sequence formed by each search word obtained by performing word segmentation processing on a target search text. The search word arrangement order in the search word sequence is consistent with the search word order in the target search text. For example, if the target search text is "not surprised", the determined search word sequence after performing word matching according to the word segmentation dictionary may be: "not, surprise".
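As a rough illustration of dictionary-based word segmentation, the following sketch uses forward maximum matching. This is only one possible word segmentation rule; the text above does not say which rule is used. The function name `segment`, the `max_len` parameter and the toy dictionary are hypothetical and chosen only for this example.

```python
def segment(text, dictionary, max_len=4):
    """Split text into words by forward maximum matching against a dictionary."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest window first and shrink it until a dictionary word
        # matches; a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary; a real segmentation dictionary would be far larger.
DICTIONARY = {"ab", "abc", "cd", "d"}
search_word_sequence = segment("abcd", DICTIONARY)
```

The words are emitted left to right, so their order in the resulting search word sequence matches their order in the input text, as the step requires.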
S120, determining each candidate accurate word sequence corresponding to the search word sequence according to the search corpus, wherein the search words in the search word sequence correspond to the candidate accurate words in the candidate accurate word sequence one by one.
The search corpus can be determined in advance from the search behavior logs of a large number of users. The search corpus may include a plurality of historical search keywords and the accurate keyword to which each historical search keyword was corrected, where the accurate keyword may be determined according to the user's click operations. A candidate accurate word sequence is any possible correction sequence corresponding to the target search text. This embodiment determines the optimal accurate word sequence from among all candidate accurate word sequences. In this embodiment, each candidate accurate word sequence corresponds to one candidate accurate text, and the order of the candidate accurate words in the candidate accurate text is consistent with the candidate accurate word sequence. The search words in the search word sequence correspond one to one to the candidate accurate words in the candidate accurate word sequence; that is, each search word in the search word sequence corresponds to one candidate accurate word in the candidate accurate word sequence. Illustratively, if the search word sequence corresponding to the target search text is $q_1, q_2, \ldots, q_N$ and a candidate accurate word sequence is $c_1, c_2, \ldots, c_N$, then the search word $q_i$ corresponds to the candidate accurate word $c_i$.
Specifically, in this embodiment, at least one candidate accurate word corresponding to each search word in the search word sequence may be determined according to the search corpus, and the candidate accurate words are combined in every possible way to determine each candidate accurate word sequence corresponding to the search word sequence. Illustratively, if the search word sequence is "not, surprise", and the candidate accurate words determined from the search corpus for the search word "not" are "not", "step" and "part", while those for the search word "surprise" are "surprise" and "elaboration", then the candidate accurate word sequences are: "not, surprise", "not, elaboration", "step, surprise", "step, elaboration", "part, surprise" and "part, elaboration". It should be noted that a candidate accurate word may also be the search word itself, because a search word in the target search text may already be spelled correctly.
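The combination step can be sketched as a Cartesian product over the per-word candidate lists. The function name `candidate_sequences` and the lookup table `candidates_by_word` are hypothetical; in practice the table would be derived from the search corpus. Adding the search word itself as a candidate follows the note above that a search word may already be correct.

```python
from itertools import product


def candidate_sequences(search_words, candidates_by_word):
    """Build every candidate accurate word sequence for a search word sequence."""
    per_word = []
    for word in search_words:
        options = list(candidates_by_word.get(word, []))
        if word not in options:
            # The search word itself may already be spelled correctly.
            options.append(word)
        per_word.append(options)
    # One candidate accurate word per search word, in the same order.
    return [list(combo) for combo in product(*per_word)]


# Hypothetical candidate table mirroring the example in the text.
candidates_by_word = {
    "not": ["not", "step", "part"],
    "surprise": ["surprise", "elaboration"],
}
sequences = candidate_sequences(["not", "surprise"], candidates_by_word)
```

With three candidates for the first word and two for the second, the product contains 3 × 2 = 6 sequences, each the same length as the search word sequence.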
S130, segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment.
A search word segment includes at least one search word. A search word segment may be a part of the search word sequence or the whole search word sequence, depending on the segmentation mode. A segmentation mode may include the number of search word segments and the number of search words corresponding to each search word segment. The number of search word segments in a segmentation mode is greater than or equal to 1 and less than or equal to the total number of search words corresponding to the search word sequence. The segmentation mode in this embodiment characterizes how many search word segments the search word sequence is divided into and how many search words each search word segment contains. The number of segmentation modes can be determined according to the total number of search words corresponding to the search word sequence. For each candidate accurate word sequence, the candidate accurate word segments in the candidate accurate word sequence correspond one to one to the search word segments: the number of candidate accurate word segments is the same as the number of search word segments, and the number of candidate accurate words in each candidate accurate word segment is the same as the number of search words in the corresponding search word segment.
Illustratively, assume that the search word sequence is $q_1, q_2, q_3$ and a candidate accurate word sequence is $c_1, c_2, c_3$. There are then four segmentation modes. The first segmentation mode divides the search word sequence into a single search word segment, namely $q_1, q_2, q_3$, and the corresponding candidate accurate word segment is $c_1, c_2, c_3$. The second segmentation mode divides the search word sequence into two search word segments, the first containing two search words and the second containing one search word, namely $q_1, q_2$ and $q_3$, and the corresponding candidate accurate word segments are $c_1, c_2$ and $c_3$ respectively. The third segmentation mode divides the search word sequence into two search word segments, the first containing one search word and the second containing two search words, namely $q_1$ and $q_2, q_3$, and the corresponding candidate accurate word segments are $c_1$ and $c_2, c_3$ respectively. The fourth segmentation mode divides the search word sequence into three search word segments, each search word forming its own segment, namely $q_1$, $q_2$ and $q_3$, and the corresponding candidate accurate word segments are $c_1$, $c_2$ and $c_3$ respectively.
Specifically, each segmentation mode corresponds to at least one search word segment, and each search word segment corresponds to one candidate accurate word segment in each candidate accurate word sequence.
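One way to enumerate the segmentation modes is to choose cut points between adjacent words. The sketch below represents a mode as a tuple of segment lengths; this representation and the function names `segmentation_modes` and `split_by_mode` are assumptions made for illustration. Because each of the N - 1 gaps between words is either cut or not, a sequence of N words has 2^(N-1) modes, which gives the four modes of the three-word example above.

```python
from itertools import combinations


def segmentation_modes(n):
    """Yield every way to cut n words into contiguous segments.

    Each mode is a tuple of segment lengths that sums to n.
    """
    for k in range(n):  # k is the number of cut points
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def split_by_mode(words, mode):
    """Split a word sequence into segments whose lengths are given by mode."""
    segments, start = [], 0
    for length in mode:
        segments.append(words[start:start + length])
        start += length
    return segments


modes = list(segmentation_modes(3))
search_segments = split_by_mode(["q1", "q2", "q3"], (2, 1))
# The same mode is applied to a candidate accurate word sequence, so the
# candidate segments line up one to one with the search word segments.
accurate_segments = split_by_mode(["c1", "c2", "c3"], (2, 1))
```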
S140, determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, a search corpus and the total number of search words.
The total number of search terms refers to the number of search terms that constitute the search term sequence. The candidate correction probability corresponding to the candidate accurate word sequence refers to the correction probability of the search word sequence corrected to the candidate accurate word sequence.
Specifically, each segmentation mode of the search word sequence is different, so that the candidate correction probabilities corresponding to the candidate accurate word sequences calculated in each segmentation mode are also different. For a certain candidate accurate word sequence in a certain segmentation mode, determining candidate correction probability corresponding to the candidate accurate word sequence according to the search corpus, each search word segment corresponding to the segmentation mode, each candidate accurate word segment corresponding to each search word segment in the candidate accurate word sequence and the total number of search words. In the embodiment, the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode is calculated, so that the candidate correction probability corresponding to all cases in which the context of at least one candidate accurate word is considered or the contexts of all candidate accurate words are not considered can be obtained, the problem of low correction accuracy caused by low relevance of search words or less historical data in a search corpus is solved, and the correction accuracy is improved.
Illustratively, if the segmentation mode takes all the search words as a single search word segment, that is, the search word sequence is not segmented, the candidate correction probability determined for each candidate accurate word sequence in this segmentation mode takes into account the context of every candidate accurate word. If the segmentation mode takes each search word as its own search word segment, that is, the number of search word segments is at its maximum, the candidate correction probability determined for each candidate accurate word sequence in this segmentation mode does not take the context of any candidate accurate word into account.
S150, determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
The target accurate word sequence may refer to the optimal candidate accurate word sequence corresponding to the optimal segmentation mode. For each segmentation mode, a candidate correction probability corresponding to each candidate accurate word sequence can be obtained. In other words, for each candidate accurate word sequence, the candidate correction probability of that sequence in each segmentation mode can also be obtained. In this embodiment, the ways of determining the target accurate word sequence corresponding to the target search text include, but are not limited to, the following two.
Optionally, determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode includes: determining a to-be-selected accurate word sequence corresponding to each segmentation mode, and a to-be-selected correction probability corresponding to each to-be-selected accurate word sequence, according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode; and determining the to-be-selected accurate word sequence corresponding to the maximum to-be-selected correction probability as the target accurate word sequence corresponding to the target search text. Specifically, for a given segmentation mode, the candidate correction probabilities corresponding to the candidate accurate word sequences in that segmentation mode are compared. The candidate accurate word sequence with the maximum candidate correction probability is determined as the to-be-selected accurate word sequence corresponding to that segmentation mode, and the maximum candidate correction probability is determined as the to-be-selected correction probability corresponding to that to-be-selected accurate word sequence. In this way, the to-be-selected accurate word sequence corresponding to each segmentation mode is obtained. The to-be-selected correction probabilities are then compared, and the to-be-selected accurate word sequence with the maximum to-be-selected correction probability is determined as the target accurate word sequence corresponding to the target search text, so that a more accurate word sequence can be determined.
Optionally, determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, including: determining a target correction probability corresponding to each candidate accurate word sequence according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode; and determining the candidate accurate word sequence corresponding to the maximum target correction probability as a target accurate word sequence corresponding to the target search text. Specifically, for a certain candidate accurate word sequence, the candidate correction probabilities determined by the candidate accurate word sequence in each segmentation mode are compared, the maximum candidate correction probability is determined as the target correction probability corresponding to the candidate accurate word sequence, so that the target correction probability corresponding to each candidate accurate word sequence can be obtained, then the target correction probabilities corresponding to each candidate accurate word sequence are compared, the candidate accurate word sequence corresponding to the maximum target correction probability is determined as the target accurate word sequence corresponding to the target search text, and therefore a more accurate word sequence can be determined.
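The two optional selection strategies can be sketched as follows. The table `probs`, keyed by (segmentation mode, candidate accurate word sequence), and every probability value in it are made up for illustration; the function names are hypothetical as well.

```python
def select_by_mode(probs):
    """First strategy: pick the best sequence per segmentation mode, then the best of those."""
    best_per_mode = {}
    for (mode, seq), p in probs.items():
        if mode not in best_per_mode or p > best_per_mode[mode][1]:
            best_per_mode[mode] = (seq, p)
    return max(best_per_mode.values(), key=lambda pair: pair[1])[0]


def select_by_sequence(probs):
    """Second strategy: take each sequence's best probability across modes, then the best sequence."""
    target_prob = {}
    for (mode, seq), p in probs.items():
        target_prob[seq] = max(p, target_prob.get(seq, 0.0))
    return max(target_prob, key=target_prob.get)


# Illustrative probabilities only; they do not come from any real corpus.
probs = {
    ((2,), ("not", "surprise")): 0.10,
    ((2,), ("step", "surprise")): 0.30,
    ((1, 1), ("not", "surprise")): 0.20,
    ((1, 1), ("step", "surprise")): 0.25,
}
target_a = select_by_mode(probs)
target_b = select_by_sequence(probs)
```

Both strategies end up taking the maximum over the same table of probabilities, only grouped differently, so they select the same sequence unless several entries tie for the maximum.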
After the target accurate word sequence is determined, the accurate words in the target accurate word sequence can be directly spliced according to the sequence of the target accurate word sequence, and the sequence of the accurate words in the target accurate word sequence is ensured to be consistent with the sequence of the accurate words in the determined target accurate text, so that the target accurate text corresponding to the target search text, namely the optimal accurate text, can be obtained. According to the method and the device, the target search text can be corrected to be the target accurate text, so that the accurate search result required by the user can be obtained by automatically searching according to the target accurate text, the search accuracy is improved, the user does not need to manually input the accurate search text again, and the search experience of the user is improved.
The method comprises the steps of segmenting a search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment; determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, a search corpus and the total number of search words; and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence. By carrying out segmentation error correction on the search word sequence and considering the relevance of the search word in each segmentation mode, the optimal candidate accurate word sequence corresponding to the optimal segmentation mode can be determined, the condition that the calculated target accurate word sequence is inaccurate due to low relevance of the search word is avoided, the correction accuracy is improved, and the search experience of a user is further improved.
Example two
Fig. 2 is a flowchart of a correction method for a search text according to a second embodiment of the present invention, where this embodiment further optimizes "determining candidate correction probabilities corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, a search corpus, and a total number of search words" on the basis of the first embodiment of the present invention. Wherein explanations of the same or corresponding terms as those in the above embodiment are omitted.
Referring to fig. 2, the method for correcting a search text provided in this embodiment specifically includes the following steps:
S210, obtaining a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text.
S220, determining each candidate accurate word sequence corresponding to the search word sequence according to the search corpus, wherein the search words in the search word sequence correspond to the candidate accurate words in the candidate accurate word sequence one by one.
S230, segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment.
S240, determining each segmentation mode as a first segmentation mode one by one, and determining each candidate accurate word sequence under the first segmentation mode as a first accurate word sequence one by one.
Specifically, in this embodiment, each segmentation mode is taken in turn as the first segmentation mode, and each candidate accurate word sequence in the first segmentation mode is taken in turn as the first accurate word sequence. Each candidate accurate word segment in that candidate accurate word sequence is then a first accurate word segment of the first accurate word sequence, and each candidate accurate word in a candidate accurate word segment is a first accurate word of the first accurate word segment. In this way, the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode is determined one by one using the same procedure.
S250, determining each segment correction probability corresponding to the first segmentation mode according to the search corpus.
Wherein the first segmentation mode corresponds to at least one search term segment. The segment correction probability is the probability that a search word segment is corrected to the corresponding first exact word segment. According to the embodiment, the probability that each search word segment is corrected to be the corresponding first accurate word segment, that is, the segment correction probability corresponding to each search word segment, can be determined according to the search corpus and the number of search words in the search word segment.
Optionally, S250 includes: determining the search word segments corresponding to the first segmentation mode one by one as target search word segments; if the target search word segment only comprises one search word, determining a first correction probability corresponding to the target search word segment according to the search corpus, and determining the first correction probability as a segment correction probability corresponding to the target search word segment; and if the target search word segment comprises at least two search words, determining second correction probabilities and third correction probabilities corresponding to the target search word segment according to the search corpus, and determining segment correction probabilities corresponding to the target search word segment according to the second correction probabilities and the third correction probabilities.
The second correction probability is the probability that the current search word in the target search word segment is corrected to the corresponding current first accurate word. The third correction probability is the probability that the next first accurate word occurs after the current first accurate word. When the target search word segment includes at least two search words, each search word in the segment is taken in turn as the current search word. The current first accurate word is the first accurate word in the first accurate word segment that corresponds to the current search word. The next first accurate word is the first accurate word that immediately follows the current first accurate word in the order of the first accurate word sequence.
Specifically, each search term segment corresponding to the first segmentation mode is determined as a target search term segment one by one, so that the segment correction probability corresponding to each search term segment can be determined one by one through the same process. And if the target search word segment only comprises one search word, namely the corresponding first accurate word segment also only comprises one first accurate word, directly determining the first correction probability as the segment correction probability corresponding to the target search word segment. And if the target search word segment comprises at least two search words, namely the corresponding first accurate word segment also comprises at least two first accurate words, determining the segment correction probability corresponding to the target search word segment according to each second correction probability and each third correction probability. It should be noted that when the target search term segment includes only one search term, the context of the first exact term corresponding to the search term does not need to be considered, and when the target search term segment includes at least two search terms, the context of each first exact term in the first exact term segment, that is, the third correction probability, needs to be considered.
Illustratively, if the target search word segment includes only one search word $q_1$, the corresponding first accurate word segment also includes only one first accurate word $c_1$. The first correction probability that the search word $q_1$ is corrected to the corresponding first accurate word $c_1$, namely $p(c_1 \mid q_1)$, is determined from the search corpus, and $p(c_1 \mid q_1)$ may be directly determined as the segment correction probability corresponding to the target search word segment. If the target search word segment includes two search words $q_1$ and $q_2$, the corresponding first accurate word segment also includes two first accurate words $c_1$ and $c_2$. The probability that each search word is corrected to the corresponding first accurate word is then determined from the search corpus, giving two second correction probabilities, $p(c_1 \mid q_1)$ and $p(c_2 \mid q_2)$, together with the probability that the next first accurate word occurs after the current first accurate word, giving one third correction probability, $p(c_2 \mid c_1)$. The segment correction probability corresponding to the target search word segment is determined from $p(c_1 \mid q_1)$, $p(c_2 \mid q_2)$ and $p(c_2 \mid c_1)$.
Optionally, determining, according to the search corpus, second correction probabilities and third correction probabilities corresponding to the target search word segment, includes: determining historical search times corresponding to a current search word in a target search word segment, historical correction times of the current search word corrected to a corresponding first accurate word, first occurrence times of the current first accurate word corresponding to the current search word and second occurrence times of a next first accurate word of the current first accurate word according to a search corpus; determining each second correction probability according to the historical search times and the historical correction times; and determining each third correction probability according to the first occurrence number and the second occurrence number.
The historical search times corresponding to the current search word refer to the number of times the current search word occurs in the historical search data of the search corpus. The historical correction times refer to the number of times the current search word was corrected to the corresponding first accurate word in the historical search data of the search corpus. The first occurrence times refer to the number of times the current first accurate word occurs in the search corpus. The second occurrence times refer to the number of times the next first accurate word occurs after the current first accurate word in the search corpus.
Specifically, when the target search term segment includes at least two search terms, the embodiment may determine a ratio of the historical correction times to the historical search times as a second correction probability corresponding to the current search term, and determine a ratio of the second occurrence times to the first occurrence times as a third correction probability.
In this embodiment, when the target search term segment includes only one search term, the historical correction times of the search term in the search corpus may be divided by the historical search times of the search term, and the obtained operation result is determined as the first correction probability, that is, the probability that the search term is corrected to the corresponding first accurate term.
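The two ratios described above can be sketched with simple count tables. The counts below are hypothetical toy data, not values from the search corpus, and the names `correction_prob` and `transition_prob` are illustrative. The sketch also assumes that the "second occurrence times" is the number of times the next accurate word directly follows the current one.

```python
from collections import Counter

# Hypothetical toy statistics standing in for the search corpus.
search_counts = Counter({"q1": 100, "q2": 50})          # historical search times
correction_counts = Counter({("q1", "c1"): 80,          # historical correction times
                             ("q2", "c2"): 10})
word_counts = Counter({"c1": 200, "c2": 120})           # first occurrence times
bigram_counts = Counter({("c1", "c2"): 50})             # second occurrence times (assumed)


def correction_prob(q, c):
    """p(c | q): historical correction times / historical search times."""
    searched = search_counts[q]
    return correction_counts[(q, c)] / searched if searched else 0.0


def transition_prob(prev_c, next_c):
    """p(next_c | prev_c): second occurrence times / first occurrence times."""
    seen = word_counts[prev_c]
    return bigram_counts[(prev_c, next_c)] / seen if seen else 0.0


print(correction_prob("q1", "c1"))   # 80 / 100
print(transition_prob("c1", "c2"))   # 50 / 200
```

Returning 0.0 for unseen words is a simplification; a practical system would more likely apply some form of smoothing, which the text does not discuss.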
Optionally, in this embodiment, it may be assumed that the correction of a search word segment to the corresponding first accurate word segment satisfies a hidden Markov process, so that the segment correction probability can be determined more conveniently. Illustratively, the calculation of the segment correction probability

$$p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$$

can be simplified as follows:

$$p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \cdot \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$$

namely:

$$p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}}) = p(c_{n_i+1} \mid q_{n_i+1}) \, p(c_{n_i+2} \mid c_{n_i+1}) \, p(c_{n_i+2} \mid q_{n_i+2}) \cdots p(c_{n_{i+1}} \mid c_{n_{i+1}-1}) \, p(c_{n_{i+1}} \mid q_{n_{i+1}})$$

wherein $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the target search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search word segment; $n_i + 1$ is the subscript of the first search word in the target search word segment, or of the first first accurate word in the first accurate word segment; $n_{i+1}$ is the subscript of the last search word in the target search word segment, or of the last first accurate word in the first accurate word segment; $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the target search word segment; $p(c_j \mid q_j)$ is the $j$-th second correction probability, that is, the probability that the $j$-th search word in the target search word segment is corrected to the corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$-th third correction probability, that is, the probability that the $(j+1)$-th first accurate word appears after the $j$-th first accurate word in the first accurate word segment.
Specifically, if the target search term segment includes at least two search terms, each second correction probability and each third correction probability may be multiplied, and the multiplication result is determined as the segment correction probability corresponding to the target search term segment.
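The multiplication described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `segment_correction_prob` and the callables `p_correct` and `p_next` are hypothetical, and the probability tables hold made-up values for illustration only.

```python
from math import prod


def segment_correction_prob(q_seg, c_seg, p_correct, p_next):
    """Multiply all p(c_j | q_j) and all p(c_{j+1} | c_j) within one segment."""
    if len(q_seg) != len(c_seg) or not q_seg:
        raise ValueError("segments must be non-empty and of equal length")
    # Second correction probabilities: one per search word in the segment.
    emission = prod(p_correct(q, c) for q, c in zip(q_seg, c_seg))
    # Third correction probabilities: one per adjacent pair of accurate words.
    # For a one-word segment this product is empty and therefore equals 1.
    transition = prod(p_next(a, b) for a, b in zip(c_seg, c_seg[1:]))
    return emission * transition


# Hypothetical probabilities for illustration only.
P_CORRECT = {("q1", "c1"): 0.8, ("q2", "c2"): 0.5}
P_NEXT = {("c1", "c2"): 0.25}


def lookup_correct(q, c):
    return P_CORRECT.get((q, c), 0.0)


def lookup_next(a, b):
    return P_NEXT.get((a, b), 0.0)


two_word = segment_correction_prob(["q1", "q2"], ["c1", "c2"], lookup_correct, lookup_next)
one_word = segment_correction_prob(["q1"], ["c1"], lookup_correct, lookup_next)
print(two_word, one_word)
```

Because the product over adjacent pairs is empty for a single-word segment, the same function covers the one-word case, where the segment correction probability reduces to the first correction probability.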
S260, determining candidate correction probability corresponding to the first accurate word sequence in the first segmentation mode according to the correction probability of each segment, the number of the search word segments and the total number of the search words.
Optionally, the candidate correction probability corresponding to the first exact word sequence is determined according to the following formula:
Figure GDA0002988041220000147
s.t. $0 = n_1 < n_2 < \cdots < n_k = N$
wherein $p(c_1, c_2, \ldots, c_N \mid q_1, q_2, \ldots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \ldots, c_N$ is the first accurate word sequence; $q_1, q_2, \ldots, q_N$ is the search word sequence corresponding to the target search text; $N$ is the total number of search words; $k$ is the number of search word segments corresponding to the first segmentation mode; $n_i$ is the subscript of the last search word in the $(i-1)$-th search word segment; $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the $i$-th search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the $i$-th search word segment; and $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the $i$-th search word segment.
Illustratively, assume that the total number of search words is 3, that is, $N = 3$, the first accurate word sequence is $c_1, c_2, c_3$, and the search word sequence corresponding to the target search text is $q_1, q_2, q_3$. The first segmentation mode divides the search word sequence into two search word segments, where the first search word segment includes two search words and the second search word segment includes one search word. The two search word segments corresponding to the first segmentation mode are $q_1, q_2$ and $q_3$, and the corresponding first accurate word segments are $c_1, c_2$ and $c_3$, respectively. The number of search word segments corresponding to the first segmentation mode is $k = 2$, the subscript of the last search word in the first search word segment is $n_2 = 2$, and the subscript of the last search word in the second search word segment is $n_3 = 3$. The candidate correction probability corresponding to the first accurate word sequence $c_1, c_2, c_3$ is then:
Figure GDA0002988041220000154
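The boundary subscripts used in this example can be sketched in code. The helper name `split_by_boundaries` is hypothetical; the sketch only shows how the boundaries 0, 2, 3 map onto segments, and it does not reproduce the candidate correction probability formula itself. The 1-based range $n_i + 1, \ldots, n_{i+1}$ corresponds to the 0-based Python slice `[n_i:n_{i+1}]`.

```python
def split_by_boundaries(words, bounds):
    """Slice `words` so that segment i covers words[bounds[i]:bounds[i + 1]]."""
    if bounds[0] != 0 or bounds[-1] != len(words):
        raise ValueError("boundaries must start at 0 and end at len(words)")
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("boundaries must be strictly increasing")
    return [words[a:b] for a, b in zip(bounds, bounds[1:])]


# Boundaries 0, 2, 3 from the example: two words, then one word.
q_segments = split_by_boundaries(["q1", "q2", "q3"], [0, 2, 3])
c_segments = split_by_boundaries(["c1", "c2", "c3"], [0, 2, 3])
print(q_segments)
print(c_segments)
```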
In this embodiment, by repeating steps S240 to S260, the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode can be determined.
S270, determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
According to the technical scheme of the embodiment, the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode can be accurately determined according to the segment correction probability corresponding to each search word segment in each segmentation mode, so that the corrected accurate word sequence can be more accurately determined, and the correction accuracy is improved.
The following is an embodiment of a device for correcting a search text according to an embodiment of the present invention, which belongs to the same inventive concept as a method for correcting a search text according to the above embodiments, and reference may be made to the above embodiment of the method for correcting a search text for details that are not described in detail in the embodiment of the device for correcting a search text.
Example three
Fig. 3 is a schematic structural diagram of a correction apparatus for a search text according to a third embodiment of the present invention, where the present embodiment is applicable to a case of performing spelling correction on a search text, and the apparatus specifically includes: a search word sequence determination module 310, a candidate exact word sequence determination module 320, a search word sequence segmentation module 330, a candidate correction probability determination module 340, and a target exact text determination module 350.
The search word sequence determining module 310 is configured to obtain a target search text, perform word segmentation processing on the target search text, and determine a search word sequence corresponding to the target search text; a candidate accurate word sequence determining module 320, configured to determine, according to the search corpus, each candidate accurate word sequence corresponding to the search word sequence, where the search words in the search word sequence correspond to the candidate accurate words in the candidate accurate word sequence one to one; a search word sequence segmenting module 330, configured to segment the search word sequence, and determine each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, where the segmentation mode includes the number of search word segments and the number of search words corresponding to each search word segment; a candidate correction probability determining module 340, configured to determine candidate correction probabilities corresponding to the candidate accurate word sequences in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus, and the total number of search words; and the target accurate text determining module 350 is configured to determine a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determine a target accurate text corresponding to the target search text according to the target accurate word sequence.
According to the embodiment of the invention, the search word sequence is subjected to segmented error correction, and the relevance of the search word in each segmentation mode is considered, so that the optimal candidate accurate word sequence corresponding to the optimal segmentation mode can be determined, the condition that the calculated target accurate word sequence is inaccurate due to low relevance of the search word is avoided, the correction accuracy is improved, and the search experience of a user is further improved.
Optionally, the candidate correction probability determining module 340 includes:
the first accurate word sequence determining unit is used for determining each segmentation mode as a first segmentation mode one by one and determining each candidate accurate word sequence under the first segmentation mode as a first accurate word sequence one by one;
the segment correction probability determining unit is used for determining each segment correction probability corresponding to the first segmentation mode according to the search corpus, wherein the segment correction probability is the probability that the search word segment is corrected into the corresponding first accurate word segment;
and the candidate correction probability determining unit is used for determining the candidate correction probability corresponding to the first accurate word sequence in the first segmentation mode according to the correction probability of each segment, the number of the search word segments and the total number of the search words.
Optionally, the segment correction probability determining unit includes:
the target search word segment determining subunit is used for determining the search word segments corresponding to the first segmentation mode one by one as target search word segments;
a first segment correction probability determining subunit, configured to determine, according to the search corpus, a first correction probability corresponding to the search word in the target search word segment if the target search word segment only includes one search word, and determine the first correction probability as a segment correction probability corresponding to the target search word segment, where the first correction probability is a probability that the search word is corrected to a corresponding first accurate word;
and the second segment correction probability determining subunit is configured to determine, according to the search corpus, second correction probabilities and third correction probabilities corresponding to the target search word segments if the target search word segments include at least two search words, and determine, according to the second correction probabilities and the third correction probabilities, segment correction probabilities corresponding to the target search word segments, where the second correction probability is a probability that a current search word in the target search word segments is corrected to a corresponding current first exact word, and the third correction probability is a probability that a next first exact word appears after the current first exact word.
Optionally, the second segment correction probability determining subunit is further configured to: determining historical search times corresponding to a current search word in a target search word segment, historical correction times of the current search word corrected to a corresponding first accurate word, first occurrence times of the current first accurate word corresponding to the current search word and second occurrence times of a next first accurate word of the current first accurate word according to a search corpus; determining each second correction probability according to the historical search times and the historical correction times; and determining each third correction probability according to the first occurrence number and the second occurrence number.
Optionally, the segment correction probability corresponding to the target search word segment is determined according to the following formula:

$$p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}}) = \prod_{j=n_i+1}^{n_{i+1}} p(c_j \mid q_j) \cdot \prod_{j=n_i+1}^{n_{i+1}-1} p(c_{j+1} \mid c_j)$$

wherein $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the target search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the target search word segment; $n_i + 1$ is the subscript of the first search word in the target search word segment, or of the first first accurate word in the first accurate word segment; $n_{i+1}$ is the subscript of the last search word in the target search word segment, or of the last first accurate word in the first accurate word segment; $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the target search word segment; $p(c_j \mid q_j)$ is the $j$-th second correction probability, that is, the probability that the $j$-th search word in the target search word segment is corrected to the corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$-th third correction probability, that is, the probability that the $(j+1)$-th first accurate word appears after the $j$-th first accurate word in the first accurate word segment.
Optionally, the candidate correction probability corresponding to the first exact word sequence is determined according to the following formula:
Figure GDA0002988041220000191
s.t. $0 = n_1 < n_2 < \cdots < n_k = N$
wherein $p(c_1, c_2, \ldots, c_N \mid q_1, q_2, \ldots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \ldots, c_N$ is the first accurate word sequence; $q_1, q_2, \ldots, q_N$ is the search word sequence corresponding to the target search text; $N$ is the total number of search words; $k$ is the number of search word segments corresponding to the first segmentation mode; $n_i$ is the subscript of the last search word in the $(i-1)$-th search word segment; $q_{n_i+1}, \ldots, q_{n_{i+1}}$ is the $i$-th search word segment; $c_{n_i+1}, \ldots, c_{n_{i+1}}$ is the first accurate word segment corresponding to the $i$-th search word segment; and $p(c_{n_i+1}, \ldots, c_{n_{i+1}} \mid q_{n_i+1}, \ldots, q_{n_{i+1}})$ is the segment correction probability corresponding to the $i$-th search word segment.
Optionally, the target accurate text determining module 350 is further configured to: determine the accurate word sequence to be selected corresponding to each segmentation mode, and the correction probability to be selected corresponding to each such sequence, according to the correction probabilities corresponding to the accurate word sequences in each segmentation mode; and determine the accurate word sequence to be selected that has the maximum correction probability to be selected as the target accurate word sequence corresponding to the target search text.
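The final selection by maximum probability can be sketched as follows. This is an illustrative sketch only: the function name `pick_target_sequence`, the dictionary layout, and the probability values are all hypothetical, and the sketch takes one global maximum over all entries rather than modelling the per-mode selection step separately.

```python
def pick_target_sequence(candidates):
    """Return the (segment lengths, word sequence) key with the highest probability."""
    if not candidates:
        raise ValueError("no candidates to choose from")
    return max(candidates, key=candidates.get)


# Hypothetical probabilities, keyed by (segment lengths, accurate word sequence).
candidates = {
    ((2, 1), ("c1", "c2", "c3")): 0.12,
    ((1, 2), ("c1", "c2", "c3")): 0.08,
    ((3,), ("d1", "c2", "c3")): 0.03,
}
best_mode, best_sequence = pick_target_sequence(candidates)
print(best_mode, best_sequence)
```

Joining the words of `best_sequence` would then give the target accurate text.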
The correction device for the search text provided by the embodiment of the invention can execute the correction method for the search text provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the correction method for the search text.
It should be noted that, in the embodiment of the correction apparatus for a search text, the units and modules included are divided only according to functional logic, and the division is not limited to the above as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for convenience of distinguishing them from each other and are not used to limit the protection scope of the present invention.
Example four
Fig. 4 is a schematic structural diagram of a terminal according to a fourth embodiment of the present invention. Referring to fig. 4, the terminal includes:
one or more processors 410;
a memory 420 for storing one or more programs;
when the one or more programs are executed by the one or more processors 410, the one or more processors 410 are caused to implement the method for correcting a search text as set forth in any one of the above embodiments, the method comprising:
acquiring a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
determining each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein search words in the search word sequence correspond to candidate accurate words in the candidate accurate word sequence one to one;
segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of the search word segments and the number of the search words corresponding to each search word segment;
determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus and the total number of search words;
and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
In FIG. 4, a processor 410 is illustrated as an example; the processor 410 and the memory 420 in the terminal may be connected by a bus or other means, as exemplified by the bus connection in fig. 4.
The memory 420 serves as a computer-readable storage medium, and may be used to store software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the correction method of the search text in the embodiment of the present invention (for example, the search word sequence determination module 310, the candidate accurate word sequence determination module 320, the search word sequence segmentation module 330, the candidate correction probability determination module 340, and the target accurate text determination module 350 in the correction apparatus of the search text). The processor 410 executes various functional applications of the terminal and data processing, i.e., implements the above-described correction method of the search text, by executing software programs, instructions, and modules stored in the memory 420.
The memory 420 mainly includes a program storage area and a data storage area, wherein the program storage area can store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 420 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The terminal proposed by the present embodiment belongs to the same inventive concept as the correction method for the search text proposed by the above embodiment, and the technical details not described in detail in the present embodiment can be referred to the above embodiment, and the present embodiment has the same beneficial effects as the correction method for executing the search text.
Example five
This fifth embodiment provides a computer-readable storage medium, on which a computer program is stored, the program, when executed by a processor, implementing a method for correcting a search text according to any embodiment of the present invention, the method including:
acquiring a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
determining each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein search words in the search word sequence correspond to candidate accurate words in the candidate accurate word sequence one to one;
segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of the search word segments and the number of the search words corresponding to each search word segment;
determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus and the total number of search words;
and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented by program code executable by a computing device, such that it may be stored in a memory device and executed by a computing device, or it may be separately fabricated into various integrated circuit modules, or it may be fabricated by fabricating a plurality of modules or steps thereof into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A correction method for a search text, comprising:
acquiring a target search text, and performing word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
determining each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein search words in the search word sequence correspond to candidate accurate words in the candidate accurate word sequence one to one;
segmenting the search word sequence, and determining each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of the search word segments and the number of the search words corresponding to each search word segment;
determining candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus and the total number of search words, wherein the total number of the search words is the number of the search words forming the search word sequence;
and determining a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and determining a target accurate text corresponding to the target search text according to the target accurate word sequence.
2. The method of claim 1, wherein determining the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to each search word segment corresponding to each segmentation mode, each candidate accurate word segment corresponding to the search word segment, the search corpus, and the total number of search words comprises:
determining each segmentation mode as a first segmentation mode one by one, and determining each candidate accurate word sequence in the first segmentation mode as a first accurate word sequence one by one;
determining each segment correction probability corresponding to the first segmentation mode according to the search corpus, wherein the segment correction probability is the probability of correcting a search word segment into a corresponding first accurate word segment;
and determining candidate correction probabilities corresponding to the first accurate word sequence in the first segmentation mode according to the segment correction probabilities, the number of the search word segments and the total number of the search words.
3. The method of claim 2, wherein determining each segment correction probability corresponding to the first segmentation mode according to the search corpus comprises:
determining each search word segment corresponding to the first segmentation mode as a target search word segment one by one;
if the target search word segment only comprises one search word, determining a first correction probability corresponding to the target search word segment according to the search corpus, and determining the first correction probability as a segment correction probability corresponding to the target search word segment, wherein the first correction probability is the probability that the search word is corrected to be a corresponding first accurate word;
if the target search word segment comprises at least two search words, determining second correction probabilities and third correction probabilities corresponding to the target search word segment according to the search corpus, and determining the segment correction probability corresponding to the target search word segment according to the second correction probabilities and the third correction probabilities, wherein each second correction probability is the probability that a current search word in the target search word segment is corrected to the corresponding current first accurate word, and each third correction probability is the probability that the next first accurate word appears after the current first accurate word.
4. The method of claim 3, wherein determining the second correction probabilities and the third correction probabilities corresponding to the target search word segment according to the search corpus comprises:
determining historical search times corresponding to a current search word in the target search word segment, historical correction times for correcting the current search word into a corresponding first accurate word, first occurrence times of the current first accurate word corresponding to the current search word and second occurrence times of a next first accurate word of the current first accurate word according to the search corpus;
determining each second correction probability according to the historical search times and the historical correction times;
and determining each third correction probability according to the first occurrence times and the second occurrence times.
5. The method of claim 3, wherein the segment correction probability corresponding to the target search word segment is determined according to the following formula:
Figure FDA0002988041210000031
wherein Figure FDA0002988041210000032 is the target search word segment; Figure FDA0002988041210000033 is the first accurate word segment corresponding to the target search word segment; $n_i + 1$ is the subscript corresponding to the first search word in the target search word segment or the first of the first accurate words in the first accurate word segment; $n_{i+1}$ is the subscript corresponding to the last search word in the target search word segment or the last of the first accurate words in the first accurate word segment; Figure FDA0002988041210000034 is the segment correction probability corresponding to the target search word segment; $p(c_j \mid q_j)$ is the $j$-th second correction probability, i.e. the probability that the $j$-th search word in the target search word segment is corrected to the corresponding first accurate word; and $p(c_{j+1} \mid c_j)$ is the $(j+1)$-th third correction probability, i.e. the probability that the $(j+1)$-th first accurate word appears after the $j$-th first accurate word in the first accurate word segment.
6. The method of claim 2, wherein the candidate correction probability corresponding to the first accurate word sequence is determined according to the following formula:
Figure FDA0002988041210000035
wherein $p(c_1, c_2, \ldots, c_N \mid q_1, q_2, \ldots, q_N)$ is the candidate correction probability corresponding to the first accurate word sequence; $c_1, c_2, \ldots, c_N$ is the first accurate word sequence; $q_1, q_2, \ldots, q_N$ is the search word sequence corresponding to the target search text; $N$ is the total number of search words; $K$ is the number of search word segments corresponding to the first segmentation mode; $n_i$ is the subscript of the last search word in the $(i-1)$-th search word segment; Figure FDA0002988041210000036 is the $i$-th search word segment; Figure FDA0002988041210000041 is the first accurate word segment corresponding to the $i$-th search word segment; and Figure FDA0002988041210000042 is the segment correction probability corresponding to the $i$-th search word segment.
7. The method of claim 1, wherein determining the target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode comprises:
determining, according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, an accurate word sequence to be selected corresponding to each segmentation mode and a correction probability to be selected corresponding to each accurate word sequence to be selected;
and determining the accurate word sequence to be selected that corresponds to the maximum correction probability to be selected as the target accurate word sequence corresponding to the target search text.
8. A correction apparatus for a search text, comprising:
a search word sequence determining module, configured to acquire a target search text and perform word segmentation processing on the target search text to determine a search word sequence corresponding to the target search text;
a candidate accurate word sequence determining module, configured to determine each candidate accurate word sequence corresponding to the search word sequence according to a search corpus, wherein the search words in the search word sequence correspond one to one to the candidate accurate words in the candidate accurate word sequence;
a search word sequence segmentation module, configured to segment the search word sequence and determine each search word segment corresponding to each segmentation mode and each candidate accurate word segment corresponding to each search word segment, wherein the segmentation mode comprises the number of search word segments and the number of search words corresponding to each search word segment;
a candidate correction probability determining module, configured to determine the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode according to the search word segments corresponding to each segmentation mode, the candidate accurate word segments corresponding to the search word segments, the search corpus, and the total number of search words, wherein the total number of search words is the number of search words constituting the search word sequence;
and a target accurate text determining module, configured to determine a target accurate word sequence corresponding to the target search text according to the candidate correction probability corresponding to each candidate accurate word sequence in each segmentation mode, and to determine a target accurate text corresponding to the target search text according to the target accurate word sequence.
9. A terminal, characterized in that the terminal comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for correcting a search text as recited in any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method for correcting a search text according to any one of claims 1 to 7.
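For illustration only, the flow recited in claims 1 to 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation. The formula images of claims 5 and 6 are not reproduced in this text, so the sketch assumes that a segment correction probability is the product of the second correction probabilities and the third correction probabilities within the segment, that each probability is a simple ratio of corpus counts, and that the candidate correction probability is formed by a pluggable `combine` function. The default `plain_product` ignores the number of segments K and the total number of search words N, which the claimed formula does use. All function names, the layout of the `counts` dictionary, and the sample words are hypothetical.

```python
from itertools import combinations
from math import prod


def segmentation_modes(n):
    """Yield each segmentation mode of n search words as a tuple of segment lengths."""
    for k in range(1, n + 1):  # k = number of search word segments
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0, *cuts, n)
            yield tuple(b - a for a, b in zip(bounds, bounds[1:]))


def ratio(numerator, denominator):
    """Count ratio that falls back to 0.0 when the denominator is zero or missing."""
    return numerator / denominator if denominator else 0.0


def segment_probability(q_seg, c_seg, counts):
    """Assumed product form of the segment correction probability."""
    # Second correction probabilities: times q was corrected to c / times q was searched.
    second = prod(
        ratio(counts["corrected"].get((q, c), 0), counts["searched"].get(q, 0))
        for q, c in zip(q_seg, c_seg)
    )
    # Third correction probabilities: times b followed a / occurrences of a.
    # For a one-word segment this product is empty and equals 1.
    third = prod(
        ratio(counts["bigram"].get((a, b), 0), counts["unigram"].get(a, 0))
        for a, b in zip(c_seg, c_seg[1:])
    )
    return second * third


def plain_product(segment_probs, k, n):
    """Placeholder combiner (assumption): ignores k and n, unlike the claimed formula."""
    return prod(segment_probs)


def best_correction(query, candidates, counts, combine=plain_product):
    """Return (probability, candidate, mode) with the highest candidate correction probability."""
    n = len(query)
    best = (0.0, None, None)
    for mode in segmentation_modes(n):
        for cand in candidates:
            probs, start = [], 0
            for length in mode:
                end = start + length
                probs.append(segment_probability(query[start:end], cand[start:end], counts))
                start = end
            score = combine(probs, len(mode), n)
            if score > best[0]:
                best = (score, cand, mode)
    return best


# Hypothetical corpus counts, invented purely to exercise the sketch.
counts = {
    "searched": {"iphon": 10, "case": 20},
    "corrected": {("iphon", "iphone"): 8, ("iphon", "phone"): 2, ("case", "case"): 20},
    "unigram": {"iphone": 50, "phone": 100},
    "bigram": {("iphone", "case"): 25, ("phone", "case"): 10},
}
result = best_correction(["iphon", "case"], [["iphone", "case"], ["phone", "case"]], counts)
```

Taking a single maximum over all segmentation modes and candidates gives the same result as first selecting the best candidate per segmentation mode and then the best among those, as recited in claim 7. A combiner that depends on K and N can be passed through `combine` in place of the placeholder.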
CN201810941106.XA 2018-08-17 2018-08-17 Method, device, terminal and storage medium for correcting search text Active CN110019684B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810941106.XA CN110019684B (en) 2018-08-17 2018-08-17 Method, device, terminal and storage medium for correcting search text

Publications (2)

Publication Number Publication Date
CN110019684A CN110019684A (en) 2019-07-16
CN110019684B true CN110019684B (en) 2021-06-15

Family

ID=67188380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810941106.XA Active CN110019684B (en) 2018-08-17 2018-08-17 Method, device, terminal and storage medium for correcting search text

Country Status (1)

Country Link
CN (1) CN110019684B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975625A (en) * 2016-05-26 2016-09-28 同方知网数字出版技术股份有限公司 Chinglish inquiring correcting method and system oriented to English search engine
CN106339404A (en) * 2016-06-30 2017-01-18 北京奇艺世纪科技有限公司 Search word recognition method and device
CN106598939A (en) * 2016-10-21 2017-04-26 北京三快在线科技有限公司 Method and device for text error correction, server and storage medium
CN106777073A (en) * 2016-12-13 2017-05-31 深圳爱拼信息科技有限公司 The automatic method for correcting of wrong word and server in a kind of search engine
CN107341181A (en) * 2017-05-27 2017-11-10 武汉斗鱼网络科技有限公司 Method, apparatus, computer-readable recording medium and computer equipment are recommended in search
CN107609098A (en) * 2017-09-11 2018-01-19 北京金堤科技有限公司 Searching method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9679024B2 (en) * 2014-12-01 2017-06-13 Facebook, Inc. Social-based spelling correction for online social networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
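An online Chinese query error-correction method for search engines (搜索引擎的一种在线中文查询纠错方法); Hu Yi, et al.; Journal of Chinese Information Processing (《中文信息学报》); 2016-01-31; Vol. 30, No. 1; pp. 71-77 *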

Also Published As

Publication number Publication date
CN110019684A (en) 2019-07-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240410

Address after: Room 315, No.1 Yichuang Street, Huangpu District, Guangzhou City, Guangdong Province, 510000

Patentee after: Working at Bide Digital Technology (Guangzhou) Co.,Ltd.

Country or region after: China

Address before: 11 / F, building B1, phase 4.1, software industry, No.1, Software Park East Road, Wuhan East Lake Development Zone, Wuhan City, Hubei Province, 430070

Patentee before: WUHAN DOUYU NETWORK TECHNOLOGY Co.,Ltd.

Country or region before: China