CN104156454A - Search term correcting method and device - Google Patents

Info

Publication number
CN104156454A
Authority
CN
China
Prior art keywords
search word
candidate
error correction
string
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410406835.7A
Other languages
Chinese (zh)
Other versions
CN104156454B (en)
Inventor
杨月奎
张海龙
肖立鹏
黄玉兰
刘冰
王刚
王迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201410406835.7A priority Critical patent/CN104156454B/en
Publication of CN104156454A publication Critical patent/CN104156454A/en
Application granted granted Critical
Publication of CN104156454B publication Critical patent/CN104156454B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a search term correcting method and device. The method comprises the steps of obtaining a search term, splitting the search term to obtain multiple first search term fragments, correcting the first search term fragments to obtain multiple second search term fragments, splicing the multiple second search term fragments to obtain a candidate result, judging whether the candidate result is relational data in a relational database, and determining that the candidate result is a target search term obtained after correcting the search term under the condition that it is judged that the candidate result is relational data in the relational database. By the adoption of the search term correcting method and device, the problem that according to the prior art, the coverage rate of a search term query correction method is low due to the fact that the method needs to rely on a large number of user records is solved, correction of the search term can be achieved without relying on the user records, and then the coverage rate and independence of search term correction are improved.

Description

Error correction method and device for search words
Technical field
The present invention relates to the field of data processing, and in particular to an error correction method and device for search words.
Background
When a user searches with a search word, the search word entered by the user usually needs query error correction. In the prior art, query error correction is usually performed in one of the following two ways:
1) Query error correction based on user sessions: this approach mainly mines the session log of user searches for error-correction candidate pairs that users actively rewrote, and uses them as the corrected search words.
2) Transition-probability error correction based on a large number of user records: this approach selects search-log entries with a high click volume as the set of correct candidate results; then, after transforming the search word (query), it looks up the closest entry in the candidate set and takes it as the correct search word.
Both of these approaches have the following shortcomings when correcting a search word:
1) They rely on a large number of user records; without such records, query error correction cannot be performed on the search word.
2) Where user intent is dispersed across a broad, all-encompassing range, they cannot focus the user's needs on a single field.
For the problem that query error correction of search words in the related art has low coverage because it relies on a large number of user records, no effective solution has yet been proposed.
Summary of the invention
Embodiments of the present invention provide an error correction method and device for search words, to at least solve the technical problem that query error correction of search words in the prior art has low coverage because it relies on a large number of user records.
According to one aspect of the embodiments of the present invention, an error correction method for search words is provided.
The error correction method for search words according to an embodiment of the present invention comprises: obtaining a search word, wherein the search word is a long-tail keyword; splitting the search word to obtain a plurality of first search word fragments; correcting each first search word fragment to obtain a plurality of corrected second search word fragments; splicing the plurality of second search word fragments to obtain a candidate result; judging whether the candidate result is associated data in an association database, wherein the association database stores multiple groups of corrected associated data; and, where the candidate result is judged to be associated data in the association database, determining that the candidate result is the target search word obtained by correcting the search word.
According to another aspect of the embodiments of the present invention, an error correction device for search words is also provided.
The error correction device for search words according to an embodiment of the present invention comprises: an acquiring unit, configured to obtain a search word, wherein the search word is a long-tail keyword; a splitting unit, configured to split the search word to obtain a plurality of first search word fragments; an error correction unit, configured to correct each first search word fragment to obtain a plurality of corrected second search word fragments; a splicing unit, configured to splice the plurality of second search word fragments to obtain a candidate result; a judging unit, configured to judge whether the candidate result is associated data in an association database, wherein the association database stores multiple groups of corrected associated data; and a determining unit, configured to determine, where the candidate result is judged to be associated data in the association database, that the candidate result is the target search word obtained by correcting the search word.
In the embodiments of the present invention, a search word that is a long-tail keyword is obtained; the search word is split into a plurality of first search word fragments; each first search word fragment is corrected to obtain a plurality of corrected second search word fragments; the second search word fragments are spliced to obtain a candidate result; the candidate result is checked against an association database that stores multiple groups of corrected associated data; and, where the candidate result is associated data in the association database, the candidate result is determined to be the target search word obtained by correcting the search word. In other words, the search word entered by the user is obtained, the whole string is cut into a plurality of fragments that each carry an independent meaning, each fragment is corrected, the corrected results of the fragments are spliced, and finally the relations between data are used to verify the spliced candidate result. If the verification succeeds, the spliced candidate result is determined to be the target search word obtained by correcting the search word. This way of correcting does not rely on user records: even without a large number of user records, query error correction can still be performed on the search word by querying the association database. This solves the problem that query error correction of search words in the prior art has low coverage because it relies on a large number of user records, makes it possible to correct search words without relying on user records, and thereby improves both the coverage and the independence of search word error correction.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form a part of this application. The schematic embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a diagram of the hardware environment of the error correction method for search words according to an embodiment of the present invention;
Fig. 2 is a flowchart of the error correction method for search words according to an embodiment of the present invention;
Fig. 3 is a flowchart of the error correction method for search words according to a further embodiment of the present invention;
Fig. 4 is a schematic diagram of the error correction device for search words according to an embodiment of the present invention; and
Fig. 5 is a schematic diagram of a server that implements the error correction method for search words according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the specification, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data used in this way may be interchanged where appropriate, so that the embodiments of the invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprise" and "have" and any variants of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
The technical terms used in the embodiments of the present invention are explained as follows:
Query error correction (Query Correct, QC for short): revising an erroneous string entered by the user into the correct expression;
Resource: an entity name in a business. For example, in music: song, singer, MV, album, etc.; in video: film or TV drama, director, actor, etc.;
Association: resources are related to one another through some connection. For example, singer A sang song B, so A and B are associated; actor C performs in film or TV drama D, so C and D are associated;
Confidence: also called reliability, confidence level or confidence coefficient. When a population parameter is estimated from a sample, the conclusion is always uncertain because of the randomness of the sample. A probabilistic statement is therefore used, namely interval estimation in mathematical statistics: the probability that the estimated value lies within a given allowed error range of the population parameter is called the confidence;
Recall rate: (Recall Rate, also called recall ratio) the ratio of the number of relevant documents retrieved to the number of all relevant documents in the document library, which measures the recall of a retrieval system. Precision is the ratio of the number of relevant documents retrieved to the total number of documents retrieved, which measures the precision of a retrieval system;
Target keyword: the "main" keyword of a website, determined through keyword analysis; it generally refers to the keyword that the target customers of the website's products and services are likely to search for;
Long-tail keyword: a keyword that is not a target keyword of the website but can still bring search traffic. Long-tail keywords tend to be long, often consisting of two or three words or even a phrase, and appear in content pages, both in the page title and in the page content. For example, if the target keyword is "clothing", its long-tail keywords may be "men's clothing", "winter clothing", "outdoor sportswear", and so on.
Embodiment 1
According to an embodiment of the present invention, a method embodiment that can be carried out by the device embodiment of this application is provided. It should be noted that the steps shown in the flowcharts of the drawings can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described can be executed in an order different from the one given here.
According to an embodiment of the present invention, an error correction method for search words is provided.
Optionally, in this embodiment, the above error correction method for search words can be applied in the hardware environment formed by the client 102 and the server 104 shown in Fig. 1. As shown in Fig. 1, the client 102 is connected to the server 104 through a network. The network includes but is not limited to a wide area network, a metropolitan area network or a local area network. The client 102 may be a mobile phone client, a PC client, a notebook client or a tablet client.
Fig. 2 is a flowchart of the error correction method for search words according to an embodiment of the present invention. As shown in Fig. 2, the method mainly comprises the following steps S201 to S211:
S201: Obtain a search word, wherein the search word is a long-tail keyword. Specifically, when a user searches with the client 102, the user can enter a search string through the human-computer interaction interface of the client 102; this search string is the search word to be corrected. The search word to be corrected can be obtained either by monitoring the virtual keyboard or physical keyboard of the client 102, or by monitoring the search word input field of the client 102.
S203: Split the search word to be corrected that was obtained in step S201 to obtain a plurality of first search word fragments. That is, the search word to be corrected is cut into a plurality of fragments that each carry an independent meaning. The first search word fragments obtained by splitting are kept as complete as possible. For a search word to be corrected that mixes Chinese and English, the Chinese and English parts are split into different fragments.
S205: Correct each first search word fragment to obtain a plurality of corrected second search word fragments. Specifically, each first search word fragment can be corrected with a pinyin error correction algorithm, on the basis of user sessions, or with an edit distance algorithm. The concrete correction processing is the same as conventional correction processing and is not described in detail here.
S207: Splice the plurality of second search word fragments to obtain a candidate result. That is, the second search word fragments obtained by correcting the first search word fragments are recombined into a new search word, and this new search word is determined to be the candidate result.
S209: Judge whether the candidate result is associated data in the association database, wherein the association database stores multiple groups of corrected associated data. That is, judge whether the recombined new search word is one of the groups of associated data in the association database.
S211: Where the candidate result is judged to be associated data in the association database, determine that the candidate result is the target search word obtained by correcting the search word. Because the association database stores associated data that has already been corrected, a spliced candidate result that is associated data in the association database accurately represents the real search target of the search word to be corrected. The candidate result can therefore be determined to be the target search word obtained by correcting the search word.
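Steps S203 to S211 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `correct_search_word`, the splitter `split_last_token`, the lookup table `fixes` and the set-of-tuples form of `association_db` are all hypothetical stand-ins for the splitting, fragment correction and association database described above.

```python
from itertools import product

def correct_search_word(query, split, fixes, association_db):
    """Split, correct each fragment, splice, and verify (steps S203 to S211)."""
    first_fragments = split(query)                                   # S203: split
    # S205: each fragment maps to one or more corrected fragments (itself if unknown)
    options = [fixes.get(frag, [frag]) for frag in first_fragments]
    for second_fragments in product(*options):                       # S207: splice in order
        if second_fragments in association_db:                       # S209: verify
            return " ".join(second_fragments)                        # S211: target search word
    return None                                                      # no candidate verified

# Hypothetical demo data, for illustration only.
split_last_token = lambda q: q.rsplit(" ", 1)   # toy splitter: body + trailing token
fixes = {
    "to I cigarette": ["give me a cigarette", "lend me a cigarette"],
    "jiangyuqin": ["Jiang Yuqin", "Jiang Yuqing"],
}
association_db = {("give me a cigarette", "Jiang Yuqin")}

result = correct_search_word("to I cigarette jiangyuqin",
                             split_last_token, fixes, association_db)
print(result)
```

The sketch returns the first verified candidate; choosing among several verified candidates would need an additional ranking step.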
In the error correction method for search words provided by the embodiments of the present invention, the search word entered by the user is obtained, the whole string is cut into a plurality of fragments that each carry an independent meaning, each fragment is corrected, the corrected results of the fragments are spliced, and finally the relations between data are used to verify the spliced candidate result. If the verification succeeds, the spliced candidate result is determined to be the target search word obtained by correcting the search word to be corrected. This way of correcting does not rely on user records: even without a large number of user records, query error correction can still be performed on the search word by querying the association database. This solves the problem that query error correction of search words in the prior art has low coverage because it relies on a large number of user records, makes it possible to correct search words without relying on user records, and thereby improves both the coverage and the independence of search word error correction.
Moreover, cutting the whole search string into a plurality of fragments that each carry an independent meaning and correcting each fragment allows the search word to be corrected at a finer granularity. Search words that were previously thought not to need correction, or that even seem intuitively correct, can also be corrected, which improves the recall rate. In addition, correcting each fragment and then splicing the corrected fragments into a candidate result gives the most accurate target search word possible and makes the finally determined target search word more definite, which improves the accuracy of search word error correction.
Specifically, in the embodiments of the present invention, the search word to be corrected can be split into a plurality of first search word fragments either by forward maximum matching or by backward maximum matching. Forward maximum matching splits the search word to be corrected in a first order, which runs from the first character of the search word to its last character. Backward maximum matching splits the search word to be corrected in a second order, which runs from the last character of the search word to its first character. Examples follow:
Splitting the search word to be corrected by forward maximum matching means taking characters from the search word from front to back and looking them up in a dictionary; the lookup result determines whether the segmentation succeeded. If the characters taken are found in the dictionary, the segmentation succeeds and the word does not need to be shortened; otherwise the last character is removed and the dictionary lookup continues. For example, for the sentence 我爱北京天安门 ("I love Beijing Tiananmen"), first take 我爱北京 (the maximum word length in Chinese is generally 4 characters), which is not found in the dictionary; remove 京 to get 我爱北, which is not found either; remove 北 to get 我爱, which is still not found; remove 爱, leaving only 我. A single character counts as a word, so one word has been segmented. Continuing, take 爱北京天 and repeat the steps above to obtain 爱. Then take 北京天安, which yields 北京. Move the pointer forward by two characters and take 天安门; since 天安门 is found in the dictionary, the segmentation ends. The result of the segmentation is 我 ("I"), 爱 ("love"), 北京 ("Beijing") and 天安门 ("Tiananmen").
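The forward maximum matching walk-through above can be sketched in a few lines. This is an illustrative sketch only: the function name `forward_max_match`, the toy `dictionary` and the default `max_len` of 4 are assumptions made for the example, and the sentence is the one from the example above, written in its original Chinese characters.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text front to back, always trying the longest dictionary word first."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest window first, then drop one trailing character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted as a word.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens

# Toy dictionary, assumed for illustration.
dictionary = {"我", "爱", "北京", "天安门"}
fmm_tokens = forward_max_match("我爱北京天安门", dictionary)
print(fmm_tokens)
```

Each failed lookup only shortens the current window by one character, so the cost per position is bounded by `max_len` dictionary lookups.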
Splitting the search word to be corrected by backward maximum matching is the reverse of splitting it by forward maximum matching: forward maximum matching takes characters from the search word from front to back, whereas backward maximum matching takes them from back to front. For example, for the sentence 我爱北京天安门 ("I love Beijing Tiananmen"), first take 京天安门, which is not found in the dictionary; remove 京, leaving 天安门, which is found. Next take 我爱北京 and continue segmenting. The final split result is 我 ("I"), 爱 ("love"), 北京 ("Beijing") and 天安门 ("Tiananmen").
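Backward maximum matching can be sketched in the same style. Again this is only an illustration under assumptions: `backward_max_match`, the toy `dictionary` and `max_len=4` are hypothetical names and values chosen for the example.

```python
def backward_max_match(text, dictionary, max_len=4):
    """Segment text back to front, always trying the longest dictionary word first."""
    tokens = []
    end = len(text)
    while end > 0:
        # Try the longest window ending at `end`, then drop one leading character at a time.
        for size in range(min(max_len, end), 0, -1):
            piece = text[end - size:end]
            # A single character is always accepted as a word.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                end -= size
                break
    tokens.reverse()  # words were collected from the end, so restore reading order
    return tokens

# Toy dictionary, assumed for illustration.
dictionary = {"我", "爱", "北京", "天安门"}
bmm_tokens = backward_max_match("我爱北京天安门", dictionary)
print(bmm_tokens)
```

For this sentence both directions give the same fragments; on ambiguous strings the two directions can disagree.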
In the embodiments of the present invention, splitting the search word to be corrected by forward maximum matching or backward maximum matching to obtain a plurality of first search word fragments allows the search word to be split quickly, which speeds up search word error correction. This way of splitting is also simple and consumes little system memory while the search word is being split, which increases running speed and further speeds up search word error correction.
In the embodiments of the present invention, correcting a given first search word fragment may yield more than one second search word fragment. Accordingly, splicing the second search word fragments may yield more than one candidate result, for example:
Search word to be corrected: to I cigarette jiangyuqin
One first search word fragment after splitting: to I cigarette
Another first search word fragment after splitting: jiangyuqin
Suppose that correcting the first search word fragment "to I cigarette" yields two correction results:
Result 1: give me a cigarette
Result 2: lend me a cigarette
Correcting the first search word fragment "jiangyuqin" likewise yields two correction results:
Result 1: Jiang Yuqin
Result 2: Jiang Yuqing
Splicing the second search word fragments yields four candidate results:
Candidate result 1: give me a cigarette Jiang Yuqin
Candidate result 2: give me a cigarette Jiang Yuqing
Candidate result 3: lend me a cigarette Jiang Yuqin
Candidate result 4: lend me a cigarette Jiang Yuqing
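The four candidate results are the Cartesian product of the per-fragment correction results, taken in fragment order. A minimal sketch, assuming a hypothetical helper named `splice_candidates` and space-joined fragments:

```python
from itertools import product

def splice_candidates(fragment_options):
    """Combine every correction of each fragment, keeping the fragments in order."""
    return [" ".join(combo) for combo in product(*fragment_options)]

# Correction results for the two fragments of the example above.
candidates = splice_candidates([
    ["give me a cigarette", "lend me a cigarette"],
    ["Jiang Yuqin", "Jiang Yuqing"],
])
for number, candidate in enumerate(candidates, start=1):
    print(f"Candidate result {number}: {candidate}")
```

The number of candidates is the product of the number of corrections per fragment, so in practice the corrections kept per fragment would probably need to be capped.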
Where there are multiple candidate results and only one of them is associated data in the association database, that candidate result is determined to be the target search word obtained by correcting the search word to be corrected. In the above example, if candidate result 1, "give me a cigarette Jiang Yuqin", is associated data in the association database, while candidate result 2, "give me a cigarette Jiang Yuqing", candidate result 3, "lend me a cigarette Jiang Yuqin", and candidate result 4, "lend me a cigarette Jiang Yuqing", are not, then candidate result 1 is determined to be the target search word obtained by correcting the search word "to I cigarette jiangyuqin".
If several candidate results are all associated data in the association database, then determining that a candidate result is the target search word, where the candidate result is judged to be associated data in the association database, comprises the following steps. First, obtain the popularity of each candidate result, where the popularity indicates the degree to which the candidate result qualifies as the target search word, that is, how heavily the resource corresponding to the candidate result is used. Then, determine that the candidate result with the highest popularity is the target search word. In other words, where several candidate results are all associated data in the association database, the one with the highest popularity is determined to be the target search word.
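The selection rule for one or several verified candidates can be sketched as follows. The names `select_target`, `association_db` and `popularity`, and the popularity numbers, are hypothetical and chosen only to illustrate the rule; treating a missing popularity as 0 is also an assumption.

```python
def select_target(candidates, association_db, popularity):
    """Keep candidates found in the association database; pick the most popular one."""
    verified = [c for c in candidates if c in association_db]
    if not verified:
        return None  # no candidate could be verified
    # With a single verified candidate, max() simply returns it.
    return max(verified, key=lambda c: popularity.get(c, 0))

candidates = [
    "give me a cigarette Jiang Yuqin",
    "give me a cigarette Jiang Yuqing",
    "lend me a cigarette Jiang Yuqin",
    "lend me a cigarette Jiang Yuqing",
]
# Hypothetical data: two candidates are verified, with made-up popularity values.
association_db = {"give me a cigarette Jiang Yuqin", "lend me a cigarette Jiang Yuqing"}
popularity = {"give me a cigarette Jiang Yuqin": 950, "lend me a cigarette Jiang Yuqing": 120}

target = select_target(candidates, association_db, popularity)
print(target)
```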
Preferably, in the embodiments of the present invention, when splicing the plurality of second search word fragments to obtain the candidate result, the relative order of the first search word fragments can be obtained first, where the relative order is the order produced by splitting the search word to be corrected. The second search word fragments are then spliced according to the relative order obtained, giving the candidate result.
For example, splitting the search word to be corrected "to I cigarette jiangyuqin" gives the first search word fragment "to I cigarette" and the first search word fragment "jiangyuqin". Their relative order is that "to I cigarette" comes before "jiangyuqin". When splicing the second search word fragment "give me a cigarette" or "lend me a cigarette", obtained by correcting "to I cigarette", with the second search word fragment "Jiang Yuqin" or "Jiang Yuqing", obtained by correcting "jiangyuqin", the second search word fragment "give me a cigarette" or "lend me a cigarette" is placed before the second search word fragment "Jiang Yuqin" or "Jiang Yuqing".
Obtaining the relative order of the first search word fragments and splicing the second search word fragments in that order to obtain the candidate result ensures that the target search word has the same order as the original search word, avoids over-correcting the original search word, and thus preserves the accuracy of the error correction.
Further, in the embodiment of the present invention, the linked database may be a database built from default resources, and it may be built before the search word to be corrected is obtained. That is, the search word error correction method of the embodiment of the present invention further includes: before obtaining the search word to be corrected, building the linked database based on default resources, wherein the default resources include the attributes of the search word to be corrected. For example, if the search word to be corrected is a music search query, the default resources used to build the linked database may be resources in the music domain, such as resources containing songs, singers, MVs, or albums.
Specifically, in the embodiment of the present invention, there may be multiple default resources, and they need to be associated with one another when the linked database is built. Each default resource may first be transformed, and the linked database is then built from the transformed default resources. Taking default resources that comprise a first default resource and a second default resource as an example, the ways of building the linked database based on default resources are described below:
Mode one:
First, split the character strings in the first default resource to obtain multiple first candidate strings, each of which can serve as an index of the first default resource, and split the character strings in the second default resource to obtain multiple second candidate strings, each of which can serve as an index of the second default resource. Splitting a character string once yields a series of candidate strings within the same radius. For example:
Given the character string abcde, splitting it by removing one character yields a series of candidate strings with radius 1, and splitting it by removing two characters yields a series of candidate strings with radius 2, as shown in Table 1:
Table 1
Radius 1        Radius 2
Index1: bcde    Index1: cde
Index2: abde    Index2: ade
Index3: abce    Index3: abe
Index4: abcd    Index4: abc
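The splitting shown in Table 1 can be sketched in a few lines of Python. This is only an illustration under an assumption: the function name `deletion_candidates` is hypothetical, and the sketch removes a run of adjacent characters, which reproduces the radius-2 column exactly. For radius 1 it also produces "acde", which Table 1 does not list.

```python
def deletion_candidates(text, radius):
    """Return the strings obtained by deleting `radius` adjacent characters.

    Assumption: deleted characters are contiguous, as in the radius-2
    column of Table 1. The patent text does not state this explicitly.
    """
    if radius < 1 or radius >= len(text):
        return []
    return [text[:i] + text[i + radius:] for i in range(len(text) - radius + 1)]


radius_1 = deletion_candidates("abcde", 1)
radius_2 = deletion_candidates("abcde", 2)
print(radius_1)
print(radius_2)
```

Each returned string could then serve as one index entry for the resource it was derived from.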
Second, extract a first target candidate string from the multiple first candidate strings and a second target candidate string from the multiple second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings that have an association relationship.
Then, store the first target candidate string, the second target candidate string, and the association relationship between them as associated data in the linked database. That is, candidate strings from the two default resources that have an association relationship are stored in correspondence, forming a linked database indexed by the relationships between resources. For example, if a first target candidate string is "where the time has all gone", a second target candidate string is "Wang Zhengliang", and the association relationship between them is that "where the time has all gone" is a song sung by "Wang Zhengliang", the associated data stored in the linked database may specifically be:
Doc1: where the time has all gone
Doc2: Wang Zhengliang
Relation: Doc1 is the song that Doc2 sings
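The Doc1/Doc2/Relation record above can be modeled as a small in-memory structure. The following is a minimal sketch, not the patent's storage format: the names `Association`, `store_association`, and `is_associated` are hypothetical, and a plain list stands in for the linked database.

```python
from collections import namedtuple

# Hypothetical record type: one row of associated data in Mode one.
Association = namedtuple("Association", ["doc1", "doc2", "relation"])

# A plain list stands in for the linked database in this sketch.
linked_database = []


def store_association(doc1, doc2, relation):
    """Store two associated candidate strings together with their relation."""
    linked_database.append(Association(doc1, doc2, relation))


def is_associated(doc1, doc2):
    """Return True if the two strings are stored as associated data."""
    return any(a.doc1 == doc1 and a.doc2 == doc2 for a in linked_database)


store_association(
    "where the time has all gone",
    "Wang Zhengliang",
    "Doc1 is the song that Doc2 sings",
)
found = is_associated("where the time has all gone", "Wang Zhengliang")
missing = is_associated("where the time has all gone", "Jiang Yuqin")
print(found, missing)
```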
Mode two:
First, split the character strings in the first default resource to obtain multiple first candidate strings, each of which can serve as an index of the first default resource, and split the character strings in the second default resource to obtain multiple second candidate strings, each of which can serve as an index of the second default resource. Splitting a character string once yields a series of candidate strings within the same radius. For example, given the character string abcde, splitting it by removing one character yields a series of candidate strings with radius 1, and splitting it by removing two characters yields a series of candidate strings with radius 2, as shown in Table 1.
Second, extract a first target candidate string from the multiple first candidate strings and a second target candidate string from the multiple second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings that have an association relationship.
Then, splice the first target candidate string and the second target candidate string.
Next, determine the result of splicing the first target candidate string and the second target candidate string to be associated data in the linked database.
That is, candidate strings from the two default resources that have an association relationship are spliced together to form a new resource, and this newly formed resource serves as associated data in the linked database. For example, the first target candidate strings that have an association relationship with the second target candidate string "Wang Zhengliang" include "where the time has all gone", "all where gone", "time all where gone", "where the time has gone", "time all where", and so on. After each first target candidate string is spliced with the second target candidate string, the resulting associated data in the linked database may specifically be:
Index1: where the time has all gone Wang Zhengliang
Index2: all where gone Wang Zhengliang
Index3: time all where gone Wang Zhengliang
Index4: where the time has gone Wang Zhengliang
Index5: time all where Wang Zhengliang
……
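Mode two can be sketched as building a set of spliced strings. This is an illustrative sketch only: `build_spliced_index` is a hypothetical name, and joining with a single space is an assumption about how the two candidate strings are spliced.

```python
def build_spliced_index(first_candidates, second_candidate):
    """Splice every first target candidate string with the second one.

    Assumption: the two strings are joined with a single space.
    """
    return {f"{first} {second_candidate}" for first in first_candidates}


song_candidates = [
    "where the time has all gone",
    "all where gone",
    "time all where gone",
    "where the time has gone",
    "time all where",
]
spliced_index = build_spliced_index(song_candidates, "Wang Zhengliang")
for entry in sorted(spliced_index):
    print(entry)
```

Storing the spliced strings in a set makes the later check "is this candidate result associated data?" a single membership test.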
Fig. 3 is a flowchart of the search word error correction method according to the embodiment of the present invention. As shown in Fig. 3, a search word is corrected using the established linked database as follows:
First, obtain the search word to be corrected, where the search word is a long-tail keyword.
Second, split the search word to be corrected to obtain multiple first search word fragments.
Then, query the linked database to judge whether it stores associated data for the multiple first search word fragments. If so, the error correction result can be obtained directly from the linked database. If not, each first search word fragment needs to be corrected. This can be done by calling a resource database, and any existing fragment correction technique may be used, so the specific technique is not described further here.
Next, splice the corrected fragments to obtain candidate results, call the linked database again to verify the candidate results, and finally obtain the target search word, which is the corrected form of the search word to be corrected.
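The flow of Fig. 3 can be summarized in a short sketch. Everything here is hypothetical scaffolding rather than the patent's implementation: `correct_query`, the `split` and `correct_fragment` callables, the toy correction table, and the use of a Python set as the linked database are all assumptions made for illustration.

```python
from itertools import product


def correct_query(query, linked_db, split, correct_fragment):
    """Split, look up, correct per fragment, splice, then verify."""
    fragments = split(query)
    joined = " ".join(fragments)
    # Direct hit: the fragments already form associated data.
    if joined in linked_db:
        return joined
    # Otherwise correct each fragment; each may yield several options.
    options = [correct_fragment(fragment) for fragment in fragments]
    # Splice in the original fragment order and verify each candidate.
    for combination in product(*options):
        candidate = " ".join(combination)
        if candidate in linked_db:
            return candidate
    return None


# Toy data, assumed for illustration only.
toy_corrections = {
    "to my cigarette": ["give me a cigarette", "lend me a cigarette"],
    "jiangyuqin": ["Jiang Yuqin", "Jiang Yuqing"],
}
linked_db = {"give me a cigarette Jiang Yuqin"}

target = correct_query(
    "to my cigarette jiangyuqin",
    linked_db,
    split=lambda q: q.rsplit(" ", 1),
    correct_fragment=lambda f: toy_corrections.get(f, [f]),
)
print(target)
```

The sketch returns the first verified candidate; choosing among several verified candidates is a separate step.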
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
According to the embodiment of the present invention, a search word error correction device for implementing the above search word error correction method is also provided. The device is mainly used to perform the error correction method provided above, and is described in detail below:
Fig. 4 is a schematic diagram of the search word error correction device according to the embodiment of the present invention. As shown in Fig. 4, the device mainly includes an acquiring unit 10, a splitting unit 20, an error correction unit 30, a splicing unit 40, a judging unit 50, and a determining unit 60, wherein:
The acquiring unit 10 is used to obtain a search word, where the search word is a long-tail keyword. Specifically, when a user searches with the client 102, the user can enter a search string through the human-computer interaction interface of the client 102, and this search string is the search word to be corrected. The search word to be corrected can be obtained either by monitoring the virtual keyboard or physical keyboard of the client 102, or by monitoring the search word input box of the client 102.
The splitting unit 20 is used to split the search word to be corrected obtained by the acquiring unit 10 into multiple first search word fragments, that is, to cut the search word into multiple fragments that each carry an independent meaning. The first search word fragments should be kept as complete as possible. For a search word that mixes Chinese and English, the Chinese and English parts are split into different fragments.
The error correction unit 30 is used to correct each first search word fragment to obtain multiple corrected second search word fragments. Specifically, each first search word fragment may be corrected with a pinyin-based error correction algorithm, based on user sessions, or with an edit distance algorithm. The specific correction processing is the same as conventional correction processing and is not described further here.
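Of the three options named above, edit-distance correction is the easiest to illustrate. The sketch below uses the standard Levenshtein distance and a made-up vocabulary. The names `edit_distance` and `closest_entries`, the vocabulary, and the threshold of 2 are assumptions for illustration, not details given in the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def closest_entries(fragment, vocabulary, max_distance=2):
    """Return vocabulary entries within max_distance, nearest first."""
    scored = sorted((edit_distance(fragment, word), word) for word in vocabulary)
    return [word for distance, word in scored if distance <= max_distance]


# Made-up vocabulary of romanized names, for illustration only.
vocabulary = ["jiangyuqin", "jiangyuqing", "wangzhengliang"]
suggestions = closest_entries("jiangyuqn", vocabulary)
print(suggestions)
```

Returning several suggestions rather than one matches the later description, in which a single first search word fragment may yield multiple second search word fragments.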
The splicing unit 40 is used to splice the multiple second search word fragments to obtain a candidate result. That is, the second search word fragments obtained by correcting the first search word fragments are recombined into a new search word, and this new search word is taken as the candidate result.
The judging unit 50 is used to judge whether the candidate result is associated data in the linked database, where the linked database stores multiple groups of corrected associated data. That is, it judges whether the recombined search word is one of the groups of associated data in the linked database.
The determining unit 60 is used to determine, when the candidate result is judged to be associated data in the linked database, that the candidate result is the target search word obtained by correcting the search word. Because the linked database stores associated data that has already been corrected, a spliced candidate result that matches associated data in the linked database accurately represents the real search intent behind the search word to be corrected, so the candidate result can be determined to be the target search word.
The search word error correction device provided by the embodiment of the present invention obtains the search word entered by the user, cuts the whole string into multiple fragments that each carry an independent meaning, corrects each fragment, splices the per-fragment candidates, and finally uses the relationships between data to verify the spliced candidate result. If verification succeeds, the spliced candidate result is determined to be the target search word. This approach does not rely on user records: even without a large volume of user records, the search word can still be corrected by querying the linked database. It thereby solves the prior-art problem of low coverage caused by query correction depending on large volumes of user records, corrects search words independently of user records, and improves both the coverage and the independence of search word correction.
Moreover, cutting the whole string into multiple fragments with independent meanings and correcting each fragment allows finer-grained correction. Search words that were previously thought not to need correction, or that people intuitively regard as correct, can also be corrected, which improves recall. In addition, correcting each fragment and then splicing the corrected fragments into candidate results yields as accurate a target search word as possible, makes the finally determined target search word more definite, and improves the accuracy of search word correction.
Specifically, in the embodiment of the present invention, the splitting unit 20 may include a first splitting module and a second splitting module. The first splitting module splits the search word to be corrected by forward maximum matching to obtain multiple first search word fragments, and the second splitting module splits it by backward maximum matching to obtain multiple first search word fragments. Forward maximum matching splits the search word in a first order, from its first character to its last character. Backward maximum matching splits it in a second order, from its last character to its first character.
This is illustrated as follows:
Splitting the search word to be corrected by forward maximum matching means taking candidate words from the search word from front to back and looking each one up in a dictionary. The lookup result decides whether segmentation succeeded: if the candidate is found in the dictionary, segmentation succeeds, the candidate does not need to be shortened, and the lookup continues with the rest of the search word. For example, take the sentence "I love Tian An-men, Beijing", which in the original Chinese has seven characters, glossed here as I / love / Bei / jing / Tian / an / men. First take the four characters "I love Bei jing" (the maximum length of a Chinese word is generally 4). This is not in the dictionary, so remove "jing" to get "I love Bei", which is not found either. Remove "Bei" to get "I love", which is still not found. Remove "love", leaving only "I". A single character counts as a word, so "I" is segmented off. Continuing, take "love Bei jing Tian" and repeat the steps above to obtain "love". Then take "Bei jing Tian an", which yields "Beijing". Move the pointer forward by two characters and take "Tian an men". Since "Tian An-men" is found in the dictionary, segmentation ends. The segmentation result is "I", "love", "Beijing", and "Tian An-men".
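The forward maximum matching walk-through above can be sketched as follows. The sketch is an illustration under stated assumptions: the function name `forward_max_match` is hypothetical, each list element stands for one Chinese character (given here as an English gloss), and the toy dictionary contains only the two multi-character words from the example.

```python
def forward_max_match(chars, dictionary, max_len=4):
    """Segment `chars` from front to back, longest dictionary match first.

    `chars` is a list in which each element stands for one character.
    A single character is always accepted as a word.
    """
    words = []
    start = 0
    while start < len(chars):
        for size in range(min(max_len, len(chars) - start), 0, -1):
            piece = tuple(chars[start:start + size])
            if size == 1 or piece in dictionary:
                words.append("".join(piece))
                start += size
                break
    return words


# Glosses standing in for the seven characters of the example sentence.
sentence = ["I", "love", "Bei", "jing", "Tian", "an", "men"]
# Toy dictionary (assumed): only the multi-character words are listed.
toy_dictionary = {("Bei", "jing"), ("Tian", "an", "men")}

forward_words = forward_max_match(sentence, toy_dictionary)
print(forward_words)
```

The inner loop shrinks the candidate by one character at a time from the right, which mirrors the "remove the last character and look up again" steps in the example.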
Splitting the search word to be corrected by backward maximum matching is the reverse process: forward maximum matching takes candidate words from front to back, while backward maximum matching takes them from back to front. For example, for the same sentence "I love Tian An-men, Beijing" (seven Chinese characters, glossed as I / love / Bei / jing / Tian / an / men), first take the last four characters "jing Tian an men", which is not in the dictionary. Remove "jing", leaving "Tian an men", which is found. Next take "I love Bei jing" and continue segmenting, finally obtaining the result "I", "love", "Beijing", and "Tian An-men".
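Backward maximum matching can be sketched the same way, scanning from the end of the search word. As before, this is only an illustration: `backward_max_match` is a hypothetical name, each list element stands for one Chinese character (as an English gloss), and the dictionary is a toy one.

```python
def backward_max_match(chars, dictionary, max_len=4):
    """Segment `chars` from back to front, longest dictionary match first.

    A single character is always accepted as a word. The words are
    collected in reverse and put back into reading order at the end.
    """
    words = []
    end = len(chars)
    while end > 0:
        for size in range(min(max_len, end), 0, -1):
            piece = tuple(chars[end - size:end])
            if size == 1 or piece in dictionary:
                words.append("".join(piece))
                end -= size
                break
    words.reverse()
    return words


# Glosses standing in for the seven characters of the example sentence.
sentence = ["I", "love", "Bei", "jing", "Tian", "an", "men"]
# Toy dictionary (assumed): only the multi-character words are listed.
toy_dictionary = {("Bei", "jing"), ("Tian", "an", "men")}

backward_words = backward_max_match(sentence, toy_dictionary)
print(backward_words)
```

Here the candidate shrinks from the left, so "jing Tian an men" becomes "Tian an men" after one step, as in the example.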
In the embodiment of the present invention, splitting the search word to be corrected by forward or backward maximum matching to obtain multiple first search word fragments allows the search word to be split quickly, which speeds up search word correction. This splitting method is also simple and consumes little system memory during splitting, which improves running speed and further speeds up search word correction.
In the embodiment of the present invention, correcting a given first search word fragment with the error correction unit 30 may yield multiple second search word fragments. Correspondingly, when the splicing unit 40 splices the second search word fragments, multiple candidate results are obtained, for example:
Search word to be corrected: to I cigarette jiangyuqin
One first search word fragment after splitting: to my cigarette
The other first search word fragment after splitting: jiangyuqin
Suppose that correcting the first search word fragment "to my cigarette" yields two error correction results:
First search word fragment after splitting: to my cigarette
Result 1: give me a cigarette
Result 2: lend me a cigarette
Correcting the first search word fragment "jiangyuqin" likewise yields two error correction results:
The other first search word fragment after splitting: jiangyuqin
Result 1: Jiang Yuqin
Result 2: Jiang Yuqing
Splicing the second search word fragments yields four candidate results:
Candidate result 1: give me a cigarette Jiang Yuqin
Candidate result 2: give me a cigarette Jiang Yuqing
Candidate result 3: lend me a cigarette Jiang Yuqin
Candidate result 4: lend me a cigarette Jiang Yuqing
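The four candidate results are the Cartesian product of the per-fragment correction results, taken in fragment order. A minimal sketch using the standard library follows. Joining with a single space is an assumption about how fragments are spliced.

```python
from itertools import product

# Correction results per fragment, listed in the original fragment order.
corrected_fragments = [
    ["give me a cigarette", "lend me a cigarette"],
    ["Jiang Yuqin", "Jiang Yuqing"],
]

# Splice one result per fragment; product() preserves the fragment order.
candidates = [" ".join(parts) for parts in product(*corrected_fragments)]
for number, candidate in enumerate(candidates, start=1):
    print(f"Candidate result {number}: {candidate}")
```

With two results for each of two fragments this gives 2 x 2 = 4 candidates; in general the count is the product of the per-fragment result counts.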
When there are multiple candidate results and only one of them is associated data in the linked database, the determining unit 60 determines that candidate result to be the target search word obtained by correcting the search word. In the example above, if candidate result 1, "give me a cigarette Jiang Yuqin", is associated data in the linked database, while candidate result 2, "give me a cigarette Jiang Yuqing", candidate result 3, "lend me a cigarette Jiang Yuqin", and candidate result 4, "lend me a cigarette Jiang Yuqing", are not, then candidate result 1 is determined to be the target search word obtained by correcting the search word "to I cigarette jiangyuqin".
If several candidate results are all associated data in the linked database, the determining unit 60 needs to determine the target search word through a first acquisition module and a first determination module. The first acquisition module obtains the popularity of each candidate result, where popularity indicates the degree to which a candidate result is confirmed as the target search word, that is, how heavily the corresponding resource is used. The first determination module determines the candidate result with the highest popularity to be the target search word. In other words, when several candidate results are all associated data in the linked database, the one with the highest popularity is chosen as the target search word.
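The two cases above (exactly one verified candidate, or several verified candidates ranked by popularity) can be handled by one small function. This is a sketch under assumptions: `choose_target` is a hypothetical name, the linked database is modeled as a set, and the popularity numbers are invented purely for the example.

```python
def choose_target(candidates, linked_db, popularity):
    """Keep candidates found in the linked database, then pick the most popular.

    Returns None when no candidate is associated data.
    """
    verified = [c for c in candidates if c in linked_db]
    if not verified:
        return None
    return max(verified, key=lambda c: popularity.get(c, 0))


candidates = [
    "give me a cigarette Jiang Yuqin",
    "give me a cigarette Jiang Yuqing",
    "lend me a cigarette Jiang Yuqin",
    "lend me a cigarette Jiang Yuqing",
]

# Case 1: only one candidate is associated data.
single = choose_target(candidates, {"give me a cigarette Jiang Yuqin"}, {})

# Case 2: two candidates are associated data; popularity values are made up.
linked_db = {"give me a cigarette Jiang Yuqin", "lend me a cigarette Jiang Yuqin"}
popularity = {
    "give me a cigarette Jiang Yuqin": 120,
    "lend me a cigarette Jiang Yuqin": 45,
}
best = choose_target(candidates, linked_db, popularity)
print(single)
print(best)
```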
Preferably, the splicing unit 40 mainly includes a second acquisition module and a first splicing module. The second acquisition module obtains the relative order of the multiple first search word fragments, where the relative order is the order produced when the search word to be corrected is split. The first splicing module splices the multiple second search word fragments in that relative order to obtain the candidate result.
For example, the search word to be corrected, "to I cigarette jiangyuqin", is split into the first search word fragment "to my cigarette" and the first search word fragment "jiangyuqin". Their relative order is that the fragment "to my cigarette" precedes the fragment "jiangyuqin". When the second search word fragments obtained by correcting "to my cigarette" ("give me a cigarette" or "lend me a cigarette") are spliced with the second search word fragments obtained by correcting "jiangyuqin" ("Jiang Yuqin" or "Jiang Yuqing"), the second search word fragment "give me a cigarette" or "lend me a cigarette" is placed before the second search word fragment "Jiang Yuqin" or "Jiang Yuqing".
By obtaining the relative order of the first search word fragments and splicing the second search word fragments in that order to obtain the candidate result, the target search word is guaranteed to keep the same order as the original search word. This avoids over-correcting the original search word and safeguards error correction accuracy.
Further, the search word error correction device provided by the embodiment of the present invention also includes an establishing unit, which builds the linked database based on default resources before the acquiring unit 10 obtains the search word to be corrected, wherein the default resources include the attributes of the search word to be corrected. For example, if the search word to be corrected is a music search query, the default resources used to build the linked database may be resources in the music domain, such as resources containing songs, singers, MVs, or albums.
Specifically, in the embodiment of the present invention, there may be multiple default resources, and they need to be associated with one another when the linked database is built. Each default resource may first be transformed, and the linked database is then built from the transformed default resources. Taking default resources that comprise a first default resource and a second default resource as an example, the structure and composition of the establishing unit are described below:
Structure one: set up unit and mainly comprise that the 3rd splits module, the 4th fractionation module, the first extraction module and the first memory module, particularly, the 3rd splits module for splitting the character string of the first default resource, obtain a plurality of first candidate's strings, each first candidate's string all can be used as the index of the first default resource; The 4th splits module for splitting the character string of the second default resource, obtains a plurality of second candidate's strings, and each second candidate's string all can be used as the index of the second default resource.For example:
Certain character string is abcde, according to the mode of removing a character, this character string is split, can obtain a series of radiuses and be candidate's string of 1, according to the mode of removing two characters, this character string is split, can obtain a series of radiuses and be candidate's string of 2, as shown in Table 1.
The first extraction module is gone here and there for extracting first object candidate from a plurality of first candidate's strings, and extracts the second target candidate string from a plurality of second candidates' strings, and wherein, first object candidate string and the second target candidate string are candidate's string with incidence relation.
The first memory module is for being stored as the incidence relation between first object candidate string, the second target candidate string and first object candidate string and the second target candidate string the associated data of linked database., the candidate in two default resources with incidence relation is gone here and there to corresponding stored, form one and there is the linked database that is related to index between resource, such as, a first object candidate string is " where the time has all gone ", a second target candidate string is " Wang Zhengliang ", incidence relation between first object candidate string and the second target candidate string is: " where the time has all gone " is the song that " Wang Zhengliang " sings, first object candidate is gone here and there, the associated data that incidence relation between the second target candidate string and first object candidate string and the second target candidate string is stored as in linked database can be specifically:
Doc1: where the time has all gone
Doc2: Wang Zhengliang
Relation: Doc1 is the song that Doc2 sings
Structure two: set up unit and mainly comprise that the 5th splits module, the 6th fractionation module, the second extraction module, the second concatenation module and the second determination module, particularly,
The 5th splits module for splitting the character string of the first default resource, obtains a plurality of first candidate's strings, and each first candidate's string all can be used as the index of the first default resource; The 6th splits module for splitting the character string of the second default resource, obtain a plurality of second candidate's strings, each second candidate's string all can be used as the index of the second default resource, wherein, character string is once split, can obtain a series of radiuses and go here and there with interior candidate in same range.For example: certain character string is abcde, according to the mode of removing a character, this character string is split, can obtain a series of radiuses and be candidate's string of 1, according to the mode of removing two characters, this character string is split, can obtain a series of radiuses and be candidate's string of 2, as shown in Table 1.
The second extraction module is gone here and there for extracting first object candidate from a plurality of first candidate's strings, and extracts the second target candidate string from a plurality of second candidates' strings, and wherein, first object candidate string and the second target candidate string are candidate's string with incidence relation.
The second concatenation module is used for splicing first object candidate string and the second target candidate string.
The second determination module is for determining that the splicing result of first object candidate string and the second target candidate string is the associated data of association index database.
, candidate's string in two default resources with incidence relation is stitched together, form a new resource, and using the resource of this new formation the associated data in linked database, such as, the first object candidate string with the second target candidate string " Wang Zhengliang " with incidence relation comprises " where the time has all gone ", " all where gone ", " time all where gone ", " where the time has gone ", " time all where " etc., splicing first object candidate's string and the second target candidate string, the splicing result of determining again first object candidate string and the second target candidate string is that the associated data in association index database can be specifically:
Index1: where the time has all gone Wang Zhengliang
Index2: all where gone Wang Zhengliang
Index3: time all where gone Wang Zhengliang
Index4: where the time has gone Wang Zhengliang
Index5: time all where Wang Zhengliang
……
As can be seen from the above description, the present invention does not rely on user records: even without a large volume of user records, search words can still be corrected by querying the linked database, which improves both the coverage and the independence of search word correction.
Embodiment 3
According to the embodiment of the present invention, a server for implementing the above search word error correction method is also provided. As shown in Fig. 5, the server mainly includes a processor 601, a device interface 602, a network interface 603, a database 604, and a memory 605, wherein:
The memory 605 is mainly used to store the program code of the above search word error correction method and to cache temporary data produced during error correction.
The database 604 is used to store the databases that need to be called during execution of the above search word error correction method.
The device interface 602 and the network interface 603 are mainly used to connect to clients.
The processor 601 is mainly used to perform the following operations:
Obtain search word, wherein, search word is long-tail keyword; Split search word, obtain a plurality of the first search word fragments; Each first search word fragment is carried out to error correction, obtain a plurality of the second search word fragments; Splice a plurality of the second search word fragments, obtain candidate result; Judge whether candidate result is the associated data in linked database, wherein, in linked database, store the associated data after the error correction of many groups; And in the situation that to judge candidate result be the associated data in linked database, determine that candidate result is that search word is carried out to the target search word after error correction.
The processor 601 is further configured to split the search word by forward maximum matching to obtain the plurality of first search word fragments, wherein forward maximum matching splits the search word to be corrected in a first order, and the first order runs from the first character of the search word to be corrected to its last character.
The processor 601 is further configured to split the search word to be corrected by backward maximum matching to obtain the plurality of first search word fragments, wherein backward maximum matching splits the search word to be corrected in a second order, and the second order runs from the last character of the search word to be corrected to its first character.
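By way of a non-limiting illustration, the two splitting orders described above may be sketched as follows. The function names, the `vocab` dictionary of known fragments, and the `max_len` window are assumptions made for this sketch only; the embodiment does not prescribe them.

```python
def forward_max_match(query, vocab, max_len=4):
    """Split `query` from its first character to its last, always taking
    the longest entry of `vocab` that starts at the current position."""
    fragments, i = [], 0
    while i < len(query):
        # Try the longest window first, shrinking down to one character.
        for size in range(min(max_len, len(query) - i), 0, -1):
            piece = query[i:i + size]
            # A single character is always accepted so the scan advances.
            if size == 1 or piece in vocab:
                fragments.append(piece)
                i += size
                break
    return fragments


def backward_max_match(query, vocab, max_len=4):
    """Split `query` from its last character to its first, always taking
    the longest entry of `vocab` that ends at the current position."""
    fragments, j = [], len(query)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = query[j - size:j]
            if size == 1 or piece in vocab:
                fragments.append(piece)
                j -= size
                break
    fragments.reverse()  # restore left-to-right order of the fragments
    return fragments
```

With a hypothetical vocabulary `{"ab", "abc", "cd"}` and `max_len=3`, the forward scan splits `"abcd"` into `abc` and `d`, while the backward scan splits it into `ab` and `cd`, which is why the two orders can yield different first search word fragments for the same search word.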
The processor 601 is further configured to obtain the popularity of each candidate result, wherein the popularity indicates the degree to which the candidate result is confirmed as the target search word; and to determine the candidate result with the highest popularity to be the target search word.
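A minimal sketch of this selection step is given below. It assumes that popularity is available as a numeric score per candidate result; the `popularity` mapping and the function name are hypothetical, and how the score is obtained is not shown here.

```python
def pick_target(candidates, popularity):
    """Return the candidate result with the highest popularity score,
    or None when there are no candidate results.

    `popularity` is an assumed mapping from candidate result to a number;
    candidates missing from it are treated as having popularity 0.
    """
    if not candidates:
        return None
    # On a tie, max() keeps the candidate that appears first in the list.
    return max(candidates, key=lambda c: popularity.get(c, 0))
```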
The processor 601 is further configured to obtain the relative order of the plurality of first search word fragments, wherein the relative order is the order produced by splitting the search word to be corrected; and to splice the plurality of second search word fragments according to the relative order to obtain the candidate result.
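The order-preserving splice may be illustrated as follows. This sketch assumes that error correction can return several alternatives for one fragment, so that several candidate results arise; that assumption, the function name, and the `sep` separator are illustrative only.

```python
from itertools import product


def splice_candidates(corrections_per_fragment, sep=""):
    """Build candidate results from corrected fragments.

    corrections_per_fragment[i] lists the corrected alternatives for the
    i-th first search word fragment, in the order the split produced the
    fragments. Every combination keeps that relative order.
    """
    return [sep.join(combo) for combo in product(*corrections_per_fragment)]
```

For example, hypothetical alternatives `[["liu", "lu"], ["dehua"]]` joined with a space give the candidate results `liu dehua` and `lu dehua`; the fragments are never reordered.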
The processor 601 is further configured to build the linked database based on preset resources, wherein the preset resources include attributes of the search word to be corrected.
The processor 601 is further configured to split the character strings in a first preset resource to obtain a plurality of first candidate strings; split the character strings in a second preset resource to obtain a plurality of second candidate strings; extract a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship; and store the first target candidate string, the second target candidate string, and the association relationship between them as associated data in the linked database.
The processor 601 is further configured to split the character strings in the first preset resource to obtain a plurality of first candidate strings; split the character strings in the second preset resource to obtain a plurality of second candidate strings; extract a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship; splice the first target candidate string and the second target candidate string; and determine the splicing result of the first target candidate string and the second target candidate string to be associated data in the linked database.
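The two ways of building the linked database described above can be sketched together. Everything named here is an assumption for illustration: the `/` delimiter used to split resource strings, the `related` predicate that decides whether two candidate strings have an association relationship, and the `"related"` label stored with each pair.

```python
def build_linked_database(first_resource, second_resource, related,
                          split=lambda s: s.split("/")):
    """Return (triples, spliced) built from two preset resources.

    first_resource / second_resource: iterables of character strings.
    related(a, b): hypothetical predicate, True when candidate strings
    a and b have an association relationship.
    """
    first_candidates = [s for text in first_resource for s in split(text)]
    second_candidates = [s for text in second_resource for s in split(text)]

    triples = []      # variant 1: store both strings plus their relation
    spliced = set()   # variant 2: store the spliced string itself
    for a in first_candidates:
        for b in second_candidates:
            if related(a, b):
                triples.append((a, b, "related"))
                spliced.add(a + " " + b)
    return triples, spliced
```

The `spliced` set lends itself to a direct membership test on a spliced candidate result, while the `triples` list keeps the two target candidate strings separate so that either one can be looked up on its own.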
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above, which are not repeated here.
Embodiment 4
Embodiments of the present invention also provide a storage medium. Optionally, in this embodiment, the storage medium may be used to store the program code of the error correction method for a search word of the embodiments of the present invention.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network such as a wide area network, a metropolitan area network, or a local area network.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1, obtaining a search word, wherein the search word is a long-tail keyword;
S2, splitting the search word to obtain a plurality of first search word fragments;
S3, performing error correction on each first search word fragment to obtain a plurality of second search word fragments;
S4, splicing the plurality of second search word fragments to obtain a candidate result;
S5, judging whether the candidate result is associated data in a linked database, wherein a plurality of groups of error-corrected associated data are stored in the linked database;
S6, when the candidate result is judged to be associated data in the linked database, determining the candidate result to be the target search word obtained by performing error correction on the search word.
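Steps S2 to S6 above can be chained as in the following sketch. It is an illustration under stated assumptions rather than the claimed implementation: `split`, `fix_fragment`, `linked_db`, and `popularity` are hypothetical inputs supplied by the caller, fragments are joined with a space, and returning `None` when no candidate result is found in the linked database is a choice made only for this example.

```python
from itertools import product


def correct_query(query, split, fix_fragment, linked_db, popularity=None):
    """Return the target search word for `query`, or None.

    split(query)        -> list of first search word fragments    (S2)
    fix_fragment(f)     -> list of corrected alternatives for f   (S3)
    linked_db           -> set of error-corrected associated data (S5)
    popularity          -> optional score per candidate result
    """
    popularity = popularity or {}
    first_fragments = split(query)                                  # S2
    second_fragments = [fix_fragment(f) for f in first_fragments]   # S3
    # S4: splice in the order produced by the split.
    candidates = [" ".join(c) for c in product(*second_fragments)]
    # S5: keep only candidate results present in the linked database.
    confirmed = [c for c in candidates if c in linked_db]
    if not confirmed:
        return None
    # S6: with several confirmed candidates, prefer the most popular one.
    return max(confirmed, key=lambda c: popularity.get(c, 0))
```

A call might pass `str.split` as `split` and a small lookup table as `fix_fragment`; a candidate result is accepted only if the linked database contains it, which is what lets the method work without user records.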
Optionally, in this embodiment, the storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: splitting the search word to be corrected by forward maximum matching to obtain the plurality of first search word fragments, wherein forward maximum matching splits the search word to be corrected in a first order, and the first order runs from the first character of the search word to be corrected to its last character.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: splitting the search word to be corrected by backward maximum matching to obtain the plurality of first search word fragments, wherein backward maximum matching splits the search word to be corrected in a second order, and the second order runs from the last character of the search word to be corrected to its first character.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: obtaining the popularity of each candidate result, wherein the popularity indicates the degree to which the candidate result is confirmed as the target search word; and determining the candidate result with the highest popularity to be the target search word.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: obtaining the relative order of the plurality of first search word fragments, wherein the relative order is the order produced by splitting the search word to be corrected; and splicing the plurality of second search word fragments according to the relative order to obtain the candidate result.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: building the linked database based on preset resources, wherein the preset resources include attributes of the search word to be corrected.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: splitting the character strings in a first preset resource to obtain a plurality of first candidate strings; splitting the character strings in a second preset resource to obtain a plurality of second candidate strings; extracting a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship; and storing the first target candidate string, the second target candidate string, and the association relationship between them as associated data in the linked database.
Optionally, in this embodiment, the processor executes, according to the program code stored in the storage medium: splitting the character strings in the first preset resource to obtain a plurality of first candidate strings; splitting the character strings in the second preset resource to obtain a plurality of second candidate strings; extracting a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship; splicing the first target candidate string and the second target candidate string; and determining the splicing result of the first target candidate string and the second target candidate string to be associated data in the linked database.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in Embodiment 1 and Embodiment 2 above, which are not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other ways of dividing, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (16)

1. An error correction method for a search word, characterized by comprising:
Obtaining a search word;
Splitting the search word to obtain a plurality of first search word fragments;
Performing error correction on the first search word fragments to obtain a plurality of second search word fragments;
Splicing the plurality of second search word fragments to obtain a candidate result;
Judging whether the candidate result is associated data in a linked database, wherein a plurality of groups of error-corrected associated data are stored in the linked database; and
When the candidate result is judged to be associated data in the linked database, determining the candidate result to be the target search word obtained by performing error correction on the search word.
2. The error correction method according to claim 1, characterized in that splitting the search word to obtain a plurality of first search word fragments comprises:
Splitting the search word by forward maximum matching to obtain the plurality of first search word fragments, wherein the forward maximum matching splits the search word in a first order, and the first order runs from the first character of the search word to its last character.
3. The error correction method according to claim 1, characterized in that splitting the search word to obtain a plurality of first search word fragments comprises:
Splitting the search word by backward maximum matching to obtain the plurality of first search word fragments, wherein the backward maximum matching splits the search word in a second order, and the second order runs from the last character of the search word to its first character.
4. The error correction method according to claim 1, characterized in that there are a plurality of candidate results, and, when the candidate result is judged to be associated data in the linked database, determining the candidate result to be the target search word obtained by performing error correction on the search word comprises:
Obtaining the popularity of each candidate result, wherein the popularity indicates the degree to which the candidate result is confirmed as the target search word; and
Determining the candidate result with the highest popularity to be the target search word.
5. The error correction method according to claim 1, characterized in that splicing the plurality of second search word fragments to obtain a candidate result comprises:
Obtaining the relative order of the plurality of first search word fragments, wherein the relative order is the order produced by splitting the search word; and
Splicing the plurality of second search word fragments according to the relative order to obtain the candidate result.
6. The error correction method according to claim 1, characterized in that, before obtaining the search word, the error correction method further comprises:
Building the linked database based on preset resources, wherein the preset resources include attributes of the search word.
7. The error correction method according to claim 6, characterized in that the preset resources comprise a first preset resource and a second preset resource, and building the linked database based on the preset resources comprises:
Splitting the character strings in the first preset resource to obtain a plurality of first candidate strings;
Splitting the character strings in the second preset resource to obtain a plurality of second candidate strings;
Extracting a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship; and
Storing the first target candidate string, the second target candidate string, and the association relationship between the first target candidate string and the second target candidate string as associated data in the linked database.
8. The error correction method according to claim 6, characterized in that the preset resources comprise a first preset resource and a second preset resource, and building the linked database based on the preset resources comprises:
Splitting the character strings in the first preset resource to obtain a plurality of first candidate strings;
Splitting the character strings in the second preset resource to obtain a plurality of second candidate strings;
Extracting a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship;
Splicing the first target candidate string and the second target candidate string; and
Determining the splicing result of the first target candidate string and the second target candidate string to be associated data in the linked database.
9. An error correction device for a search word, characterized by comprising:
An acquiring unit, configured to obtain a search word;
A splitting unit, configured to split the search word to obtain a plurality of first search word fragments;
An error correction unit, configured to perform error correction on the first search word fragments to obtain a plurality of second search word fragments;
A splicing unit, configured to splice the plurality of second search word fragments to obtain a candidate result;
A judging unit, configured to judge whether the candidate result is associated data in a linked database, wherein a plurality of groups of error-corrected associated data are stored in the linked database; and
A determining unit, configured to determine, when the candidate result is judged to be associated data in the linked database, the candidate result to be the target search word obtained by performing error correction on the search word.
10. The error correction device according to claim 9, characterized in that the splitting unit comprises:
A first splitting module, configured to split the search word by forward maximum matching to obtain the plurality of first search word fragments, wherein the forward maximum matching splits the search word in a first order, and the first order runs from the first character of the search word to its last character.
11. The error correction device according to claim 9, characterized in that the splitting unit further comprises:
A second splitting module, configured to split the search word by backward maximum matching to obtain the plurality of first search word fragments, wherein the backward maximum matching splits the search word in a second order, and the second order runs from the last character of the search word to its first character.
12. The error correction device according to claim 9, characterized in that there are a plurality of candidate results, and the determining unit comprises:
A first acquiring module, configured to obtain the popularity of each candidate result, wherein the popularity indicates the degree to which the candidate result is confirmed as the target search word; and
A first determining module, configured to determine the candidate result with the highest popularity to be the target search word.
13. The error correction device according to claim 9, characterized in that the splicing unit comprises:
A second acquiring module, configured to obtain the relative order of the plurality of first search word fragments, wherein the relative order is the order produced by splitting the search word; and
A first splicing module, configured to splice the plurality of second search word fragments according to the relative order to obtain the candidate result.
14. The error correction device according to claim 9, characterized in that the error correction device further comprises:
A building unit, configured to build the linked database based on preset resources, wherein the preset resources include attributes of the search word.
15. The error correction device according to claim 14, characterized in that the preset resources comprise a first preset resource and a second preset resource, and the building unit comprises:
A third splitting module, configured to split the character strings in the first preset resource to obtain a plurality of first candidate strings;
A fourth splitting module, configured to split the character strings in the second preset resource to obtain a plurality of second candidate strings;
A first extraction module, configured to extract a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship; and
A first storage module, configured to store the first target candidate string, the second target candidate string, and the association relationship between the first target candidate string and the second target candidate string as associated data in the linked database.
16. The error correction device according to claim 14, characterized in that the preset resources comprise a first preset resource and a second preset resource, and the building unit comprises:
A fifth splitting module, configured to split the character strings in the first preset resource to obtain a plurality of first candidate strings;
A sixth splitting module, configured to split the character strings in the second preset resource to obtain a plurality of second candidate strings;
A second extraction module, configured to extract a first target candidate string from the plurality of first candidate strings and a second target candidate string from the plurality of second candidate strings, wherein the first target candidate string and the second target candidate string are candidate strings having an association relationship;
A second splicing module, configured to splice the first target candidate string and the second target candidate string; and
A second determining module, configured to determine the splicing result of the first target candidate string and the second target candidate string to be associated data in the linked database.
CN201410406835.7A 2014-08-18 2014-08-18 The error correction method and device of search term Active CN104156454B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410406835.7A CN104156454B (en) 2014-08-18 2014-08-18 The error correction method and device of search term

Publications (2)

Publication Number Publication Date
CN104156454A true CN104156454A (en) 2014-11-19
CN104156454B CN104156454B (en) 2018-09-18

Family

ID=51881952

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410406835.7A Active CN104156454B (en) 2014-08-18 2014-08-18 The error correction method and device of search term

Country Status (1)

Country Link
CN (1) CN104156454B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080221863A1 (en) * 2007-03-07 2008-09-11 International Business Machines Corporation Search-based word segmentation method and device for language without word boundary tag
CN103198149A (en) * 2013-04-23 2013-07-10 中国科学院计算技术研究所 Method and system for query error correction
CN103886094A (en) * 2014-04-03 2014-06-25 江苏物联网研究发展中心 Method for error correction and expansion of electronic commerce search engine
CN103914444A (en) * 2012-12-29 2014-07-09 高德软件有限公司 Error correction method and device thereof
CN103927330A (en) * 2014-03-19 2014-07-16 北京奇虎科技有限公司 Method and device for determining characters with similar forms in search engine

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708893A (en) * 2015-11-17 2017-05-24 华为技术有限公司 Error correction method and device for search query term
WO2017084506A1 (en) * 2015-11-17 2017-05-26 华为技术有限公司 Method and device for correcting search query term
CN106708893B (en) * 2015-11-17 2018-09-28 华为技术有限公司 Search query word error correction method and device
CN106919614A (en) * 2015-12-28 2017-07-04 中国移动通信集团公司 A kind of information processing method and device
CN106570180B (en) * 2016-11-10 2020-05-22 北京百度网讯科技有限公司 Voice search method and device based on artificial intelligence
CN106570180A (en) * 2016-11-10 2017-04-19 北京百度网讯科技有限公司 Artificial intelligence based voice searching method and device
CN107193921A (en) * 2017-05-15 2017-09-22 中山大学 The method and system of the Sino-British mixing inquiry error correction of Search Engine-Oriented
CN107193921B (en) * 2017-05-15 2020-02-07 中山大学 Method and system for correcting error of Chinese-English mixed query facing search engine
CN110019295B (en) * 2017-09-25 2021-07-27 北京国双科技有限公司 Database retrieval method, device, system and storage medium
CN110019295A (en) * 2017-09-25 2019-07-16 北京国双科技有限公司 Database index method, device, system and storage medium
CN109800407A (en) * 2017-11-15 2019-05-24 腾讯科技(深圳)有限公司 Intension recognizing method, device, computer equipment and storage medium
CN109800407B (en) * 2017-11-15 2021-11-16 腾讯科技(深圳)有限公司 Intention recognition method and device, computer equipment and storage medium
CN110134936A (en) * 2018-02-08 2019-08-16 北京搜狗科技发展有限公司 A kind of segmenting method, device and electronic equipment
CN109062903A (en) * 2018-08-22 2018-12-21 北京百度网讯科技有限公司 Method and apparatus for correcting wrong word
CN111696545A (en) * 2019-03-15 2020-09-22 北京京东尚科信息技术有限公司 Speech recognition error correction method, device and storage medium
CN111696545B (en) * 2019-03-15 2023-11-03 北京汇钧科技有限公司 Speech recognition error correction method, device and storage medium
CN110795617A (en) * 2019-08-12 2020-02-14 腾讯科技(深圳)有限公司 Error correction method and related device for search terms
CN110569441A (en) * 2019-09-16 2019-12-13 腾讯科技(深圳)有限公司 error correction method and device for search character string
CN111382260A (en) * 2020-03-16 2020-07-07 腾讯音乐娱乐科技(深圳)有限公司 Method, device and storage medium for correcting retrieved text
CN113535895A (en) * 2021-06-22 2021-10-22 北京三快在线科技有限公司 Search text processing method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN104156454B (en) 2018-09-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant