CN103530380B - A vertical search device and method - Google Patents

A vertical search device and method

Info

Publication number
CN103530380B
Authority
CN
China
Prior art keywords
chinese
phonetic alphabet
keyword
chinese phonetic
search
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310487578.XA
Other languages
Chinese (zh)
Other versions
CN103530380A (en)
Inventor
耿祥磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Qizhi Software Beijing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd, Qizhi Software Beijing Co Ltd
Priority to CN201310487578.XA
Publication of CN103530380A
Application granted
Publication of CN103530380B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/53 Processing of non-Latin text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a vertical search device and method. The vertical search method includes: obtaining a keyword that includes at least Chinese text and/or pinyin (the Chinese phonetic alphabet); converting the Chinese text in the keyword into related pinyin; and searching a vertical search information base for matching search results according to the pinyin corresponding to the keyword, where the vertical search information base includes at least index entries in pinyin form and the search result corresponding to each index entry. The vertical search device and method provided by the embodiments of the invention can improve the accuracy of vertical search.

Description

A vertical search device and method
Technical field
The present invention relates to the technical field of vertical search, and in particular to a vertical search device and a vertical search method.
Background
A vertical search engine is a specialized search engine for a particular industry, and is a subdivision and extension of the general search engine. It integrates a certain class of specialized information from a web page library, extracts the required data field by field, processes that data, and returns it to the user in some form.
When a user performs a vertical search, the keyword entered may be Chinese text or may be pinyin.
If the keyword entered by the user is Chinese text, the current general practice is to search the information base directly with that Chinese text. If the keyword contains wrongly written characters, however, the search often fails to find the result the user wants. For example, a user of a game vertical search engine means to enter the keyword "Seven Heroes" (pinyin "qixiong") in order to find the game "Seven Heroes Contend for Hegemony" (pinyin "qixiongzhengba"), but for various reasons enters the homophone "pneumothorax" (also "qixiong") instead. Under the current general practice, the game the user wants cannot be found in this case.
If the keyword entered by the user is pinyin, the current general practice is to submit the pinyin to a general-purpose pinyin search interface, which converts it into one or more corresponding Chinese words; the returned Chinese words are then used as search keywords to produce results. The inventors found that, because Chinese characters and words have many homophones, a general-purpose pinyin search interface returns many homophonous words for a given pinyin input. These words may not match the content of the vertical field being searched, or may match it poorly, so the results obtained with them may not be what the user wants. For example, a user performing a vertical search in the game field enters the pinyin "qixiong". The general-purpose pinyin search interface may return words such as "pneumothorax" and "chest-high" (both pronounced "qixiong"), and searching the game vertical with these keywords cannot find the desired result, "Seven Heroes Contend for Hegemony".
Summary of the invention
In view of the above problems, the present invention provides a vertical search device and a corresponding vertical search method that overcome the above problems or at least partially solve them.
According to one embodiment of the invention, a vertical search device is provided, including: an interactive interface configured to obtain a keyword that includes at least Chinese text and/or pinyin; a converter configured to convert the Chinese text in the keyword obtained via the interactive interface into related pinyin; and a searcher configured to search a vertical search information base for matching search results according to the pinyin corresponding to the keyword, where the vertical search information base includes at least index entries in pinyin form and the search result corresponding to each index entry.
Optionally, the converter is further configured to convert Chinese text that serves as index information for corresponding data into related pinyin, and the search device further includes an index builder configured to add that pinyin to the vertical search information base as index entries for the corresponding data.
Optionally, the converter includes a direct conversion module configured to convert Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and a preset word segmentation method.
Optionally, the converter further includes a fuzzy-sound conversion module configured to obtain, according to the fuzzy-sound correspondences between different pinyin, other pinyin that have a fuzzy-sound correspondence with the pinyin provided by the direct conversion module; these other pinyin also serve as related pinyin of the Chinese text.
Optionally, the searcher is further configured to search the vertical search information base for matching search results according to the Chinese text before the Chinese text in the keyword is converted into related pinyin, and to return the search results directly if any are found.
According to a further embodiment of the invention, a vertical search method is also provided, including: obtaining a keyword that includes at least Chinese text and/or pinyin; converting the Chinese text in the keyword into related pinyin; and searching a vertical search information base for matching search results according to the pinyin corresponding to the keyword, where the vertical search information base includes at least index entries in pinyin form and the search result corresponding to each index entry.
Optionally, the method further includes: converting Chinese text that serves as index information for corresponding data into related pinyin; and adding that pinyin to the vertical search information base as index entries for the corresponding data.
Optionally, the step of converting Chinese text into related pinyin includes: converting the Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and preset word segmentation and permutation-and-combination methods.
Optionally, the step of converting Chinese text into related pinyin further includes: obtaining, according to the fuzzy-sound correspondences between different pinyin, other pinyin that have a fuzzy-sound correspondence with the above pinyin; these other pinyin also serve as related pinyin of the Chinese text.
Optionally, before the step of converting the Chinese text in the keyword into related pinyin, the method further includes: searching the vertical search information base for matching search results according to the Chinese text, and returning the search results directly if any are found.
With the vertical search method and device provided by the embodiments of the invention, the Chinese text in a keyword is converted into pinyin, and index entries in pinyin form are added for each search result in the vertical search information base, so that the information base can be searched by pinyin; and/or the pinyin in the keyword is used for the search directly. As a result, even if the user enters a wrong keyword, for example "pneumothorax" instead of "Seven Heroes", the correct search result "Seven Heroes Contend for Hegemony" can still be found. Likewise, if the user enters "qixiong", the result "Seven Heroes Contend for Hegemony" can be found, which avoids the problem that the correct result cannot be found because a general-purpose pinyin search interface returns words such as "pneumothorax" and "chest-high".
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, identical parts are denoted by the same reference numerals. In the drawings:
Fig. 1 shows a schematic diagram of a search device according to an embodiment of the invention;
Fig. 2 shows a flowchart of a search method according to an embodiment of the invention.
Detailed description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and so that its scope can be conveyed completely to those skilled in the art.
Fig. 1 is a schematic diagram of a search device according to one embodiment of the invention. The search device can include an interactive interface 102, a converter 104, a searcher 106, a display interface 108, an index builder 110, and a vertical search information base 112.
The interactive interface 102 is the interface through which the search device and the user exchange information, for example to obtain the keyword entered by the user. In general, each vertical search has its own interactive interface 102, and any keyword entered in that interface is treated as a request to search for information in the corresponding vertical field. Common vertical search fields include games, air tickets, and shopping. After the user enters a keyword in the interactive interface 102 of the game field, the information base of the game vertical is searched for corresponding results. A keyword entered in the interactive interface 102 can therefore be called a vertical search keyword.
The keywords received by the interactive interface 102 take many forms. A keyword may contain Chinese text, such as "pneumothorax" or "interstellar"; the Chinese text may be a single character or a word made up of two or more characters, and the embodiments of the invention refer to both as Chinese text. A keyword may also contain pinyin, such as "qixiong" or "xingji", or take other forms. The embodiments of the invention mainly concern the processing of keywords in Chinese-text form or pinyin form. The two cases are described in turn below.
In the first case, the vertical search keyword received by the interactive interface 102 includes Chinese text, and the subsequent processing is as follows:
First, the interactive interface 102 passes the Chinese text in the keyword to the converter 104, which converts it into related pinyin. The conversion can be implemented in several ways. For example, the converter 104 can include a direct conversion module 1042 and, optionally, a fuzzy-sound conversion module 1044.
Specifically, the direct conversion module 1042 converts Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and preset word segmentation and permutation-and-combination methods. There is a fixed correspondence between Chinese text and pinyin: for example, the pinyin of the Chinese text "Seven Heroes" is "qixiong", and the pinyin of the character for "gas" is "qi". The Chinese text in a keyword can therefore be converted directly into its pinyin according to this correspondence; for example, the miswritten keyword "Pneumothorax Contend for Hegemony" is converted to "qixiongzhengba".
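The direct conversion step can be illustrated with a minimal Python sketch. The table `PINYIN_TABLE` and the function names are hypothetical, and the Chinese characters are an assumed reconstruction of the translated examples ("Seven Heroes" as 七雄, "pneumothorax" as 气胸). A real converter would load a full dictionary and would have to handle characters with more than one reading, which this sketch ignores.

```python
# Hypothetical toneless character-to-pinyin table (illustration only).
PINYIN_TABLE = {
    "七": "qi", "雄": "xiong",   # "Seven Heroes"
    "气": "qi", "胸": "xiong",   # "pneumothorax"
    "争": "zheng", "霸": "ba",   # "contend for hegemony"
}


def to_syllables(text):
    """Map each known Chinese character to its toneless pinyin syllable."""
    return [PINYIN_TABLE[ch] for ch in text if ch in PINYIN_TABLE]


def to_pinyin(text):
    """Join the per-character syllables into one pinyin string."""
    return "".join(to_syllables(text))


print(to_pinyin("气胸争霸"))                    # qixiongzhengba
print(to_pinyin("七雄") == to_pinyin("气胸"))   # True: the two words are homophones
```

Because tones are dropped, the correct word and the miswritten homophone map to the same string, which is what lets a later pinyin lookup tolerate wrongly written characters.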
It should be noted that if the keyword entered by the user consists of several Chinese characters, then, in order to search more accurately, the Chinese text in the keyword can also be converted into multiple related pinyin strings according to a preset word segmentation method. There are many ways to segment. Taking the keyword "Pneumothorax Contend for Hegemony" above as an example, the text can be segmented one character at a time, giving "qi" "xiong" "zheng" "ba"; two characters at a time, giving "qixiong" "zhengba"; progressively from front to back, giving "qi" "qixiong" "qixiongzheng" "qixiongzhengba"; or progressively from back to front, giving "ba" "zhengba" "xiongzhengba" "qixiongzhengba". In addition, if better and more complete search results are wanted, the segmented words can be permuted and combined in various ways, for example into "qizheng" "qiba" "xiongba" and so on. The multiple pinyin strings formed by this segmentation or permutation and combination can be obtained either by first segmenting and/or combining the Chinese text "Pneumothorax Contend for Hegemony" and then converting each piece of Chinese text into its pinyin, or by first converting the text to "qixiongzhengba" and then segmenting and/or combining "qixiongzhengba".
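The segmentation and combination variants described above can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the keyword has already been converted to a list of per-character syllables. The combination step is shown here as order-preserving pairs, which is only one reading of "permutation and combination".

```python
from itertools import combinations


def per_syllable(syllables):
    """One term per character."""
    return list(syllables)


def in_pairs(syllables):
    """One term per two characters."""
    return ["".join(syllables[i:i + 2]) for i in range(0, len(syllables), 2)]


def forward_progressive(syllables):
    """Growing prefixes, front to back."""
    return ["".join(syllables[:n]) for n in range(1, len(syllables) + 1)]


def backward_progressive(syllables):
    """Growing suffixes, back to front."""
    return ["".join(syllables[i:]) for i in range(len(syllables) - 1, -1, -1)]


def ordered_pairs(syllables):
    """Every order-preserving pair of syllables, adjacent or not."""
    return ["".join(pair) for pair in combinations(syllables, 2)]


syllables = ["qi", "xiong", "zheng", "ba"]
print(in_pairs(syllables))              # ['qixiong', 'zhengba']
print(forward_progressive(syllables))   # ['qi', 'qixiong', 'qixiongzheng', 'qixiongzhengba']
print(backward_progressive(syllables))  # ['ba', 'zhengba', 'xiongzhengba', 'qixiongzhengba']
print(ordered_pairs(syllables))
```

The number of terms grows quickly with keyword length once combinations are included, so which variants to enable is a trade-off between recall and the computing capacity of the search device.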
Besides the segmentation and permutation-and-combination methods listed above, there are many others, and they cannot all be enumerated. Whether listed here or not, the various segmentation and combination methods can be used alone or combined in any way to convert a keyword in Chinese-text form into multiple related pinyin strings. In short, the segmentation method and the combination method can be chosen according to actual requirements and the computing capacity of the search device; the embodiments of the invention place no limit on this.
The above describes how the direct conversion module 1042 in the converter 104 can be implemented. Optionally, to further improve search efficiency and accuracy, the converter 104 can also include a fuzzy-sound conversion module 1044 in addition to the direct conversion module 1042.
Specifically, the fuzzy-sound conversion module 1044 obtains, according to the fuzzy-sound correspondences between different pinyin, other pinyin that have a fuzzy-sound correspondence with the pinyin provided by the direct conversion module 1042. The other pinyin obtained by the fuzzy-sound conversion module 1044 likewise serve as related pinyin for the Chinese text in the keyword. Fuzzy sounds arise for many reasons. In some regions, for example, dialect habits lead some people to confuse front nasal finals with back nasal finals, so that they do not distinguish "in" from "ing" or "an" from "ang". Others have a weak grasp of pinyin and easily confuse "z" and "zh", "s" and "sh", "r" and "l", "l" and "n", and so on. There are other causes as well, but the essence is the same: two different pinyin may be confused with each other. To handle this, fuzzy-sound correspondences can be defined, such as "in"="ing", "an"="ang", "z"="zh", "r"="l".
Suppose a user wants to search for the game "StarCraft" (pinyin "xingjizhengba") but cannot tell "xin" from "xing". When typing with a pinyin input method, the user keys in "xinjizhengba", and the keyword actually entered is the Chinese text "New Border Contend for Hegemony". In this case, the related pinyin obtained by the direct conversion module 1042 are "xin" "ji" "zheng" "ba" "xinji" "xinjizheng" "xinjizhengba" "zhengba". If the converter 104 also includes the fuzzy-sound conversion module 1044, that module can take each pinyin output by the direct conversion module 1042 and derive further related pinyin from the fuzzy-sound correspondences. For example, from the correspondence between "in" and "ing", "xing" is derived from "xin"; similarly, "xingji" is derived from "xinji", "xingjizheng" from "xinjizheng", and so on. Through the direct conversion module 1042 and the fuzzy-sound conversion module 1044 of the converter 104, the keyword "New Border Contend for Hegemony" therefore yields multiple related pinyin: "xin" "ji" "zheng" "ba" "xinji" "xinjizheng" "xinjizhengba" "zhengba" "xing" "xingji" "xingjizheng" "xingjizhengba" and so on. The fuzzy-sound conversion module 1044 thus increases the number of pinyin strings into which the Chinese text in the keyword is converted, widens the scope of the subsequent search, and to some extent reduces the cases where a user who mispronounces a word, and so enters the wrong keyword, cannot find the desired result.
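A fuzzy-sound expansion of this kind might look like the following Python sketch. The correspondence table, the simplified initial/final split, and all function names are assumptions made for illustration; the patent only gives example pairs such as "in"="ing" and "z"="zh".

```python
from itertools import product

# Assumed fuzzy-sound pairs for initials, taken from the examples in the text.
FUZZY_INITIALS = {
    "z": {"zh"}, "zh": {"z"},
    "s": {"sh"}, "sh": {"s"},
    "r": {"l"}, "l": {"r", "n"}, "n": {"l"},
}
# Assumed fuzzy-sound pairs for finals: (back nasal, front nasal).
FUZZY_FINALS = (("ing", "in"), ("ang", "an"))


def split_syllable(syllable):
    """Split a toneless syllable into (initial, final); a simplified rule."""
    for two in ("zh", "ch", "sh"):
        if syllable.startswith(two):
            return two, syllable[2:]
    if syllable and syllable[0] not in "aeiou":
        return syllable[0], syllable[1:]
    return "", syllable


def final_variants(final):
    """Return the final together with its front/back nasal counterpart, if any."""
    out = {final}
    for back, front in FUZZY_FINALS:
        if final.endswith(back):
            out.add(final[:-len(back)] + front)
        elif final.endswith(front):
            out.add(final + "g")
    return out


def fuzzy_variants(syllable):
    """All syllables confusable with the given one, excluding itself."""
    initial, final = split_syllable(syllable)
    initials = {initial} | FUZZY_INITIALS.get(initial, set())
    return {i + f for i in initials for f in final_variants(final)} - {syllable}


def expand_phrase(syllables):
    """Every pinyin string obtainable by swapping in fuzzy variants per syllable."""
    options = [sorted({s} | fuzzy_variants(s)) for s in syllables]
    return {"".join(choice) for choice in product(*options)}


print(fuzzy_variants("xin"))               # {'xing'}
print(sorted(expand_phrase(["xin", "ji"])))  # ['xingji', 'xinji']
```

With the full table enabled, "zheng" also expands to "zeng", so a four-syllable keyword produces more variants than the "in"/"ing" example in the text; a deployment could restrict the table to the confusions it actually wants to tolerate.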
After the converter 104 has processed the keyword in Chinese-text form, the result is passed to the searcher 106. As mentioned earlier, the keyword received by the interactive interface 102 may be Chinese text or may be pinyin. The first case above described the data processing when the keyword is Chinese text; the second case below describes the data processing when the keyword is pinyin.
In the second case, the vertical search keyword received by the interactive interface 102 includes pinyin, and the subsequent processing is as follows: since the keyword received by the interactive interface 102 is already in pinyin form, the interactive interface 102 passes the pinyin directly to the searcher 106, and no data conversion by the converter 104 is needed.
In summary, whether the keyword is Chinese text (the first case) or pinyin (the second case), related pinyin are ultimately provided to the searcher 106 for the search.
After the searcher 106 obtains the related pinyin from the converter 104 or the interactive interface 102, it searches the vertical search information base for matching search results according to the pinyin. Specifically, the vertical search information base includes at least index entries in pinyin form and the search result corresponding to each index entry. The converter 104 and the index builder 110 are also used when the vertical search information base is constructed.
First, the search device can collect, through various channels, the data that can serve as search results, such as web page data and document data. So that the corresponding data can be located quickly, some index information is generally set for the data, such as its name, tags (for example "casual" or "exciting"), title, introduction, or summary. Anything that can identify a piece of data or mark it in some way can serve as index information for that data, and the index information points to the corresponding data. The corresponding data, that is, the search result, can then be found from the index information. An index of this kind is commonly called an inverted index.
Because most data is in Chinese-text form, most of the corresponding index information is also Chinese text. In this case, the converter 104 is needed to convert the Chinese text in the index information of the various data into related pinyin. Just as the converter 104 converts the Chinese text in a keyword into related pinyin, a similar process is applied when the vertical search information base is built: the Chinese text in the index information of the collected data is converted into related pinyin. Specifically, the direct conversion module 1042 converts the Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and preset word segmentation and permutation-and-combination methods. For the conversion process, refer to the earlier description of the direct conversion module 1042; it is not repeated here.
For example, when the data for a game is added to the vertical search information base of the search engine, the Chinese characters of the game's index information, such as its name, are first converted to pinyin. The pinyin is then segmented and/or permuted and combined, and the resulting terms are joined with a word separator such as a space (search engines generally split Latin letters on spaces by default) to form an index field containing several pinyin strings. The index field thus contains several index entries. The index builder 110 then adds the pinyin in the index field to the vertical search information base as index entries for the corresponding data.
For instance, take the game "Seven Heroes Contend for Hegemony" and suppose its name is the index information. After conversion by the direct conversion module 1042, the index information yields several related pinyin, and the content of the index field is, for example: "qi xiong zheng ba qixiong qixiongzheng qixiongzhengba zhengba". This game therefore has at least eight index entries, all of which point to the game "Seven Heroes Contend for Hegemony". If the searcher 106 searches with any one of these eight index entries, it can find the game "Seven Heroes Contend for Hegemony".
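The construction of such an index field can be sketched as follows. This is a minimal illustration, not the patent's implementation: `index_terms`, `PinyinIndex`, and the result identifier are hypothetical names, and the particular mix of terms (single syllables, forward prefixes, and two-character pairs) is chosen only because it reproduces the eight-entry example above.

```python
from collections import defaultdict


def index_terms(syllables):
    """Build the pinyin terms for one piece of index information."""
    singles = list(syllables)
    prefixes = ["".join(syllables[:n]) for n in range(2, len(syllables) + 1)]
    pairs = ["".join(syllables[i:i + 2]) for i in range(0, len(syllables), 2)]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(singles + prefixes + pairs))


class PinyinIndex:
    """A tiny inverted index: pinyin term -> set of result identifiers."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, result_id, syllables):
        for term in index_terms(syllables):
            self.postings[term].add(result_id)

    def lookup(self, term):
        return set(self.postings.get(term, ()))


terms = index_terms(["qi", "xiong", "zheng", "ba"])
print(" ".join(terms))  # qi xiong zheng ba qixiong qixiongzheng qixiongzhengba zhengba

index = PinyinIndex()
index.add("game:seven-heroes", ["qi", "xiong", "zheng", "ba"])
print(index.lookup("qixiong"))  # {'game:seven-heroes'}
```

Joining the terms with spaces gives the index field described in the text; storing them in a term-to-results map is the inverted-index view of the same data.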
Consequently, whether the user enters "Seven Heroes" or "pneumothorax" in the interactive interface 102, the converter 104 converts the input into related keywords in pinyin form, such as "qi" "xiong" "qixiong". The searcher 106 can then search with the pinyin keyword "qi", "xiong", or "qixiong". Because the index entries of the game "Seven Heroes Contend for Hegemony" in the vertical search information base include "qi" "xiong" "qixiong" and so on, the searcher 106 can find the document data of that game. Thus, with the technical solution of the embodiments of the invention, even if the user enters wrongly written characters, the desired result "Seven Heroes Contend for Hegemony" can still be found as long as the pinyin is correct (tones are ignored), for example when the user should have entered "Seven Heroes" but entered "pneumothorax" by mistake.
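The lookup described here can be illustrated with a small self-contained Python sketch. The character table, the index contents, and the function names are hypothetical, and the characters 七雄 and 气胸 are an assumed reconstruction of the translated examples "Seven Heroes" and "pneumothorax".

```python
GAME = "Seven Heroes Contend for Hegemony"

# Hypothetical toneless character-to-pinyin table.
PINYIN_TABLE = {"七": "qi", "雄": "xiong", "气": "qi", "胸": "xiong"}

# Hypothetical pinyin index entries pointing at the game.
PINYIN_INDEX = {
    "qi": {GAME},
    "xiong": {GAME},
    "qixiong": {GAME},
}


def query_terms(keyword):
    """Convert a Chinese keyword to its syllables plus the joined string."""
    syllables = [PINYIN_TABLE[ch] for ch in keyword if ch in PINYIN_TABLE]
    terms = list(syllables)
    if len(syllables) > 1:
        terms.append("".join(syllables))
    return terms


def search(keyword):
    """Union of the results for every pinyin term derived from the keyword."""
    results = set()
    for term in query_terms(keyword):
        results |= PINYIN_INDEX.get(term, set())
    return results


print(search("七雄"))  # correct keyword
print(search("气胸"))  # miswritten homophone, same result
```

Both calls return the same set, because the correct word and the miswritten one share the toneless pinyin "qixiong".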
Further, to make the index information of the vertical search information base more comprehensive, the fuzzy-sound conversion module 1044 described above can likewise be used here. The processing principle is similar to the fuzzy-sound processing of keywords described earlier, except that the process is applied in the opposite direction: according to the fuzzy-sound correspondences between different pinyin, the fuzzy sounds corresponding to the pinyin of the index information produced by the direct conversion module 1042 are obtained. The pinyin of the index information obtained by the direct conversion module 1042 and the pinyin obtained by the fuzzy-sound conversion module 1044 are all entered into the vertical search information base and together serve as the index information of a given search result.
As described above, the vertical search information base contains many search results, and each search result has index entries in pinyin form, so the searcher 106 can find matching search results in the vertical search information base according to the pinyin corresponding to the keyword.
Optionally, to improve efficiency, before the Chinese text in the keyword is converted into related pinyin, the searcher 106 can first search the vertical search information base for matching results according to the Chinese text. If results are found, they are returned directly, and there is no need to convert to pinyin and search again. Of course, for a more comprehensive search, the search by Chinese text and the search by pinyin can also be carried out together. In addition, in some special cases, searching by pinyin alone is not excluded.
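The optional text-first strategy might be sketched as below. The two dictionaries standing in for the information base, the character table, and the function names are assumptions for illustration; the second element of the returned pair only records which path produced the hit.

```python
GAME = "Seven Heroes Contend for Hegemony"

# Hypothetical indexes: one keyed by Chinese text, one keyed by pinyin.
TEXT_INDEX = {"七雄": {GAME}}
PINYIN_INDEX = {"qixiong": {GAME}}
PINYIN_TABLE = {"七": "qi", "雄": "xiong", "气": "qi", "胸": "xiong"}


def to_pinyin(text):
    """Toneless pinyin for the known characters in text."""
    return "".join(PINYIN_TABLE.get(ch, "") for ch in text)


def search(keyword):
    """Try the Chinese-text index first; fall back to pinyin only on a miss."""
    hits = TEXT_INDEX.get(keyword)
    if hits:
        return hits, "text"
    return PINYIN_INDEX.get(to_pinyin(keyword), set()), "pinyin"


print(search("七雄"))  # found directly by Chinese text
print(search("气胸"))  # text lookup misses, pinyin lookup succeeds
```

Skipping the conversion on a direct hit saves work for correctly written keywords, at the cost of not returning homophone matches in that case; running both lookups together is the more comprehensive alternative the text mentions.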
Optionally, if the keyword is itself pinyin, no conversion by the converter 104 is needed, and the searcher 106 searches directly using the keyword in pinyin form.
Optionally, if the keyword entered by the user contains both Chinese characters and pinyin, the Chinese-character part may be converted into pinyin by the converter 104. The searcher 106 then searches the vertical search information bank using both the converted pinyin provided by the converter 104 and the pinyin entered by the user.
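Handling a mixed keyword can be sketched as follows, assuming the simplest possible policy: convert each run of Chinese characters, keep each run of Latin letters as already-typed pinyin, and join the pieces in order. The `CHAR_PINYIN` table and the function name `normalize_keyword` are hypothetical.

```python
import re

# Hypothetical character-to-pinyin table covering only this example.
CHAR_PINYIN = {"七": "qi", "雄": "xiong"}

CJK_RUN = r"[\u4e00-\u9fff]+"
LATIN_RUN = r"[A-Za-z]+"


def normalize_keyword(keyword):
    """Turn a mixed Chinese/pinyin keyword into a single pinyin string."""
    parts = []
    # Split the keyword into runs of CJK characters and runs of Latin letters.
    for run in re.findall(CJK_RUN + "|" + LATIN_RUN, keyword):
        if re.fullmatch(CJK_RUN, run):
            parts.append("".join(CHAR_PINYIN[ch] for ch in run))
        else:
            parts.append(run.lower())  # user-typed pinyin is kept as is
    return "".join(parts)
```

Under this policy, keywords such as "七xiong" and "qi雄" both normalize to the same pinyin query string, which can then be passed to the pinyin search.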
Corresponding to the search device above, an embodiment of the invention also discloses a search method, which includes the following steps:
Step S210: Obtain a keyword that includes at least Chinese characters and/or pinyin. It will be understood that the keyword the user enters for the vertical search may consist of Chinese characters only, of pinyin only, or of Chinese characters and pinyin together. This step may be implemented by the interactive interface 102 described above; for the related technical features, refer to the corresponding description of the interactive interface 102, which is not repeated here.
Step S220: Convert the Chinese characters in the keyword into related pinyin. If only pinyin was entered in step S210 and no Chinese characters, this step is skipped; it is performed only when the entered keyword contains Chinese characters. This step may be implemented by the converter 104 described above, for example by converting the Chinese characters into several related pinyin according to the correspondence between Chinese characters and pinyin and a preset word segmentation and permutation-and-combination method. Further, according to the fuzzy-sound correspondence between different pinyin, other pinyin that have a fuzzy-sound correspondence with the aforementioned pinyin may be obtained, and these other pinyin also serve as pinyin related to the Chinese characters. In other words, the Chinese characters in the keyword are either converted directly into the corresponding pinyin or additionally expanded into more pinyin through the fuzzy-sound correspondence. For the related technical features, refer to the corresponding description of the converter 104 above, which is not repeated here.
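One way to read the "permutation and combination" part of step S220 is that a character with several readings yields several candidate pinyin strings. The sketch below illustrates only that idea and omits word segmentation entirely; the `CHAR_READINGS` table and the function name `candidate_pinyin` are assumptions, not part of the patent.

```python
from itertools import product

# Hypothetical table; polyphonic characters list more than one reading.
CHAR_READINGS = {
    "银": ["yin"],
    "行": ["hang", "xing"],
    "重": ["zhong", "chong"],
}


def candidate_pinyin(text):
    """Return every pinyin string obtainable by combining per-character readings."""
    options = [CHAR_READINGS[ch] for ch in text]
    return ["".join(combo) for combo in product(*options)]


# One polyphonic character gives two candidates; two give four.
bank_candidates = candidate_pinyin("银行")
```

A preset segmentation dictionary could prune unlikely combinations before the search; that refinement is left out here to keep the sketch short.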
Step S230: Search the vertical search information bank for matching search results according to the pinyin corresponding to the keyword, where the vertical search information bank includes at least index entries in pinyin form and the search result corresponding to each index entry. Specifically, the Chinese characters used as index information of the corresponding data may be converted into related pinyin; the conversion principle is the same as that used for the keyword in step S220, only applied in the opposite direction, so it is not repeated here. The resulting pinyin is then added to the vertical search information bank as index entries of the corresponding data. This step may be implemented by the searcher 106, the index constructor 110, and the vertical search information bank 112 described above; for the related technical features, refer to the corresponding descriptions of these modules, which are not repeated here.
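Index construction and lookup in step S230 can be sketched as follows. The matching rule used here (a query matches when it is a substring of an entry's pinyin key) is an assumption made for illustration, since the text does not specify how pinyin keys are matched; the `CHAR_PINYIN` table, the sample title, and the function names are likewise hypothetical.

```python
# Hypothetical character-to-pinyin table covering only this example.
CHAR_PINYIN = {"七": "qi", "雄": "xiong", "争": "zheng", "霸": "ba"}


def to_pinyin(text):
    """Convert Chinese characters to pinyin, character by character."""
    return "".join(CHAR_PINYIN[ch] for ch in text)


def build_index(titles):
    """Add the pinyin of each title as an index entry pointing at the title."""
    index = {}
    for title in titles:
        index.setdefault(to_pinyin(title), []).append(title)
    return index


def search(index, pinyin_keyword):
    """Return titles whose pinyin index entry contains the query pinyin."""
    return [
        title
        for key, titles in index.items()
        if pinyin_keyword in key
        for title in titles
    ]


index = build_index(["七雄争霸"])
```

Because both the keyword and the index entries are reduced to pinyin, a homophone typo or a query typed directly in pinyin reaches the same entry.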
Optionally, before step S220 converts the Chinese characters in the keyword into related pinyin, matching search results may first be searched for in the vertical search information bank using the Chinese characters. If results are found, they are returned directly. If not, steps S220 and S230 are performed and the search results are then returned.
In summary, with the vertical search method and device provided by the embodiments of the invention, the Chinese characters in the keyword are converted into pinyin, and index entries in pinyin form are added for each search result in the vertical search information bank, so that the search in the vertical search information bank can be performed by pinyin; and/or the search is performed directly with the pinyin contained in the keyword. As a result, even if the keyword entered by the user is wrong, for example "七雄" (qixiong, "seven heroes") mistyped as the homophone "气胸" (qixiong, "pneumothorax"), the correct search result "七雄争霸" ("Seven Heroes Contend for Hegemony") can still be found. Likewise, if the user enters "qixiong", the desired result "七雄争霸" can also be found. This avoids the problem of a general-purpose pinyin search interface, which returns words such as "气胸" ("pneumothorax") or "齐胸" ("chest-high") and therefore cannot find the correct result.
Further, when the Chinese characters in the keyword are converted into pinyin, not only direct conversion but also fuzzy sounds may be used. Similarly, fuzzy sounds may also be taken into account when building the pinyin index entries of document data in the vertical search information bank. This further increases the comprehensiveness of the search and thereby improves the accuracy of the search results.
Further, the search may first be performed directly with the Chinese characters, and the corresponding pinyin is used only if no suitable result is found, which further improves search efficiency.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such a system is apparent. Moreover, the invention is not directed to any particular programming language. It should be understood that various programming languages may be used to implement the invention described herein, and that the above description of a specific language is given to disclose the best mode of the invention.
In the description provided here, numerous specific details are set forth. It should be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will understand that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the search device according to embodiments of the invention. The invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (6)

1. A vertical search device, comprising:
an interactive interface, corresponding to the vertical search device, configured to obtain a keyword, the keyword including at least Chinese characters and pinyin, the keyword being a vertical search keyword for searching information in the vertical field corresponding to the vertical search device;
a converter, configured to convert the Chinese characters in the keyword obtained via the interactive interface into related pinyin; the converter being further configured to convert Chinese characters used as index information of corresponding data into related pinyin;
the converter comprising: a fuzzy-sound conversion module, configured to obtain, according to the fuzzy-sound correspondence between different pinyin, other pinyin having a fuzzy-sound correspondence with the related pinyin, the other pinyin also serving as pinyin related to the Chinese characters;
a searcher, configured to search a vertical search information bank for matching search results according to the pinyin produced by the conversion or the pinyin included in the keyword, the vertical search information bank including at least index entries in pinyin form and the search result corresponding to each index entry;
an index constructor, configured to add the above pinyin to the vertical search information bank as index entries of the corresponding data.
2. The search device of claim 1, wherein the converter comprises:
a direct conversion module, configured to convert Chinese characters into several related pinyin according to the correspondence between Chinese characters and pinyin and a preset word segmentation method.
3. The search device of claim 1 or 2, wherein the searcher is further configured to, before the Chinese characters in the keyword are converted into related pinyin, search the vertical search information bank for matching search results according to the Chinese characters and, if results are found, return the search results directly.
4. A vertical search method, comprising:
obtaining, by an interactive interface corresponding to a vertical search device, a keyword, the keyword including at least Chinese characters and pinyin, the keyword being a vertical search keyword for searching information in the vertical field corresponding to the vertical search device;
converting the Chinese characters in the keyword obtained via the interactive interface into related pinyin, and converting Chinese characters used as index information of corresponding data into related pinyin;
obtaining, according to the fuzzy-sound correspondence between different pinyin, other pinyin having a fuzzy-sound correspondence with the related pinyin, the other pinyin also serving as pinyin related to the Chinese characters;
adding the pinyin to a vertical search information bank as index entries of the corresponding data, and searching the vertical search information bank for matching search results according to the pinyin related to the Chinese characters or the pinyin included in the keyword, the vertical search information bank including at least index entries in pinyin form and the search result corresponding to each index entry.
5. The search method of claim 4, wherein the step of converting Chinese characters into related pinyin comprises:
converting the Chinese characters into several related pinyin according to the correspondence between Chinese characters and pinyin and a preset word segmentation and permutation-and-combination method.
6. The search method of claim 4 or 5, further comprising, before the step of converting the Chinese characters in the keyword into related pinyin: searching the vertical search information bank for matching search results according to the Chinese characters and, if results are found, returning the search results directly.
CN201310487578.XA 2013-10-17 2013-10-17 A kind of vertical search device and method Active CN103530380B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310487578.XA CN103530380B (en) 2013-10-17 2013-10-17 A kind of vertical search device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310487578.XA CN103530380B (en) 2013-10-17 2013-10-17 A kind of vertical search device and method

Publications (2)

Publication Number Publication Date
CN103530380A CN103530380A (en) 2014-01-22
CN103530380B true CN103530380B (en) 2017-10-17

Family

ID=49932389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310487578.XA Active CN103530380B (en) 2013-10-17 2013-10-17 A kind of vertical search device and method

Country Status (1)

Country Link
CN (1) CN103530380B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104881146A (en) * 2015-04-28 2015-09-02 北京美秒科技有限公司 Picture input method
CN106649254A (en) * 2015-11-04 2017-05-10 北京国双科技有限公司 Keyword analysis method and device
CN107784027A (en) * 2016-08-31 2018-03-09 北京国双科技有限公司 A kind of reminding method and device of judgement document's search key
CN110765262A (en) * 2019-09-24 2020-02-07 北京嘀嘀无限科技发展有限公司 POI text retrieval method and device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101192108B (en) * 2007-03-28 2010-06-23 腾讯科技(深圳)有限公司 Chinese phonetic input method and system
CN101082936A (en) * 2007-06-29 2007-12-05 中兴通讯股份有限公司 Data enquiring system and method
CN101539428A (en) * 2009-04-28 2009-09-23 北京四维图新科技股份有限公司 Searching method with first letter of pinyin and intonation in navigation system and device thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"基于ORACLE/EJB的汉字模糊同音查询的实现";杨建刚 等;《计算机应用与软件》;20060228;第23卷(第2期);第53-54、71页 *

Also Published As

Publication number Publication date
CN103530380A (en) 2014-01-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220727

Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015

Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park)

Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.

Patentee before: Qizhi software (Beijing) Co.,Ltd.
