CN103530380B - A kind of vertical search device and method - Google Patents
- Publication number: CN103530380B
- Application number: CN201310487578.XA
- Authority: CN (China)
- Prior art keywords: chinese, phonetic alphabet, keyword, chinese phonetic, search
- Legal status: Active
Classifications
- G06F16/3332: Query translation
- G06F16/951: Indexing; Web crawling techniques
- G06F40/232: Orthographic correction, e.g. spell checking or vowelisation
- G06F40/284: Lexical analysis, e.g. tokenisation or collocates
- G06F40/53: Processing of non-Latin text
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a vertical search device and method. The vertical search method includes: obtaining a keyword that contains at least Chinese text and/or pinyin (the Chinese phonetic alphabet); converting the Chinese text in the keyword into related pinyin; and searching a vertical search information bank for matching search results according to the pinyin corresponding to the keyword, where the vertical search information bank includes at least index entries in pinyin form and the search result corresponding to each index entry. The vertical search device and method provided by embodiments of the invention can improve the accuracy of vertical search.
Description
Technical field
The present invention relates to the technical field of vertical search, and in particular to a vertical search device and a vertical search method.
Background art
A vertical search engine is a specialized search engine for a particular industry. It is a subdivision and extension of the general search engine: it integrates a particular class of specialized information in the web page library, extracts the required data field by field in a targeted way, processes it, and returns it to the user in some form.
When a user performs a vertical search, the keyword entered may be Chinese text or pinyin (the Chinese phonetic alphabet).
If the keyword is Chinese text, the usual approach at present is to search the information bank directly with that text. If the user enters a wrong word, however, the search often fails to find the result the user wants. For example, a user of a game vertical search engine who means to enter the keyword "seven male" (七雄, pinyin "qixiong") in order to find the game "seven male contention for hegemony" (七雄争霸) may, for various reasons, enter the wrong word "pneumothorax" (气胸, also "qixiong"). With the usual approach, the desired game "seven male contention for hegemony" cannot be found in this case.
If the keyword is pinyin, the usual approach at present is to submit the pinyin to a general-purpose pinyin search interface, which converts it into one or more corresponding Chinese words; the returned words are then used as search keywords to produce results. The inventors found that, because Chinese characters and words have many homophones, the general-purpose interface returns many same-sounding words for a given pinyin input. These words do not necessarily match, or match only imprecisely, the content of the field covered by the current vertical search, so the results of searching with them may not be what the user wants. For example, a user performing a vertical search in the game field enters the pinyin "qixiong"; the general-purpose pinyin search interface may return words such as "pneumothorax" (气胸) and "neat chest" (齐胸), and searching the game vertical search with these keywords cannot find the desired result "seven male contention for hegemony".
Summary of the invention
In view of the above problems, the present invention is proposed to provide a vertical search device and a corresponding vertical search method that overcome the above problems or at least partially solve them.
According to one embodiment of the invention, a vertical search device is provided, including: an interactive interface, configured to obtain a keyword that contains at least Chinese text and/or pinyin; a converter, configured to convert the Chinese text in the keyword obtained via the interactive interface into related pinyin; and a searcher, configured to search a vertical search information bank for matching search results according to the pinyin corresponding to the keyword, where the vertical search information bank includes at least index entries in pinyin form and the search result corresponding to each index entry.
Optionally, the converter is further configured to convert Chinese text that serves as index information of corresponding data into related pinyin, and the search device further includes an index constructor, configured to add that pinyin to the vertical search information bank as index entries of the corresponding data.
Optionally, the converter includes a direct conversion module, configured to convert Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and a preset word segmentation mode.
Optionally, the converter further includes a fuzzy-sound conversion module, configured to obtain, according to the fuzzy-sound correspondence between different pinyin strings, other pinyin strings that have a fuzzy-sound correspondence with the pinyin provided by the direct conversion module; these other pinyin strings also serve as related pinyin of the Chinese text.
Optionally, the searcher is further configured to, before the Chinese text in the keyword is converted into related pinyin, search the vertical search information bank for matching search results according to the Chinese text and, if any are found, return them directly.
According to a further embodiment of the invention, a vertical search method is also provided, including: obtaining a keyword that contains at least Chinese text and/or pinyin; converting the Chinese text in the keyword into related pinyin; and searching a vertical search information bank for matching search results according to the pinyin corresponding to the keyword, where the vertical search information bank includes at least index entries in pinyin form and the search result corresponding to each index entry.
Optionally, the method further includes: converting Chinese text that serves as index information of corresponding data into related pinyin; and adding that pinyin to the vertical search information bank as index entries of the corresponding data.
Optionally, the step of converting Chinese text into related pinyin includes: converting the Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and preset word segmentation and permutation/combination modes.
Optionally, the step of converting Chinese text into related pinyin further includes: obtaining, according to the fuzzy-sound correspondence between different pinyin strings, other pinyin strings that have a fuzzy-sound correspondence with the above pinyin; these other pinyin strings also serve as related pinyin of the Chinese text.
Optionally, before the step of converting the Chinese text in the keyword into related pinyin, the method further includes: searching the vertical search information bank for matching search results according to the Chinese text and, if any are found, returning them directly.
With the vertical search method and device provided by embodiments of the invention, the Chinese text in a keyword is converted into pinyin, and index entries in pinyin form are added for each search result in the vertical search information bank, so that the information bank can be searched by pinyin; and/or the pinyin in the keyword is used for the search directly. Consequently, even if the user enters a wrong keyword, for example "pneumothorax" instead of "seven male", the correct search result "seven male contention for hegemony" can still be found. Likewise, if the user enters "qixiong", the desired result "seven male contention for hegemony" can be found, which avoids the problem that a general-purpose pinyin search interface returns words such as "pneumothorax" and "neat chest" with which the correct result cannot be found.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, and so that the above and other objects, features, and advantages of the invention become more apparent, specific embodiments of the invention are set out below.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art on reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference numerals denote the same parts. In the drawings:
Fig. 1 shows a schematic diagram of a search device according to an embodiment of the invention;
Fig. 2 shows a flow chart of a search method according to an embodiment of the invention.
Detailed description
Exemplary embodiments of the disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure will be understood more thoroughly and so that its scope can be fully conveyed to those skilled in the art.
Referring to Fig. 1, a schematic diagram of a search device according to one embodiment of the invention is shown. The search device may include an interactive interface 102, a converter 104, a searcher 106, a display interface 108, an index constructor 110, and a vertical search information bank 112.
The interactive interface 102 is the interface through which the search device and the user exchange information, for example to obtain the keyword entered by the user. Generally, each vertical search has its own interactive interface 102, and any keyword entered in that interface is treated as a request to search for information in the corresponding vertical field. Common vertical search fields include games, air tickets, and shopping. After the user enters keywords in the interactive interface 102 of the game field, the corresponding results are searched for in the information bank of the game vertical field. A keyword entered by the user in the interactive interface 102 can therefore be called a vertical search keyword.
The interactive interface 102 may receive many kinds of keyword. A keyword may contain Chinese text, such as "pneumothorax" or "star"; the Chinese text may be a single character or a word made up of two or more characters, and the embodiments of the invention refer to both as Chinese text. A keyword may also contain pinyin, such as "qixiong" or "xingji", or take other forms. The embodiments of the invention are mainly concerned with processing keywords in Chinese text form or pinyin form. The two cases are described in turn below.
In the first case, the vertical search keyword received by the interactive interface 102 contains Chinese text, and the subsequent processing is as follows:
First, the interactive interface 102 supplies the Chinese text in the keyword to the converter 104, which converts it into related pinyin. The conversion can be implemented in several ways. For example, the converter 104 may include a direct conversion module 1042 and, optionally, a fuzzy-sound conversion module 1044.
Specifically, the direct conversion module 1042 converts Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and preset word segmentation and permutation/combination modes. There is a fixed correspondence between Chinese text and pinyin: for example, the pinyin for the Chinese text "seven male" is "qixiong", and the pinyin for the character "gas" is "qi". The Chinese text in a keyword can therefore be converted directly into its pinyin according to this correspondence; for example, the keyword "pneumothorax contention for hegemony" is converted into "qixiongzhengba".
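The fixed character-to-pinyin correspondence can be illustrated with a minimal Python sketch. The table `CHAR_TO_PINYIN` and the functions `to_syllables` and `to_pinyin` are hypothetical names that do not appear in the patent. The Chinese characters are this sketch's assumed originals of the translated examples ("seven male", "pneumothorax"), and polyphonic characters are ignored for simplicity.

```python
# Hypothetical toneless character-to-pinyin table (assumption: a real system
# would load a complete dictionary and handle polyphonic characters).
CHAR_TO_PINYIN = {
    "七": "qi", "气": "qi", "齐": "qi",
    "雄": "xiong", "胸": "xiong",
    "争": "zheng", "霸": "ba",
}


def to_syllables(text: str) -> list[str]:
    """Map each known Chinese character to its toneless pinyin syllable."""
    return [CHAR_TO_PINYIN[ch] for ch in text if ch in CHAR_TO_PINYIN]


def to_pinyin(text: str) -> str:
    """Concatenate the syllables into a single pinyin string."""
    return "".join(to_syllables(text))


print(to_pinyin("七雄争霸"))  # qixiongzhengba
print(to_pinyin("气胸争霸"))  # qixiongzhengba
```

Because both inputs map to the same toneless string, a keyword typed with the wrong characters still produces the pinyin that the intended keyword would.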
Note that if the keyword contains several Chinese characters, the Chinese text can also be converted into multiple related pinyin strings according to a preset word segmentation mode, so that the search is more accurate. Many segmentation modes are possible. Taking the keyword "pneumothorax contention for hegemony" again as an example: each character can be treated as one segment, giving "qi" "xiong" "zheng" "ba"; every two characters can form one segment, giving "qixiong" "zhengba"; segments can grow progressively from front to back, giving "qi" "qixiong" "qixiongzheng" "qixiongzhengba"; or progressively from back to front, giving "ba" "zhengba" "xiongzhengba" "qixiongzhengba". In addition, to obtain better and more complete search results, the segments can be permuted and combined, yielding for example "qizheng" "qiba" "xiongba". The multiple pinyin strings produced by segmentation or permutation/combination can be obtained either by first segmenting and/or combining the Chinese text "pneumothorax contention for hegemony" and then converting each piece into pinyin, or by first converting "pneumothorax contention for hegemony" into "qixiongzhengba" and then segmenting and/or combining the pinyin.
Many segmentation and permutation/combination modes exist besides those listed above, and they cannot all be enumerated. Whether listed here or not, each mode can be used alone or combined with others in any way to convert a keyword in Chinese text form into multiple related pinyin strings. In short, the segmentation and permutation/combination modes can be chosen according to actual requirements and the computing power of the search device; the embodiments of the invention place no limit on this.
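The segmentation modes listed above can be sketched in Python as follows. This is only an illustration that operates on an already converted syllable list; the function names (`per_syllable`, `pairs`, `prefixes`, `suffixes`, `pair_combinations`) are hypothetical, and the patent does not prescribe any particular implementation.

```python
from itertools import combinations

SYLLABLES = ["qi", "xiong", "zheng", "ba"]  # syllables of "qixiongzhengba"


def per_syllable(s):
    """One segment per character (syllable)."""
    return list(s)


def pairs(s):
    """One segment for every two characters."""
    return ["".join(s[i:i + 2]) for i in range(0, len(s), 2)]


def prefixes(s):
    """Progressive segments from front to back."""
    return ["".join(s[:i]) for i in range(1, len(s) + 1)]


def suffixes(s):
    """Progressive segments from back to front."""
    return ["".join(s[i:]) for i in range(len(s) - 1, -1, -1)]


def pair_combinations(s):
    """Order-preserving two-syllable combinations, adjacent or not."""
    return [a + b for a, b in combinations(s, 2)]


print(pairs(SYLLABLES))     # ['qixiong', 'zhengba']
print(prefixes(SYLLABLES))  # ['qi', 'qixiong', 'qixiongzheng', 'qixiongzhengba']
print(suffixes(SYLLABLES))  # ['ba', 'zhengba', 'xiongzhengba', 'qixiongzhengba']
print(pair_combinations(SYLLABLES))
```

Which modes to enable is a trade-off: more modes produce more candidate strings and a wider search, at the cost of more computation and a larger index.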
The above describes the implementation of the direct conversion module 1042 in the converter 104. Optionally, in addition to the direct conversion module 1042, the converter 104 may include a fuzzy-sound conversion module 1044 to further improve search efficiency and accuracy.
Specifically, the fuzzy-sound conversion module 1044 obtains, according to the fuzzy-sound correspondence between different pinyin strings, other pinyin strings that have a fuzzy-sound correspondence with the pinyin provided by the direct conversion module 1042. The pinyin obtained by the fuzzy-sound conversion module 1044 likewise serves as related pinyin for the Chinese text in the keyword. Fuzzy sounds arise for many reasons. In some regions, because of dialect habits, people do not distinguish front and back nasal finals, such as "in" and "ing" or "an" and "ang". Others have a weaker grasp of pinyin and easily confuse "z" and "zh", "s" and "sh", "r" and "l", "l" and "n", and so on. Whatever the cause, the essence is the same: two different pinyin spellings may be confused. For this situation, a fuzzy-sound correspondence can be defined, for example "in"="ing", "an"="ang", "z"="zh", "r"="l".
Suppose a user wants to search for the game "space craft" (pinyin "xingjizhengba") but does not distinguish "xin" from "xing". The pinyin typed into the pinyin input method is therefore "xinjizhengba", and the keyword actually entered is the Chinese text "new border contention for hegemony". In this case the direct conversion module 1042 yields the related pinyin strings "xin" "ji" "zheng" "ba" "xinji" "xinjizheng" "xinjizhengba" "zhengba". If the converter 104 also includes the fuzzy-sound conversion module 1044, that module derives further related pinyin from each string output by the direct conversion module 1042 according to the fuzzy-sound correspondence. For example, from the correspondence between "in" and "ing", "xin" yields "xing"; likewise "xinji" yields "xingji", "xinjizheng" yields "xingjizheng", and so on. Through the direct conversion module 1042 and the fuzzy-sound conversion module 1044 of the converter 104, the keyword "new border contention for hegemony" thus yields multiple related pinyin strings: "xin" "ji" "zheng" "ba" "xinji" "xinjizheng" "xinjizhengba" "zhengba" "xing" "xingji" "xingjizheng" "xingjizhengba", and so on. The fuzzy-sound conversion module 1044 therefore increases the number of pinyin strings derived from the Chinese text in a keyword and widens the scope of the subsequent search, which to some extent reduces the cases in which a user, through mispronunciation, enters a wrong keyword and cannot find the desired result.
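A minimal Python sketch of fuzzy-sound expansion follows. It assumes the four example correspondences given above ("in"="ing", "an"="ang", "z"="zh", "r"="l") and applies at most one initial swap or one final swap per syllable. The names `FINAL_PAIRS`, `INITIAL_PAIRS`, `fuzzy_syllable`, and `fuzzy_expand` are hypothetical, and a production rule set would likely be larger and validated against legal pinyin syllables.

```python
from itertools import product

# Assumed fuzzy-sound rules, taken from the examples in the text.
FINAL_PAIRS = [("in", "ing"), ("an", "ang")]
INITIAL_PAIRS = [("z", "zh"), ("r", "l")]


def fuzzy_syllable(syl: str) -> set[str]:
    """Return the syllable plus its single-substitution fuzzy variants."""
    variants = {syl}
    for short, long_ in FINAL_PAIRS:
        if syl.endswith(long_):          # e.g. "xing" -> "xin"
            variants.add(syl[:-len(long_)] + short)
        elif syl.endswith(short):        # e.g. "xin" -> "xing"
            variants.add(syl[:-len(short)] + long_)
    for a, b in INITIAL_PAIRS:
        if syl.startswith(b):            # test "zh" before "z"
            variants.add(a + syl[len(b):])
        elif syl.startswith(a):
            variants.add(b + syl[len(a):])
    return variants


def fuzzy_expand(syllables: list[str]) -> set[str]:
    """Combine the per-syllable variants into whole pinyin strings."""
    options = [fuzzy_syllable(s) for s in syllables]
    return {"".join(combo) for combo in product(*options)}


print(sorted(fuzzy_expand(["xin", "ji"])))  # ['xingji', 'xinji']
```

Expanding syllable by syllable keeps the rules simple, but the number of variants grows multiplicatively with keyword length, so a real system might cap the expansion.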
After the converter 104 has processed the keyword in Chinese text form, the result is passed to the searcher 106.
As mentioned above, the keyword received by the interactive interface 102 may be Chinese text or pinyin. The first case, in which the keyword is Chinese text, has been described; the second case, in which the keyword is pinyin, is described next.
In the second case, the vertical search keyword received by the interactive interface 102 contains pinyin, and the subsequent processing is as follows: because the keyword is already in pinyin form, the interactive interface 102 supplies the pinyin directly to the searcher 106, with no conversion by the converter 104.
In summary, whether the keyword is Chinese text (the first case) or pinyin (the second case), related pinyin is ultimately supplied to the searcher 106 for the search.
After obtaining the related pinyin from the converter 104 or the interactive interface 102, the searcher 106 searches the vertical search information bank for matching search results according to that pinyin. Specifically, the vertical search information bank includes at least index entries in pinyin form and the search result corresponding to each index entry. The converter 104 and the index constructor 110 are also used when the vertical search information bank is built.
First, the search device can collect, through various channels, data that can serve as search results, such as web page data and document data. So that the corresponding data can be located quickly, index information is generally set for the data, such as its name, tags (for example "casual" or "exciting"), title, introduction, or summary. Anything that identifies a piece of data can serve as its index information, and the index information points to the corresponding data. The corresponding data, that is, the search result, can then be found from the index information. Such an index is commonly called an inverted index.
Because most data is in Chinese text form, most of the corresponding index information is Chinese text as well. In this case the converter 104 converts the Chinese text in the index information into related pinyin. Just as the converter 104 converts the Chinese text in a keyword into related pinyin, a similar process is applied, when the vertical search information bank is built, to the Chinese text in the index information of the collected data. Specifically, the direct conversion module 1042 converts the Chinese text into several related pinyin strings according to the correspondence between Chinese text and pinyin and preset word segmentation and permutation/combination modes. For the conversion process, see the earlier description of the direct conversion module 1042; it is not repeated here.
For example, when the data of a game is to be added to the vertical search information bank of the search engine, the Chinese characters of the game's index information, such as the game name, are first converted into pinyin. The pinyin is then segmented and/or permuted and combined, and the resulting strings are joined with a separator such as a space (search engines generally split Latin letters on spaces by default) to form an index field containing several pinyin strings, that is, several index entries. The index constructor 110 then adds the pinyin in the index field to the vertical search information bank as index entries of the corresponding data.
For example, take the game "seven male contention for hegemony", and assume that its title, "seven male contention for hegemony", is the index information. After conversion by the direct conversion module 1042, the index information yields several related pinyin strings; the index field contains, for example: "qi xiong zheng ba qixiong qixiongzheng qixiongzhengba zhengba". The game thus has at least eight index entries, all of which point to the game "seven male contention for hegemony". If the searcher 106 searches with any of these eight index entries, it finds this game.
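The construction of the index field and the lookup by index entry can be sketched in Python as below. This is a simplified in-memory inverted index under stated assumptions: the class `PinyinIndex` and the function `index_entries` are hypothetical names, the document identifier is the assumed Chinese title of the example game, and the chosen entries simply reproduce the eight strings of the example index field.

```python
from collections import defaultdict


def index_entries(syllables):
    """Single syllables, front-to-back prefixes, and the final pair."""
    entries = list(syllables)
    entries += ["".join(syllables[:i]) for i in range(2, len(syllables) + 1)]
    entries.append("".join(syllables[-2:]))
    return entries


class PinyinIndex:
    """Toy inverted index: pinyin index entry -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, syllables):
        # Join entries with spaces to form the index field, then index
        # each whitespace-separated token as one entry.
        field = " ".join(index_entries(syllables))
        for entry in field.split():
            self._postings[entry].add(doc_id)
        return field

    def search(self, pinyin):
        return set(self._postings.get(pinyin, ()))


index = PinyinIndex()
field = index.add("七雄争霸", ["qi", "xiong", "zheng", "ba"])
print(field)  # qi xiong zheng ba qixiong qixiongzheng qixiongzhengba zhengba
print(index.search("qixiong"))  # {'七雄争霸'}
```

Any of the eight entries retrieves the same document, which is what lets a mistyped keyword with the same pinyin reach the intended result.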
Then, whether the user enters "seven male" or "pneumothorax" in the interactive interface 102, the converter 104 converts the input into related keywords in pinyin form, such as "qi" "xiong" "qixiong". The searcher 106 can search with the pinyin keyword "qi", "xiong", or "qixiong", and because the index entries of the game "seven male contention for hegemony" in the vertical search information bank include "qi" "xiong" "qixiong" and so on, the searcher 106 finds the document data of this game. With the technical solution of the embodiments of the invention, therefore, even if the user enters a wrong word, the desired "seven male contention for hegemony" can still be found as long as the pinyin is the same (ignoring tone), for example when "pneumothorax" is entered instead of "seven male".
Further, to make the index information of the vertical search information bank more comprehensive, the fuzzy-sound conversion module 1044 described above can also be used here. The principle is similar to the fuzzy-sound processing of keywords described earlier, except that it is applied in the other direction: according to the fuzzy-sound correspondence between different pinyin strings, the fuzzy-sound variants of the index-information pinyin produced by the direct conversion module 1042 are obtained. Both the pinyin obtained by the direct conversion module 1042 and the pinyin obtained by the fuzzy-sound conversion module 1044 are entered into the vertical search information bank and together serve as the index information of a given search result.
As described above, the vertical search information bank contains many search results, each of which has index entries in pinyin form, so the searcher 106 can find matching search results in the vertical search information bank according to the pinyin corresponding to the keyword.
Optionally, to improve efficiency, before the Chinese text in the keyword is converted into related pinyin, the searcher 106 can first search the vertical search information bank for matching search results according to the Chinese text. If results are found, they are returned directly and no pinyin search is needed. Alternatively, for more comprehensive results, the search by Chinese text and the search by pinyin can be carried out together. In some special cases, searching by pinyin only is also possible.
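The optional text-first flow can be sketched in Python as follows. The dictionaries `TEXT_INDEX`, `PINYIN_INDEX`, and `CHAR_TO_PINYIN` and the function `search` are hypothetical stand-ins for the vertical search information bank and the converter; the Chinese strings are assumed originals of the translated examples.

```python
# Toy stand-ins for the information bank (assumed data, for illustration only).
TEXT_INDEX = {"七雄争霸": ["game: 七雄争霸"]}
PINYIN_INDEX = {
    "qixiong": ["game: 七雄争霸"],
    "qixiongzhengba": ["game: 七雄争霸"],
}
CHAR_TO_PINYIN = {
    "七": "qi", "气": "qi", "雄": "xiong", "胸": "xiong",
    "争": "zheng", "霸": "ba",
}


def search(keyword: str) -> list[str]:
    """Try an exact Chinese-text match first; fall back to pinyin."""
    hits = TEXT_INDEX.get(keyword)
    if hits:
        return hits  # found by text, so no conversion is needed
    pinyin = "".join(CHAR_TO_PINYIN.get(ch, "") for ch in keyword)
    return PINYIN_INDEX.get(pinyin, [])


print(search("七雄争霸"))  # exact text match
print(search("气胸"))      # wrong characters, found through pinyin
```

The text-first order saves the conversion step for correctly typed keywords; running both searches and merging the results would trade that saving for broader recall.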
Optionally, if the keyword inherently Chinese phonetic alphabet, without the conversion by converter 104, searcher 106
Directly scanned for according to the keyword of Chinese phonetic alphabet form.
Optionally, if the keyword entered by the user contains both Chinese characters and pinyin, the Chinese-character part may be converted to pinyin by the converter 104. The searcher 106 then searches the vertical search information base using both the converted pinyin provided by the converter 104 and the pinyin entered by the user.
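The three keyword cases above (Chinese characters only, pinyin only, and a mixture) can be sketched as one normalisation routine. This is an illustrative sketch only: the `PINYIN` table, the function name `keyword_to_pinyin`, and the choice to join everything into one lowercase string are assumptions, and polyphonic characters are ignored here.

```python
import re

# Hypothetical character-to-pinyin table; a real converter would use a full dictionary.
PINYIN = {"七": "qi", "雄": "xiong", "争": "zheng", "霸": "ba"}

# One Chinese character, or a run of Latin letters typed as pinyin.
CJK_OR_LATIN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+")


def keyword_to_pinyin(keyword):
    """Convert Chinese characters to pinyin and pass typed pinyin through."""
    parts = []
    for token in CJK_OR_LATIN.findall(keyword):
        if token in PINYIN:
            parts.append(PINYIN[token])   # Chinese character: convert it
        elif token.isascii():
            parts.append(token.lower())   # already pinyin: keep as typed
        else:
            raise KeyError(f"no pinyin known for {token!r}")
    return "".join(parts)


for keyword in ("七雄", "qixiong", "七xiong"):
    print(keyword, "->", keyword_to_pinyin(keyword))
```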
Corresponding to the search device above, an embodiment of the present invention also discloses a search method, which includes the following steps:
Step S210: Obtain a keyword, the keyword including at least Chinese characters and/or pinyin. It will be understood that the keyword the user enters for the vertical search may consist of Chinese characters only, of pinyin only, or of Chinese characters and pinyin entered together. This step may be implemented by the interactive interface 102 described above; for the related technical features, refer to the corresponding description of the interactive interface 102, which is not repeated here.
Step S220: Convert the Chinese characters in the keyword to the related pinyin. If only pinyin was entered in step S210 and no Chinese characters, this step is skipped; it is performed only when the entered keyword contains Chinese characters. This step may be implemented by the converter 104 described above. For example, the Chinese characters are converted to a number of related pinyin according to the correspondence between Chinese characters and pinyin and a preset word segmentation and permutation-and-combination method. Further, according to the fuzzy-sound correspondences between different pinyin, other pinyin that have a fuzzy-sound correspondence with the aforementioned pinyin may be obtained, and these other pinyin are also treated as pinyin related to the Chinese characters. In other words, the Chinese characters in the keyword are converted directly to the corresponding pinyin, or additional pinyin are further derived through the fuzzy-sound correspondences. For the related technical features, refer to the corresponding description of the converter 104 above, which is not repeated here.
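As a rough illustration of the permutation-and-combination part of step S220, the sketch below expands a Chinese string into every pinyin string that its per-character readings allow. The `READINGS` table and the function name `direct_pinyin` are hypothetical, and the sketch leaves out word segmentation, which the description also mentions.

```python
from itertools import product

# Hypothetical reading table; polyphonic characters list several readings.
READINGS = {
    "银": ["yin"],
    "行": ["hang", "xing"],
    "重": ["zhong", "chong"],
    "庆": ["qing"],
}


def direct_pinyin(text):
    """Return every pinyin string obtained by combining per-character readings."""
    per_char = [READINGS[ch] for ch in text]
    return sorted("".join(combo) for combo in product(*per_char))


print(direct_pinyin("银行"))
print(direct_pinyin("重庆"))
```

A segmentation step could prune readings that are impossible in context, so the number of candidates passed on to fuzzy-sound expansion and search stays small.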
Step S230: Search the vertical search information base for matching search results according to the pinyin corresponding to the keyword, where the vertical search information base includes at least index entries in pinyin form and the search result corresponding to each index entry. Specifically, the Chinese characters that serve as index information of the corresponding data may be converted to the related pinyin. This conversion follows the same principle as the conversion of the keyword to pinyin in step S220, only in the opposite direction, so it is not described again. The pinyin is then added to the vertical search information base as an index entry of the corresponding data. This step may be implemented by the searcher 106, the index builder 110, and the vertical search information base 112 described above; for the related technical features, refer to the corresponding descriptions of these modules, which are not repeated here.
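A toy version of the index side of step S230 might look like the sketch below. The class name `PinyinIndex`, its methods, and in particular the prefix-matching rule are assumptions made for illustration; the description does not specify how a query pinyin is matched against an index entry.

```python
from collections import defaultdict


class PinyinIndex:
    """Toy stand-in for the vertical search information base."""

    def __init__(self):
        # pinyin index entry -> set of search results
        self._entries = defaultdict(set)

    def add(self, result, pinyin_keys):
        """Index builder role: attach pinyin index entries to a result."""
        for key in pinyin_keys:
            self._entries[key].add(result)

    def search(self, pinyin):
        """Searcher role: return results whose index entry starts with the query."""
        hits = set()
        for key, results in self._entries.items():
            if key.startswith(pinyin):
                hits |= results
        return sorted(hits)


index = PinyinIndex()
index.add("七雄争霸", ["qixiongzhengba"])
print(index.search("qixiong"))
```

A linear scan is enough for a sketch; a production index would more plausibly use a trie or an inverted index for the prefix lookup.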
Optionally, before step S220 converts the Chinese characters in the keyword to the related pinyin, the vertical search information base may first be searched for results that match the Chinese characters. If any are found, the search results are returned directly. If none are found, steps S220 and S230 are performed and the search results are then returned.
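The text-first flow with a pinyin fallback can be sketched as follows. The data, the `PINYIN` table, and the matching rules (substring match on the Chinese text, prefix match on the pinyin index entry) are all assumptions for illustration, and the Chinese strings are an assumed rendering of the example used in this description.

```python
# Hypothetical character-to-pinyin table covering only this example.
PINYIN = {"七": "qi", "雄": "xiong", "气": "qi", "胸": "xiong", "争": "zheng", "霸": "ba"}

RESULTS = ["七雄争霸"]                          # search results stored as Chinese text
PINYIN_INDEX = {"qixiongzhengba": "七雄争霸"}   # pinyin index entry -> search result


def to_pinyin(text):
    """Convert known Chinese characters; leave typed pinyin unchanged."""
    return "".join(PINYIN.get(ch, ch) for ch in text)


def search(keyword):
    """Search by Chinese text first; fall back to pinyin only if nothing matches."""
    hits = [r for r in RESULTS if keyword in r]
    if hits:
        return hits                             # found directly, no conversion needed
    query = to_pinyin(keyword)                  # step S220
    return [r for key, r in PINYIN_INDEX.items() if key.startswith(query)]  # step S230


print(search("七雄"))     # matched on the Chinese text
print(search("气胸"))     # no text match, found through pinyin
print(search("qixiong"))  # keyword already in pinyin
```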
In summary, with the vertical search method and device provided by the embodiments of the present invention, the Chinese characters in the keyword are converted to pinyin, and an index entry in pinyin form is added for each search result in the vertical search information base, so that the vertical search information base is searched by pinyin; and/or the search is performed directly on the pinyin in the keyword. As a result, even if the keyword entered by the user is wrong, for example "seven heroes" mistyped as the same-sounding "pneumothorax" (both are "qixiong" in pinyin), the correct search result "Seven Heroes Contend for Hegemony" can still be found. Likewise, if the user enters "qixiong", the correct result "Seven Heroes Contend for Hegemony" can be found. This avoids the problem that arises with a general-purpose pinyin search interface, which returns words such as "pneumothorax" or "neat chest" and therefore fails to find the correct result.
Further, when the Chinese characters in the keyword are converted to pinyin, fuzzy sounds may be taken into account in addition to direct conversion. Similarly, fuzzy sounds are also considered when the pinyin index entries of document data are built in the vertical search information base. This further increases the comprehensiveness of the search and thereby improves the accuracy of the search results.
Further, the search may first be performed directly with the Chinese characters, and the corresponding pinyin search is used only when no suitable result is found, which further improves search efficiency.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the description above, the structure required to construct such systems is apparent. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in a variety of programming languages, and that the description of a specific language above is given to disclose the best mode of the invention.
Numerous specific details are set forth in the specification provided here. It should be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid the understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments of the invention. However, the disclosed method should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components in an embodiment may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
In addition, those skilled in the art will appreciate that, although some embodiments described herein include some features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the search device according to embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.
Claims (6)
1. A vertical search device, comprising:
an interactive interface, corresponding to the vertical search device and configured to obtain a keyword, the keyword including at least Chinese characters and pinyin, the keyword being a vertical search keyword for searching for information in the vertical field corresponding to the vertical search device;
a converter, configured to convert the Chinese characters in the keyword obtained via the interactive interface into related pinyin; the converter being further configured to convert Chinese characters serving as index information of corresponding data into related pinyin;
the converter including: a fuzzy-sound conversion module, configured to obtain, according to fuzzy-sound correspondences between different pinyin, other pinyin having a fuzzy-sound correspondence with the related pinyin, the other pinyin also serving as pinyin related to the Chinese characters;
a searcher, configured to search a vertical search information base for matching search results according to the pinyin produced by the conversion or the pinyin included in the keyword, the vertical search information base including at least index entries in pinyin form and the search result corresponding to each index entry;
an index builder, configured to add the above pinyin to the vertical search information base as index entries of the corresponding data.
2. The search device according to claim 1, wherein the converter includes:
a direct conversion module, configured to convert the Chinese characters into a number of related pinyin according to the correspondence between Chinese characters and pinyin and a preset word segmentation method.
3. The search device according to claim 1 or 2, wherein the searcher is further configured to, before the Chinese characters in the keyword are converted into the related pinyin, search the vertical search information base for matching search results according to the Chinese characters and, if any are found, return the search results directly.
4. A vertical search method, comprising:
obtaining, by an interactive interface corresponding to a vertical search device, a keyword, the keyword including at least Chinese characters and pinyin, the keyword being a vertical search keyword for searching for information in the vertical field corresponding to the vertical search device;
converting the Chinese characters in the keyword obtained via the interactive interface into related pinyin, and converting Chinese characters serving as index information of corresponding data into related pinyin;
obtaining, according to fuzzy-sound correspondences between different pinyin, other pinyin having a fuzzy-sound correspondence with the related pinyin, the other pinyin also serving as pinyin related to the Chinese characters;
adding the pinyin to the vertical search information base as index entries of the corresponding data, and searching the vertical search information base for matching search results according to the pinyin related to the Chinese characters or the pinyin included in the keyword, the vertical search information base including at least index entries in pinyin form and the search result corresponding to each index entry.
5. The search method according to claim 4, wherein the step of converting the Chinese characters into related pinyin includes:
converting the Chinese characters into a number of related pinyin according to the correspondence between Chinese characters and pinyin and a preset word segmentation and permutation-and-combination method.
6. The search method according to claim 4 or 5, further comprising, before the step of converting the Chinese characters in the keyword into the related pinyin: searching the vertical search information base for matching search results according to the Chinese characters and, if any are found, returning the search results directly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310487578.XA CN103530380B (en) | 2013-10-17 | 2013-10-17 | A kind of vertical search device and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103530380A CN103530380A (en) | 2014-01-22 |
CN103530380B true CN103530380B (en) | 2017-10-17 |
Family
ID=49932389
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310487578.XA Active CN103530380B (en) | 2013-10-17 | 2013-10-17 | A kind of vertical search device and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103530380B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104881146A (en) * | 2015-04-28 | 2015-09-02 | 北京美秒科技有限公司 | Picture input method |
CN106649254A (en) * | 2015-11-04 | 2017-05-10 | 北京国双科技有限公司 | Keyword analysis method and device |
CN107784027A (en) * | 2016-08-31 | 2018-03-09 | 北京国双科技有限公司 | A kind of reminding method and device of judgement document's search key |
CN110765262A (en) * | 2019-09-24 | 2020-02-07 | 北京嘀嘀无限科技发展有限公司 | POI text retrieval method and device and electronic equipment |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101192108B (en) * | 2007-03-28 | 2010-06-23 | 腾讯科技(深圳)有限公司 | Chinese phonetic input method and system |
CN101082936A (en) * | 2007-06-29 | 2007-12-05 | 中兴通讯股份有限公司 | Data enquiring system and method |
CN101539428A (en) * | 2009-04-28 | 2009-09-23 | 北京四维图新科技股份有限公司 | Searching method with first letter of pinyin and intonation in navigation system and device thereof |
Non-Patent Citations (1)
Title |
---|
"基于ORACLE/EJB的汉字模糊同音查询的实现";杨建刚 等;《计算机应用与软件》;20060228;第23卷(第2期);第53-54、71页 * |
Also Published As
Publication number | Publication date |
---|---|
CN103530380A (en) | 2014-01-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20220727. Address after: Room 801, 8th floor, No. 104, floors 1-19, building 2, yard 6, Jiuxianqiao Road, Chaoyang District, Beijing 100015. Patentee after: BEIJING QIHOO TECHNOLOGY Co.,Ltd. Address before: 100088 room 112, block D, 28 new street, new street, Xicheng District, Beijing (Desheng Park). Patentee before: BEIJING QIHOO TECHNOLOGY Co.,Ltd.; Qizhi software (Beijing) Co.,Ltd. |