CN102890690A - Target information search method and device - Google Patents

Info

Publication number
CN102890690A
Authority
CN
China
Prior art keywords
character
weights
participle
participle device
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110207333
Other languages
Chinese (zh)
Other versions
CN102890690B (en
Inventor
王�琦
左杨眉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201110207333.8A priority Critical patent/CN102890690B/en
Publication of CN102890690A publication Critical patent/CN102890690A/en
Application granted granted Critical
Publication of CN102890690B publication Critical patent/CN102890690B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a target information search method and a target information search device. The method comprises the following steps of: receiving a word segmentation device selected by a user and a character string input by the user, wherein the word segmentation device is matched with the character string input by the user; performing word segmentation on the character string by using the word segmentation device to acquire a search word; and inputting the acquired search word into a search engine, and searching to acquire target information. By the method and the device, the problem of an inaccurate search result of the conventional search engine is solved, convenience is brought to a user, and retrieval quality is improved.

Description

Target information searching method and device
Technical field
The present invention relates to the field of information search, and in particular to a target information search method and device.
Background
Search engine technology is used in more and more IT systems, and the volume of data in search engine index libraries is growing exponentially. As the number of Chinese documents in an index library keeps increasing, more and more Chinese words enter the index library. Once new words and specialized vocabulary (such as personal names or the terminology of a specific field) enter the segmentation dictionary, they sharply reduce the accuracy of the word segmenter, so that many Chinese sentences can no longer be decomposed correctly according to their meaning. Take the Chinese sentence "ion cloud is concentrated and distributed": if the technical term "ion cloud" receives no special handling, the word segmenter will split the sentence at the wrong word boundaries, and with such a segmentation result the search engine cannot find the data the user expects.
As can be seen, current search methods cannot segment a query according to the user's search target, so the segmentation result does not match the user's retrieval purpose. In addition, the segmentation result is not comprehensive enough, so some key search conditions cannot be extracted from the character string entered by the user.
No effective solution has yet been proposed for the problem of inaccurate search results in search engines of the related art.
Summary of the invention
The main purpose of the present invention is to provide a target information search method and device that at least solve the above problem of inaccurate search results in search engines.
According to one aspect of the present invention, a target information search method is provided, comprising the following steps: receiving a word segmenter selected by a user and a character string entered by the user, wherein the word segmenter matches the character string entered by the user; segmenting the character string with the word segmenter to obtain search terms; and inputting the obtained search terms into a search engine and searching to obtain the target information.
Before receiving the word segmenter selected by the user and the character string entered by the user, the method further comprises: using the classified documents corresponding to a technical field to build the word segmenter for that technical field.
Using the classified documents corresponding to a technical field to build the word segmenter for that field comprises: classifying the technical fields and determining the classified documents corresponding to the current category; calculating the weight of each character in the current category from the frequency with which the character occurs in the classified documents; determining the weight in the current category of each character of a designated character string in the current category; calculating the weight of the designated character string in the current category from the weights of its characters; and binding the designated character string to its weight in the current category to obtain the word segmenter of the current category.
Calculating the weight of each character in the current category from the frequency with which the character occurs in the classified documents comprises: deleting the stop words from the classified documents; counting the frequency with which each character occurs in the classified documents after the stop words are deleted; counting the document frequency of each character, that is, the number of classified documents that contain it; and calculating the weight of each character in the current category from the character's frequency, its document frequency, and the total number of classified documents.
Determining the weight in the current category of each character of the designated character string comprises: when the designated character string in the current category contains a character that does not appear in the classified documents, setting the weight of that character to a default weight.
The character is one of the following: a Chinese character, a Korean character, or a Japanese character.
According to another aspect of the present invention, a target information search device is provided, comprising the following modules: a receiving module, configured to receive a word segmenter selected by a user and a character string entered by the user, wherein the word segmenter matches the character string entered by the user; a segmentation module, configured to segment the character string with the word segmenter received by the receiving module to obtain search terms; and a search module, configured to input the search terms obtained by the segmentation module into a search engine and search to obtain the target information.
The device further comprises: a segmenter building module, configured to use the classified documents corresponding to a technical field to build the word segmenter for that technical field.
The segmenter building module comprises: a document determining unit, configured to classify the technical fields and determine the classified documents corresponding to the current category; a character weight calculation unit, configured to calculate the weight of each character in the current category from the frequency with which the character occurs in the classified documents determined by the document determining unit; a weight determining unit, configured to determine the weight in the current category of each character of a designated character string in the current category; a character string weight calculation unit, configured to calculate the weight of the designated character string in the current category from the weights of its characters; and a segmenter building unit, configured to bind the designated character string to its weight in the current category to obtain the word segmenter of the current category.
The character weight calculation unit comprises: a deletion subunit, configured to delete the stop words from the classified documents; a statistics subunit, configured to count the frequency with which each character occurs in the classified documents after the deletion subunit has deleted the stop words, and to count the document frequency of each character in the classified documents; and a character string calculation subunit, configured to calculate the weight of each character in the current category from the character's frequency, its document frequency, and the total number of classified documents.
By segmenting with a word segmenter that matches the character string entered by the user, the present invention can extract each word from that string accurately. Searching with the segmented words yields target information that meets the user's expectations, which solves the problem of inaccurate search results in existing search engines, is convenient for users, and improves retrieval quality.
Description of drawings
The accompanying drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the target information search method according to Embodiment 1 of the present invention;
Fig. 2 is a structural block diagram of the target information search device according to Embodiment 2 of the present invention;
Fig. 3 is a detailed structural block diagram of the target information search device according to Embodiment 2 of the present invention;
Fig. 4 is a detailed structural block diagram of the target information search device according to Embodiment 2 of the present invention;
Fig. 5 is a structural block diagram of the weight generation module according to Embodiment 2 of the present invention;
Fig. 6 is a flowchart of the target information search method applied to the device shown in Fig. 4, according to Embodiment 2 of the present invention;
Fig. 7 is a flowchart of the target information search method applied to the device shown in Fig. 4, according to Embodiment 2 of the present invention;
Fig. 8 is a flowchart of the target information search method applied to the device shown in Fig. 4, according to Embodiment 2 of the present invention;
Fig. 9 is a schematic diagram of the target information search system according to Embodiment 2 of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features in the embodiments may be combined with one another.
The embodiments of the invention take into account that current search engines do not process retrieval information according to technical field, which makes search results inaccurate, and therefore provide a target information search method and device. This approach lets a search engine use different segmentation models for different fields and different categories, which can improve segmentation accuracy. It is applicable to search engines, word segmentation, web application systems, and similar fields.
Embodiment 1
This embodiment provides a target information search method. Referring to Fig. 1, the method comprises the following steps:
Step S102: receive the word segmenter selected by the user and the character string entered by the user, where the word segmenter matches the character string entered by the user;
Here, matching means that the technical field of the word segmenter is the same as the technical field of the character string entered by the user;
Step S104: segment the character string with the word segmenter to obtain search terms;
Step S106: input the obtained search terms into the search engine and search to obtain the target information.
By segmenting with a word segmenter that matches the character string entered by the user, this embodiment can extract each word from that string accurately. Searching with the segmented words yields target information that meets the user's expectations, which solves the problem of inaccurate search results in existing search engines, is convenient for users, and improves retrieval quality.
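Steps S102 to S106 can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function name search_target_info, the Segmenter and SearchEngine types, and the toy segmenter and engine are all hypothetical and do not come from the patent text.

```python
from typing import Callable, Dict, List

# Hypothetical types: a segmenter maps a query string to search terms,
# and a search engine maps search terms to matching documents.
Segmenter = Callable[[str], List[str]]
SearchEngine = Callable[[List[str]], List[str]]


def search_target_info(segmenters: Dict[str, Segmenter],
                       selected_field: str,
                       query: str,
                       search_engine: SearchEngine) -> List[str]:
    """Receive the selection and the query, segment the query, then search."""
    segmenter = segmenters[selected_field]   # S102: segmenter chosen by the user
    search_terms = segmenter(query)          # S104: segment the character string
    return search_engine(search_terms)       # S106: search with the terms


# Toy stand-ins, for illustration only.
def physics_segmenter(text: str) -> List[str]:
    return text.split()   # placeholder for a real field-specific segmenter


def toy_engine(terms: List[str]) -> List[str]:
    docs = ["ion cloud distribution", "cloud computing basics"]
    return [d for d in docs if all(t in d for t in terms)]


results = search_target_info({"physics": physics_segmenter}, "physics",
                             "ion cloud", toy_engine)
print(results)
```

Passing the segmenters as a dictionary keyed by technical field keeps the user's choice explicit, which mirrors the requirement that the segmenter match the field of the query.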
To improve segmentation accuracy, before receiving the word segmenter selected by the user and the character string entered by the user, the method further comprises: using the classified documents corresponding to a technical field to build the word segmenter for that technical field.
Using the classified documents corresponding to a technical field to build the word segmenter for that field comprises the following steps:
1) classify the technical fields and determine the classified documents corresponding to the current category;
2) calculate the weight of each character in the current category from the frequency with which the character occurs in the classified documents;
3) determine the weight in the current category of each character of a designated character string in the current category;
4) calculate the weight of the designated character string in the current category from the weights of its characters;
5) bind the designated character string to its weight in the current category to obtain the word segmenter of the current category.
The weight of each character in the current category can be calculated as follows: delete the stop words from the classified documents; count the frequency with which each character occurs in the classified documents after the stop words are deleted; count the document frequency of each character, that is, the number of classified documents that contain it; and calculate the weight of each character in the current category from the character's frequency, its document frequency, and the total number of classified documents. Of course, in actual use the stop words need not be deleted, and the frequency of each character can be counted directly in the classified documents. The stop words can be set in advance, for example articles, conjunctions, or auxiliary words.
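The text names the three inputs of the character weight (character frequency, document frequency, total number of classified documents) but this passage does not state the formula that combines them. The sketch below therefore assumes a TF-IDF style combination purely for illustration; the function name char_weights and the sample statistics are hypothetical.

```python
import math
from typing import Dict


def char_weights(char_freq: Dict[str, float],
                 doc_freq: Dict[str, int],
                 total_docs: int) -> Dict[str, float]:
    """Combine frequency, document frequency and document count into a weight.

    Assumed formula (TF-IDF style, not taken from the source):
        weight = frequency * log(1 + total_docs / document_frequency)
    """
    return {
        ch: freq * math.log(1 + total_docs / doc_freq[ch])
        for ch, freq in char_freq.items()
    }


# Hypothetical statistics for one category containing 4 documents.
weights = char_weights({"云": 0.5, "中": 0.5}, {"云": 1, "中": 4}, total_docs=4)
print(weights)
```

Under this assumed formula, two characters with the same frequency receive different weights when one of them is concentrated in fewer documents of the category.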
When the designated character string in the current category contains a character that does not appear in the classified documents, the weight of that character is set to a default weight.
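The default-weight rule can be combined with the string weight of step 4) in a few lines of Python. This is a sketch under assumptions: this passage does not say how character weights are combined into a string weight, so the mean is used here as a stand-in, and DEFAULT_WEIGHT and word_weight are hypothetical names.

```python
from typing import Dict

DEFAULT_WEIGHT = 0.01  # hypothetical default for characters unseen in the category


def word_weight(word: str, char_weights: Dict[str, float],
                default: float = DEFAULT_WEIGHT) -> float:
    """Weight of a non-empty dictionary word, assumed to be the mean of its character weights."""
    return sum(char_weights.get(ch, default) for ch in word) / len(word)


weights = {"离": 0.4, "子": 0.2}
print(word_weight("离子", weights))    # both characters were seen in the category
print(word_weight("离子云", weights))  # "云" falls back to DEFAULT_WEIGHT
```

Using dict.get with a default keeps dictionary words usable even when the category's documents never contain one of their characters.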
The character is one of the following: a Chinese character, a Korean character, or a Japanese character.
After a word segmenter has been built for each technical field, a set of relatively specialized word segmenters is available. These can be displayed on the search engine's interface for the user to choose from. Taking Chinese characters as an example, the target information search method comprises the following steps:
Step 1: perform a Chinese character frequency analysis on the documents in the category.
Step 2: apply probability distribution processing to the Chinese character frequencies in the category and calculate the weight in the category of each Chinese character it contains.
Step 3: from the weights of the Chinese characters in the category, calculate the weight in the category of each word in the segmenter dictionary.
Step 4: load the category weight of each word in the segmenter dictionary into the word segmenter, turning it into a word segmenter dedicated to the category.
Step 5: offer the dedicated word segmenters built for the various categories to the user; the user selects from them the one best suited to the retrieval purpose, and that dedicated word segmenter provides the Chinese word segmentation service for the search engine.
Step 6: the user enters a search condition; the dedicated word segmenter segments it and outputs the segmentation result; the search engine performs a full-text search using the segmentation result as the retrieval basis and returns the retrieval results to the user.
On a web page on the Internet, the user selects the word segmenter that best matches the search target and enters a Chinese character string. The system segments the string with the word segmenter the user specified, outputs the Chinese words that best fit the user's search purpose, and passes them to the search engine for processing.
This embodiment can provide a dedicated word segmenter for each category in the document library. Taking Chinese characters as an example, probability statistics on the occurrences of Chinese characters in the classified documents give the weight of each Chinese character in the category; the weight in the category of each Chinese word in the segmenter dictionary is then calculated from the character weights, and a dedicated word segmenter is built for each category. In the segmenter selection interface, the user picks the dedicated word segmenter best suited to the search purpose and uses it to obtain the best segmentation result for that purpose, which improves the search accuracy of the search engine and the user's satisfaction with it.
Embodiment 2
This embodiment also provides a target information search device. Referring to Fig. 2, the device comprises the following modules:
A receiving module 22, configured to receive the word segmenter selected by the user and the character string entered by the user, where the word segmenter matches the character string entered by the user;
A segmentation module 24, connected to the receiving module 22 and configured to segment the character string with the word segmenter received by the receiving module 22 to obtain search terms;
A search module 26, connected to the segmentation module 24 and configured to input the search terms obtained by the segmentation module 24 into the search engine and search to obtain the target information.
By segmenting with a word segmenter that matches the character string entered by the user, this embodiment can extract each word from that string accurately. Searching with the segmented words yields target information that meets the user's expectations, which solves the problem of inaccurate search results in existing search engines, is convenient for users, and improves retrieval quality.
To improve segmentation accuracy, referring to Fig. 3, the device further comprises: a segmenter building module 32, connected to the receiving module 22 and configured to use the classified documents corresponding to a technical field to build the word segmenter for that technical field.
The segmenter building module 32 comprises: a document determining unit, configured to classify the technical fields and determine the classified documents corresponding to the current category; a character weight calculation unit, configured to calculate the weight of each character in the current category from the frequency with which the character occurs in the classified documents determined by the document determining unit; a weight determining unit, configured to determine the weight in the current category of each character of a designated character string in the current category; a character string weight calculation unit, configured to calculate the weight of the designated character string in the current category from the weights of its characters; and a segmenter building unit, configured to bind the designated character string to its weight in the current category to obtain the word segmenter of the current category.
Preferably, the character weight calculation unit comprises: a deletion subunit, configured to delete the stop words from the classified documents; a statistics subunit, configured to count the frequency with which each character occurs in the classified documents after the deletion subunit has deleted the stop words, and to count the document frequency of each character in the classified documents; and a character string calculation subunit, configured to calculate the weight of each character in the current category from the character's frequency, its document frequency, and the total number of classified documents.
The device provided by this embodiment can build a dedicated word segmenter for each category of the classified document library. From the dedicated word segmenters of the many categories, the user can select the one best suited to the query target, and that word segmenter supplies the search engine with the segmentation result best suited to the query target, which improves the search precision of the search engine.
Taking Chinese characters as the example characters, this embodiment also provides another target information search device, which comprises the following modules:
(1) a Chinese character frequency collection module, (2) a Chinese character weight calculation module, (3) a Chinese word weight generation module, (4) a dedicated word segmenter, (5) a segmenter selection module, and (6) a retrieval request preprocessing module. The functions of the modules are as follows:
The Chinese character frequency collection module calculates, for each category, the frequency with which each Chinese character occurs in that category.
The Chinese character weight calculation module takes the frequency of each Chinese character in the category as its basis, calculates the probability of occurrence of each Chinese character in the category, normalizes the frequencies, and derives the weight of each Chinese character in the category.
This Chinese character weight calculation module can calculate the weights in the category of all the Chinese characters the category contains from their frequencies of occurrence.
The Chinese character frequency collection module and the Chinese character weight calculation module together correspond to the character weight calculation unit described above. The Chinese character frequency collection module can collect the frequencies of occurrence of all the Chinese characters the category contains.
The Chinese word weight generation module takes the Chinese character weights in the category as its basis and calculates the weight in the category of each word in the segmenter dictionary.
This Chinese word weight generation module can calculate the weights in the category of the Chinese words in the segmentation dictionary from the weights in the category of all the Chinese characters the category contains.
The dedicated word segmenter: a general-purpose word segmenter is created for the category, and the weights of all the Chinese words of the category are imported into it, turning it into the dedicated word segmenter of the category. The dedicated word segmenter uses the segmenter dictionary and the category weights of all the Chinese words as its basis for segmentation.
As can be seen, the dedicated word segmenter of this embodiment is built on top of a general-purpose word segmenter: loading the weights of all the Chinese words of the category into the general-purpose word segmenter converts it into the dedicated word segmenter of the category, which segments on the basis of the segmenter dictionary and those weights.
The segmenter selection module presents the dedicated word segmenters built for the various categories to the user; the user selects one of them to provide the Chinese word segmentation service for the search engine.
Through this segmenter selection module, the user can select the dedicated word segmenter that best matches the search purpose.
The retrieval request preprocessing module receives the Chinese character string entered by the user, feeds it to the dedicated word segmenter the user selected, obtains the segmentation result from that word segmenter, assembles the result into a query condition, and inputs the query condition into the search engine.
Taking Chinese character input as an example, this embodiment provides a target information search device that can be deployed in a search engine server 40. Referring to Fig. 4, the device consists of the following modules:
(1) a weight generation module 41;
(2) a dedicated word segmenter 42, connected to the weight generation module 41;
(3) a segmenter selection module 43, connected to the dedicated word segmenter 42;
(4) a retrieval request preprocessing module 44, connected to the segmenter selection module 43 and to the network;
(5) a search engine 45, connected to the retrieval request preprocessing module 44.
The weight generation module 41 is responsible for generating the weights in the category of the words the category contains. Referring to Fig. 5, this module comprises three submodules:
1. Chinese character frequency collection module 411: this module first removes the stop words from the documents, then counts the frequency of occurrence of the Chinese characters contained in the classified document library (Chinese character frequency = number of occurrences of the individual Chinese character in the category / total number of Chinese characters in the category), and at the same time counts the number of documents in the category that contain each Chinese character (hereinafter the document frequency).
2. Chinese character weight calculation module 412: this module first calculates the weight of each Chinese character in the category from the Chinese character frequency computed by the Chinese character frequency collection module 411, the document frequency, and the total number of documents in the category; it then assigns a default weight to Chinese characters that are present in the segmentation dictionary but absent from the category.
3. Chinese word weight generation module 413: this module takes the Chinese words out of the segmenter dictionary one by one, looks up the weights in the category, as calculated by the Chinese character weight calculation module 412, of the Chinese characters each word contains, calculates the weight of the word in the category from the weights of its characters, and finally writes the weights of the Chinese words in the category to the hard disk.
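The statistics gathered by submodule 411 follow directly from the formula given above and can be sketched in Python. The function name collect_char_stats is hypothetical, and the sketch assumes the documents have already been reduced to the characters of interest, since it counts every remaining character.

```python
from collections import Counter
from typing import Dict, List, Tuple


def collect_char_stats(docs: List[str],
                       stop_words: List[str]) -> Tuple[Dict[str, float], Dict[str, int]]:
    """Return (character frequency, document frequency) for one category."""
    cleaned = []
    for doc in docs:
        for sw in stop_words:
            doc = doc.replace(sw, "")   # remove stop words first
        cleaned.append(doc)

    counts = Counter(ch for doc in cleaned for ch in doc)
    total_chars = sum(counts.values())
    # frequency = occurrences of the character / total characters in the category
    char_freq = {ch: n / total_chars for ch, n in counts.items()}
    # document frequency = number of documents that contain the character
    doc_freq = Counter(ch for doc in cleaned for ch in set(doc))
    return char_freq, dict(doc_freq)


freq, df = collect_char_stats(["离子的云", "云集"], stop_words=["的"])
print(freq, df)
```

The two returned dictionaries, together with the number of documents, are the inputs that submodule 412 is described as using.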
The dedicated word segmenter 42 is responsible for providing the user with a specialized Chinese word segmentation service; it can decompose the search condition entered by the user into the Chinese words that best meet the user's expectations. This module is implemented as follows: first create a general-purpose word segmenter; then read from the hard disk the Chinese word weights of the corresponding category calculated by the Chinese word weight generation module 413 and bind them to the Chinese words in the segmentation dictionary; finally register the dedicated word segmenter with the segmenter selection module 43. During segmentation, the combination of Chinese words that best fits the category is calculated from the Chinese word weights.
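One way to picture "the combination of Chinese words that best fits the category" is a dynamic-programming search for the split with the highest total word weight. The sketch below is only an assumption about how such a search could work: the scoring rule (sum of weights), the function name segment, the sample string, and both weight tables are hypothetical and are not taken from the patent.

```python
from typing import Dict, List


def segment(text: str, word_weights: Dict[str, float],
            unknown_weight: float = 0.0) -> List[str]:
    """Split text into the word sequence with the highest total weight.

    Dynamic programming over end positions; characters not covered by a
    dictionary word are emitted as single-character tokens.
    """
    n = len(text)
    max_len = max((len(w) for w in word_weights), default=1)
    best = [float("-inf")] * (n + 1)   # best[i]: top score for text[:i]
    back = [0] * (n + 1)               # back[i]: start index of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_weights:
                score = best[start] + word_weights[piece]
            elif end - start == 1:
                score = best[start] + unknown_weight
            else:
                continue
            if score > best[end]:
                best[end], back[end] = score, start
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Hypothetical weights: the physics category knows the term "离子云" (ion cloud),
# the general table does not.
physics = {"离子": 0.3, "离子云": 0.9, "云集": 0.2, "集中": 0.5, "分布": 0.5, "中": 0.1}
general = {"离子": 0.5, "云集": 0.6, "集中": 0.5, "分布": 0.5, "中": 0.1}
print(segment("离子云集中分布", physics))
print(segment("离子云集中分布", general))
```

With these made-up weights the physics table keeps the technical term intact while the general table breaks it apart, which is the kind of difference a category-specific weight table is meant to produce.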
The segmenter selection module 43 is responsible for presenting the dedicated word segmenters 42 that have been built to the user in a visual form and for letting the user select, through this module, the dedicated word segmenter 42 that best fits the retrieval purpose. This module is implemented as follows: the dedicated word segmenters 42 that have been built are first saved in a linked list; the segmenter selection module 43 then provides a user interface in which the dedicated word segmenters 42 in the linked list are displayed for the user to choose from. The user can select only one of the dedicated word segmenters 42 in the linked list. After the selection is made, the segmenter selection module 43 passes the selected dedicated word segmenter 42 to the retrieval request preprocessing module 44.
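The register, display, and single-selection behaviour described for module 43 can be sketched as a small Python class. The class and method names are hypothetical, and an insertion-ordered dictionary stands in for the linked list mentioned in the text.

```python
from typing import Callable, Dict, List, Optional

Segmenter = Callable[[str], List[str]]


class SegmenterSelector:
    """Keeps registered segmenters in order and tracks the single selected one."""

    def __init__(self) -> None:
        # Insertion-ordered dict, standing in for the linked list in the text.
        self._registry: Dict[str, Segmenter] = {}
        self._selected: Optional[str] = None

    def register(self, category: str, segmenter: Segmenter) -> None:
        self._registry[category] = segmenter

    def choices(self) -> List[str]:
        """Category names to display in the user interface."""
        return list(self._registry)

    def select(self, category: str) -> None:
        if category not in self._registry:
            raise KeyError(f"no segmenter registered for {category!r}")
        self._selected = category   # replaces any earlier choice

    def selected(self) -> Segmenter:
        if self._selected is None:
            raise RuntimeError("no segmenter selected yet")
        return self._registry[self._selected]


selector = SegmenterSelector()
selector.register("physics", lambda text: text.split())
selector.register("medicine", lambda text: list(text))
selector.select("physics")
print(selector.choices())
```

Storing only one selected key enforces the rule that the user can pick a single dedicated word segmenter at a time.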
The retrieval request preprocessing module 44 is responsible for receiving the search condition entered by the user, calling the dedicated word segmenter 42 the user selected to segment it, and passing the segmentation result to the search engine. First, the retrieval request preprocessing module 44 receives the user's retrieval request. It then passes the request to the dedicated word segmenter 42 that the user selected through the segmenter selection module 43 for segmentation and fetches the segmentation result from that word segmenter. Finally, it passes the segmentation result to the search engine as the search condition.
Based on the devices shown in Figs. 4 and 5, this embodiment also provides a target information search method. Referring to the flowchart of the target information search method shown in Fig. 6, the method comprises the following steps:
Step S601, scan the classified documents;
Step S602, count the frequency of occurrence of each Chinese character in the category;
Step S603, calculate the weight of each Chinese character in the category;
Step S604, calculate the weight of each Chinese word in the category;
Step S605, generate a dedicated word segmenter;
Step S606, register the dedicated word segmenter in the segmenter selection module;
Step S607, determine whether the user has selected a word segmenter; if so, execute step S608; if not, execute step S609;
Step S608, deliver the word segmenter selected by the user to the retrieval request preprocessing module;
Step S609, wait for the user to select a word segmenter;
Step S610, determine whether the user has entered a search condition (also called a retrieval request, equivalent to the character string described above); if so, execute step S611; if not, execute step S612;
Step S611, call the word segmenter selected by the user to perform word segmentation on the retrieval request, pass the result to the search engine as the query condition, and then execute step S613;
Step S612, wait for the user to enter a retrieval request;
Step S613, return the retrieval result to the client.
Referring to the flowchart of the target information search method shown in Fig. 7, the method comprises the following steps:
Step S700: the Chinese character frequency collection module 411 scans the classified documents;
Step S701: the Chinese character frequency collection module 411 removes the stop words from the documents;
Step S702: the Chinese character frequency collection module 411 counts the frequency of occurrence of each Chinese character contained in the classified document library (Chinese character frequency = number of occurrences of the individual Chinese character in the category / total number of Chinese characters in the category);
Step S703: the Chinese character frequency collection module 411 counts the number of documents in the category that contain each Chinese character (hereinafter referred to as the document frequency);
Step S704: the Chinese character weight calculation module 412 calculates the weight of each Chinese character in the category from the Chinese character frequency and the document frequency calculated by the Chinese character frequency collection module 411 and the total number of documents in the category;
Step S705: the Chinese character weight calculation module 412 assigns a default weight to each Chinese character that is present in the segmentation dictionary but absent from the category;
Step S706: the Chinese word weight generation module 413 assigns weights to the Chinese words in the segmenter dictionary according to the Chinese character weights calculated by the Chinese character weight calculation module 412 for the characters those words contain;
Step S707: create an ordinary word segmenter;
Step S708: read from the hard disk the Chinese word weights of the corresponding category calculated by the Chinese word weight generation module 413, and bind the Chinese word weights to the Chinese words in the segmentation dictionary;
Step S709: inject the weighted segmentation dictionary into the ordinary word segmenter, turning it into a dedicated word segmenter 42;
Step S710: register the dedicated word segmenter in the segmenter selection module;
Step S711: determine whether a dedicated word segmenter has been established for every category; if not, repeat steps S700 to S710 until the dedicated word segmenters 42 of all category libraries have been established; if so, end.
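Steps S701 to S706 can be sketched as follows. Only the character frequency formula of step S702 is given in this passage; how the frequency, the document frequency and the total number of documents are combined (step S704) and how a word weight is derived from character weights (step S706) are not spelled out here, so the product and the mean used below are assumptions made purely for illustration. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def char_weights(docs: List[str], stop_words: Set[str]) -> Dict[str, float]:
    """Weight of each character within one category (S701 to S704)."""
    cleaned = []
    for doc in docs:
        for stop_word in stop_words:          # S701: remove stop words
            doc = doc.replace(stop_word, "")
        cleaned.append(doc)
    # S702: occurrences of each character across the category
    counts = Counter(ch for doc in cleaned for ch in doc if not ch.isspace())
    total_chars = sum(counts.values())
    # S703: number of documents containing each character
    doc_freq = Counter(ch for doc in cleaned for ch in set(doc)
                       if not ch.isspace())
    n_docs = len(cleaned)
    # S704. ASSUMPTION: weight = character frequency * share of documents
    # containing the character; the passage does not give the formula.
    return {ch: (counts[ch] / total_chars) * (doc_freq[ch] / n_docs)
            for ch in counts}


def word_weights(dictionary: Iterable[str], weights: Dict[str, float],
                 default_weight: float) -> Dict[str, float]:
    """Weight of each dictionary word (S705 and S706).

    ASSUMPTION: a word's weight is the mean of its character weights;
    characters absent from the category get default_weight (S705).
    """
    return {word: sum(weights.get(ch, default_weight) for ch in word) / len(word)
            for word in dictionary if word}
```

Running both functions once per category and binding the resulting word weights to the dictionary corresponds to steps S707 to S711.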
Referring to the flowchart of the target information search method shown in Fig. 8, the method comprises the following steps:
Step S800: the segmenter selection module 43 displays the dedicated word segmenters 42 in the user interface.
Step S801: the segmenter selection module 43 waits for the user to select a word segmenter 42.
Step S802: the segmenter selection module 43 accepts the dedicated word segmenter 42 selected by the user and records it.
Step S803: the segmenter selection module 43 sends the word segmenter selected by the user to the retrieval request preprocessing module 44.
Step S804: the retrieval request preprocessing module 44 accepts the user's retrieval request, calls the word segmenter selected by the user to perform word segmentation on the retrieval request, and passes the result to the search engine 45 as the query condition.
Step S805: the search engine 45 performs retrieval according to the segmented search condition and returns the retrieval result.
Step S806: determine whether the user reselects a dedicated word segmenter 42; if so, repeat step S802; if not, execute step S807.
Step S807: determine whether the user enters a new retrieval request; if so, execute steps S804 and S805 again; if not, end. That is, if the user shows no new activity, the processing flow ends automatically.
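The interaction in steps S801 to S807 can be modeled as a loop over user events. The sketch below is a simplification under assumptions: user actions arrive as hypothetical `("select", name)` and `("query", text)` tuples, a segmenter is a function from a string to a list of words, and the search engine is any callable taking the list of words.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Segmenter = Callable[[str], List[str]]
SearchEngine = Callable[[List[str]], List[str]]


def run_session(events: Iterable[Tuple[str, str]],
                segmenters: Dict[str, Segmenter],
                search_engine: SearchEngine) -> List[List[str]]:
    """Replay user events and collect one result list per answered query."""
    selected: Optional[Segmenter] = None
    results: List[List[str]] = []
    for kind, value in events:
        if kind == "select":
            # S802/S803 and S806: record the (re)selected segmenter
            selected = segmenters[value]
        elif kind == "query" and selected is not None:
            # S804: segment the request with the selected segmenter
            terms = selected(value)
            # S805: search with the segmented condition
            results.append(search_engine(terms))
        # A query before any selection is ignored: still waiting (S801).
    return results  # S807: no further events, so the flow ends
```

A reselection simply replaces the recorded segmenter, so later queries are segmented with the new choice.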
In this embodiment, multiple categories can be defined by technical field. As shown in the schematic diagram of the target information search system in Fig. 9, each category corresponds to one set of the above devices, in which the segmenter selection module, the retrieval request preprocessing module, and the search engine are shared modules.
This embodiment can provide a dedicated word segmenter for each category in the document library. Taking Chinese characters as an example, probability statistics are computed over the occurrences of the Chinese characters in the classified documents to obtain the weight of each Chinese character in the category; the weight in the category of each Chinese word in the segmenter dictionary is then calculated from the Chinese character weights, and a dedicated word segmenter is established for each category. In the segmenter selection interface, the user chooses the dedicated word segmenter best suited to the search purpose and uses it to obtain the best segmentation result for that purpose, which improves the search accuracy of the search engine and the user's satisfaction with it.
As can be seen from the above description, the present invention achieves the following technical effects:
1. A variety of dedicated word segmenters are provided to the user. By using the dedicated word segmenter that best matches the search purpose, the user can effectively improve segmentation accuracy and, on that basis, the retrieval accuracy of the search engine.
2. The user can select several word segmenters in turn to segment the same search condition multiple times and submit each segmentation result to the search engine separately, so that the documents the user expects are retrieved accurately.
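The second effect can be illustrated with a short sketch that runs one query through several segmenters and searches with each result separately. Everything here is a toy assumption: the function name `search_with_each` is hypothetical, and the "search engine" is reduced to a substring match over an in-memory dictionary of documents.

```python
from typing import Callable, Dict, List

Segmenter = Callable[[str], List[str]]


def search_with_each(query: str,
                     segmenters: Dict[str, Segmenter],
                     documents: Dict[str, str]) -> Dict[str, List[str]]:
    """Return, per segmenter name, the ids of documents matching its terms."""
    hits: Dict[str, List[str]] = {}
    for name, segmenter in segmenters.items():
        terms = segmenter(query)
        # Toy engine (assumption): a document matches if it contains every term.
        hits[name] = [doc_id for doc_id, text in documents.items()
                      if all(term in text for term in terms)]
    return hits
```

Because each segmentation result is searched on its own, the user can compare the result lists and keep the one that matches the intended field.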
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described can be performed in an order different from that given here. Alternatively, they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A target information search method, characterized by comprising the following steps:
Receiving a word segmenter selected by a user and a character string input by the user, wherein the word segmenter is a word segmenter matching the character string input by the user;
Performing word segmentation on the character string using the word segmenter to obtain search terms;
Inputting the obtained search terms into a search engine and searching to obtain target information.
2. The method according to claim 1, characterized in that, before receiving the word segmenter selected by the user and the character string input by the user, the method further comprises:
Establishing the word segmenter corresponding to a technical field using the classified documents corresponding to the technical field.
3. The method according to claim 2, characterized in that establishing the word segmenter corresponding to the technical field using the classified documents corresponding to the technical field comprises:
Classifying technical fields and determining the classified documents corresponding to the current category;
Calculating the weight of each character in the current category according to the frequency with which each character occurs in the classified documents;
Determining the weights, in the current category, of the characters in a designated character string in the current category;
Calculating the weight of the designated character string in the current category according to the weights of the characters in the designated character string;
Binding the designated character string with the weight of the designated character string in the current category to obtain the word segmenter of the current category.
4. The method according to claim 3, characterized in that calculating the weight of each character in the current category according to the frequency with which each character occurs in the classified documents comprises:
Deleting the stop words in the classified documents;
Counting the frequency with which each character occurs in the classified documents after the stop words are deleted;
Counting the document frequency of the character in the classified documents;
Calculating the weight of each character in the current category according to the frequency of the character, the document frequency of the character, and the total number of the classified documents.
5. The method according to claim 3, characterized in that determining the weights, in the current category, of the characters in the designated character string in the current category comprises:
When the designated character string in the current category contains a character that is not included in the classified documents, setting the weight of the character not included in the classified documents to a default weight.
6. The method according to any one of claims 1 to 5, characterized in that the character comprises one of the following: a Chinese character, a Korean character, or a Japanese character.
7. A target information search device, characterized by comprising the following modules:
A receiving module, configured to receive a word segmenter selected by a user and a character string input by the user, wherein the word segmenter is a word segmenter matching the character string input by the user;
A word segmentation module, configured to perform word segmentation on the character string using the word segmenter received by the receiving module to obtain search terms;
A search module, configured to input the search terms obtained by the word segmentation module into a search engine and search to obtain target information.
8. The device according to claim 7, characterized in that the device further comprises:
A segmenter establishing module, configured to establish the word segmenter corresponding to a technical field using the classified documents corresponding to the technical field.
9. The device according to claim 8, characterized in that the segmenter establishing module comprises:
A document determining unit, configured to classify technical fields and determine the classified documents corresponding to the current category;
A character weight calculation unit, configured to calculate the weight of each character in the current category according to the frequency with which each character occurs in the classified documents determined by the document determining unit;
A weight determining unit, configured to determine the weights, in the current category, of the characters in a designated character string in the current category;
A character string weight calculation unit, configured to calculate the weight of the designated character string in the current category according to the weights of the characters in the designated character string;
A segmenter establishing unit, configured to bind the designated character string with the weight of the designated character string in the current category to obtain the word segmenter of the current category.
10. The device according to claim 9, characterized in that the character weight calculation unit comprises:
A deletion subunit, configured to delete the stop words in the classified documents;
A statistics subunit, configured to count the frequency with which each character occurs in the classified documents after the deletion subunit deletes the stop words, and to count the document frequency of the character in the classified documents;
A character string calculation subunit, configured to calculate the weight of each character in the current category according to the frequency of the character, the document frequency of the character, and the total number of the classified documents.
CN201110207333.8A 2011-07-22 2011-07-22 Target information search method and device Active CN102890690B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110207333.8A CN102890690B (en) 2011-07-22 2011-07-22 Target information search method and device

Publications (2)

Publication Number Publication Date
CN102890690A true CN102890690A (en) 2013-01-23
CN102890690B CN102890690B (en) 2017-04-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant