CN102890690B - Target information search method and device - Google Patents
- Publication number: CN102890690B
- Application number: CN201110207333.8A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a target information search method and device. The method comprises the following steps: receiving a segmenter selected by a user and a character string input by the user, wherein the segmenter matches the character string input by the user; segmenting the character string with the segmenter to obtain search terms; and inputting the obtained search terms into a search engine and searching to obtain the target information. The method and device solve the problem of inaccurate search results in existing search engines, make searching more convenient for the user, and improve retrieval quality.
Description
Technical field
The present invention relates to the field of information search, and in particular to a target information search method and device.
Background
Search engine technology is being applied ever more widely across IT systems, and the data held in search engine index databases is growing exponentially as a result. As the number of Chinese documents in an index database keeps rising, more and more Chinese words enter the index. Once new words and specialized vocabulary (such as personal names or terms from a specific field) enter the segmentation dictionary, they sharply reduce the accuracy of the segmenter, so that many Chinese sentences can no longer be decomposed correctly according to their meaning. Take the Chinese sentence "ion cloud integrated distribution": if the technical term "ion cloud" receives no extra handling, the segmenter splits the sentence without keeping "ion cloud" together as one term, and with such a segmentation result the search engine cannot find the material the user wants.
It can be seen that current search methods cannot segment a query according to the user's search target, so the segmentation result does not match the user's retrieval purpose. In addition, the segmentation result is not comprehensive enough, so some key search conditions cannot be extracted from the character string the user inputs.
No effective solution has yet been proposed for the problem of inaccurate search results in the search engines of the related art.
Summary of the invention
The main object of the present invention is to provide a target information search method and device that at least solve the above problem of inaccurate search results in search engines.
According to one aspect of the present invention, a target information search method is provided, comprising the following steps: receiving a segmenter selected by a user and a character string input by the user, wherein the segmenter is one that matches the character string input by the user; segmenting the character string with the segmenter to obtain search terms; and inputting the obtained search terms into a search engine to search and obtain the target information.
Before receiving the segmenter selected by the user and the character string input by the user, the method further comprises: establishing the segmenter corresponding to a technical field using the classified documents corresponding to that technical field.
Establishing the segmenter corresponding to a technical field using the classified documents corresponding to that technical field comprises: classifying the technical fields and determining the classified documents corresponding to the current class; calculating the weight of each character in the current class according to the frequency with which the character occurs in the classified documents; determining the weight in the current class of each character in a designated character string of the current class; calculating the weight of the designated character string in the current class from the weights of its characters; and binding the designated character string to its weight in the current class to obtain the segmenter of the current class.
Calculating the weight of each character in the current class according to the frequency with which the character occurs in the classified documents comprises: deleting the stop words from the classified documents; counting the frequency with which each character occurs in the classified documents after the stop words are deleted; counting the document frequency of each character, that is, the number of classified documents that contain it; and calculating the weight of each character in the current class from the character's frequency, the character's document frequency, and the total number of classified documents.
Determining the weight in the current class of each character in a designated character string of the current class comprises: when the designated character string contains a character that does not appear in the classified documents, setting the weight of that character to a default weight.
The character is one of the following: a Chinese character, a Korean character, or a Japanese character.
According to another aspect of the present invention, a target information search device is provided, comprising the following modules: a receiving module, configured to receive a segmenter selected by a user and a character string input by the user, wherein the segmenter is one that matches the character string input by the user; a segmentation module, configured to segment the character string with the segmenter received by the receiving module to obtain search terms; and a search module, configured to input the search terms obtained by the segmentation module into a search engine to search and obtain the target information.
The device further comprises a segmenter establishing module, configured to establish the segmenter corresponding to a technical field using the classified documents corresponding to that technical field.
The segmenter establishing module comprises: a document determining unit, configured to classify the technical fields and determine the classified documents corresponding to the current class; a character weight calculation unit, configured to calculate the weight of each character in the current class according to the frequency with which the character occurs in the classified documents determined by the document determining unit; a weight determining unit, configured to determine the weight in the current class of each character in a designated character string of the current class; a character string weight calculation unit, configured to calculate the weight of the designated character string in the current class from the weights of its characters; and a segmenter establishing unit, configured to bind the designated character string to its weight in the current class to obtain the segmenter of the current class.
The character weight calculation unit comprises: a deletion subunit, configured to delete the stop words from the classified documents; a statistics subunit, configured to count the frequency with which each character occurs in the classified documents after the deletion subunit has deleted the stop words, and to count the document frequency of each character in the classified documents; and a character string computation subunit, configured to calculate the weight of each character in the current class from the character's frequency, the character's document frequency, and the total number of classified documents.
With the present invention, segmentation is performed by a segmenter that matches the character string input by the user, so each word can be extracted accurately from that string. Searching with the segmented words yields target information that meets the user's expectation, which solves the problem of inaccurate search results in existing search engines, makes searching more convenient for the user, and improves retrieval quality.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the target information search method according to Embodiment 1 of the present invention;
Fig. 2 is a structural block diagram of the target information search device according to Embodiment 2 of the present invention;
Fig. 3 is a detailed structural block diagram of the target information search device according to Embodiment 2 of the present invention;
Fig. 4 is a detailed structural block diagram of the target information search device according to Embodiment 2 of the present invention;
Fig. 5 is a structural block diagram of the weight generation module according to Embodiment 2 of the present invention;
Fig. 6 is a flowchart of the target information search method applied to the device shown in Fig. 4, according to Embodiment 2 of the present invention;
Fig. 7 is a flowchart of the target information search method applied to the device shown in Fig. 4, according to Embodiment 2 of the present invention;
Fig. 8 is a flowchart of the target information search method applied to the device shown in Fig. 4, according to Embodiment 2 of the present invention;
Fig. 9 is a schematic diagram of the target information search system according to Embodiment 2 of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, provided they do not conflict, the embodiments in this application and the features in those embodiments may be combined with one another.
The embodiments of the present invention take into account that current search engines do not retrieve information according to technical field, which makes search results inaccurate, and therefore provide a target information search method and device. This approach lets a search engine use different segmentation models for different classes in different fields, which can improve segmentation accuracy. It is applicable to fields such as search engines, word segmentation, and web application systems.
Embodiment 1
This embodiment provides a target information search method. Referring to Fig. 1, the method comprises the following steps:
Step S102: receive the segmenter selected by the user and the character string input by the user, wherein the segmenter is one that matches the character string input by the user.
Here, matching means that the technical field corresponding to the segmenter is consistent with the technical field corresponding to the character string input by the user.
Step S104: segment the character string with the segmenter to obtain search terms.
Step S106: input the obtained search terms into the search engine to search and obtain the target information.
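Steps S102 to S106 can be pictured as a small pipeline. The following Python sketch is only an illustration under stated assumptions: the function `search_target_information`, the `Segmenter` type, the toy `physics_segmenter`, and the toy `toy_engine` are all hypothetical names that do not appear in the patent, and a real segmenter and search engine would replace the stand-ins.

```python
from typing import Callable, Dict, List

# Hypothetical type: a segmenter maps an input string to a list of search terms.
Segmenter = Callable[[str], List[str]]


def search_target_information(
    segmenters: Dict[str, Segmenter],
    selected_field: str,
    query: str,
    search_engine: Callable[[List[str]], List[str]],
) -> List[str]:
    """Run steps S102 to S106 for a single request."""
    # S102: receive the segmenter the user selected and the input string.
    segmenter = segmenters[selected_field]
    # S104: segment the string into search terms.
    terms = segmenter(query)
    # S106: pass the terms to the search engine and return what it finds.
    return search_engine(terms)


def physics_segmenter(text: str) -> List[str]:
    # Toy stand-in that treats "ion cloud" as one term in this field.
    if text.startswith("ion cloud "):
        return ["ion cloud"] + text[len("ion cloud "):].split()
    return text.split()


def toy_engine(terms: List[str]) -> List[str]:
    # Toy stand-in: return ids of documents containing any search term.
    corpus = {"doc1": "ion cloud density", "doc2": "cloud storage"}
    return sorted(d for d, body in corpus.items() if any(t in body for t in terms))


results = search_target_information(
    {"physics": physics_segmenter},
    "physics",
    "ion cloud integrated distribution",
    toy_engine,
)
```

Because the field-specific segmenter keeps "ion cloud" together, the toy engine matches only the document about ion clouds and not the one about cloud storage.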
By segmenting with a segmenter that matches the character string input by the user, this embodiment can accurately extract each word from that string. Searching with the segmented words yields target information that meets the user's expectation, which solves the problem of inaccurate search results in existing search engines, makes searching more convenient for the user, and improves retrieval quality.
To improve segmentation accuracy, before the segmenter selected by the user and the character string input by the user are received, the method further comprises: establishing the segmenter corresponding to a technical field using the classified documents corresponding to that technical field.
Establishing the segmenter corresponding to a technical field using the classified documents corresponding to that technical field comprises the following steps:
1) classify the technical fields and determine the classified documents corresponding to the current class;
2) calculate the weight of each character in the current class according to the frequency with which the character occurs in the classified documents;
3) determine the weight in the current class of each character in a designated character string of the current class;
4) calculate the weight of the designated character string in the current class from the weights of its characters;
5) bind the designated character string to its weight in the current class to obtain the segmenter of the current class.
The weight of each character in the current class may be calculated as follows: delete the stop words from the classified documents; count the frequency with which each character occurs in the classified documents after the stop words are deleted; count the document frequency of each character in the classified documents; and calculate the weight of each character in the current class from the character's frequency, the character's document frequency, and the total number of classified documents. Of course, in practice the stop words need not be deleted, and the frequency with which each character occurs in the classified documents can be counted directly. The stop words can be set in advance, for example articles, conjunctions, or auxiliary words.
When a designated character string in the current class contains a character that does not appear in the classified documents, the weight of that character is set to a default weight.
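As a rough illustration of how a string weight might be derived from character weights with the default-weight fallback, the sketch below averages the character weights. The averaging rule, the value of `DEFAULT_WEIGHT`, and the names `string_weight` and `bind_weights` are assumptions made for this example; the text only says that the string weight is calculated from the character weights.

```python
from typing import Dict, Iterable

DEFAULT_WEIGHT = 0.01  # hypothetical value; the text does not specify one


def string_weight(
    word: str, char_weights: Dict[str, float], default: float = DEFAULT_WEIGHT
) -> float:
    """Average the class weights of the characters in `word` (assumed rule)."""
    if not word:
        return 0.0
    # Characters absent from the classified documents fall back to the default.
    return sum(char_weights.get(c, default) for c in word) / len(word)


def bind_weights(
    dictionary: Iterable[str], char_weights: Dict[str, float]
) -> Dict[str, float]:
    """Bind every dictionary word to its weight in the current class."""
    return {word: string_weight(word, char_weights) for word in dictionary}


bound = bind_weights(["ab", "xy"], {"a": 0.5, "b": 0.25})
```

In this toy run `"xy"` contains only unseen characters, so its weight equals the default weight.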
The character is one of the following: a Chinese character, a Korean character, or a Japanese character.
Once a segmenter has been established for each technical field, a set of relatively specialized segmenters is available. These segmenters can be displayed on the search engine's interface for the user to choose from. Taking Chinese characters as an example, the target information search method comprises the following steps:
Step 1: perform Chinese character frequency analysis on the documents contained in the class.
Step 2: apply probability distribution processing to the Chinese character frequencies in the class, and calculate the weight in the class of each Chinese character it contains.
Step 3: from the weights in the class of the Chinese characters it contains, calculate the weight in the class of each word in the segmenter dictionary.
Step 4: input the weight in the class of each word in the segmenter dictionary into the segmenter, making it the special segmenter of the class.
Step 5: provide the special segmenters already built for the multiple classes to the user; the user selects from them the special segmenter best suited to the retrieval purpose, and that special segmenter provides the segmentation service for the search engine.
Step 6: the user inputs a search condition; the special segmenter segments the search condition and outputs the segmentation result; the search engine performs a full-text search on the segmentation result and returns the retrieval result to the user.
On a web page on the Internet, the user selects the segmenter that best matches the search target and inputs a Chinese character string. The system segments the Chinese character string with the segmenter the user specified, outputs the Chinese words that best fit the user's search purpose, and hands those words to the search engine for processing.
This embodiment can provide a special segmenter for each class in the document library. Taking Chinese characters as an example, it gathers probability statistics on the occurrences of Chinese characters in the classified documents, calculates the weight of each Chinese character in the class, calculates from those character weights the weight in the class of each Chinese word in the segmenter dictionary, and then establishes a special segmenter for each class. In the segmenter selection interface, the user selects the special segmenter best suited to the search purpose and uses it to obtain the segmentation result that best fits that purpose, which improves the search accuracy of the search engine and the user's satisfaction with it.
Embodiment 2
This embodiment further provides a target information search device. Referring to Fig. 2, the device comprises the following modules:
a receiving module 22, configured to receive the segmenter selected by the user and the character string input by the user, wherein the segmenter is one that matches the character string input by the user;
a segmentation module 24, connected to the receiving module 22 and configured to segment the character string with the segmenter received by the receiving module 22 to obtain search terms;
a search module 26, connected to the segmentation module 24 and configured to input the search terms obtained by the segmentation module 24 into the search engine to search and obtain the target information.
By segmenting with a segmenter that matches the character string input by the user, this embodiment can accurately extract each word from that string. Searching with the segmented words yields target information that meets the user's expectation, which solves the problem of inaccurate search results in existing search engines, makes searching more convenient for the user, and improves retrieval quality.
To improve segmentation accuracy, referring to Fig. 3, the device further comprises a segmenter establishing module 32, connected to the receiving module 22 and configured to establish the segmenter corresponding to a technical field using the classified documents corresponding to that technical field.
The segmenter establishing module 32 comprises: a document determining unit, configured to classify the technical fields and determine the classified documents corresponding to the current class; a character weight calculation unit, configured to calculate the weight of each character in the current class according to the frequency with which the character occurs in the classified documents determined by the document determining unit; a weight determining unit, configured to determine the weight in the current class of each character in a designated character string of the current class; a character string weight calculation unit, configured to calculate the weight of the designated character string in the current class from the weights of its characters; and a segmenter establishing unit, configured to bind the designated character string to its weight in the current class to obtain the segmenter of the current class.
Preferably, the character weight calculation unit comprises: a deletion subunit, configured to delete the stop words from the classified documents; a statistics subunit, configured to count the frequency with which each character occurs in the classified documents after the deletion subunit has deleted the stop words, and to count the document frequency of each character in the classified documents; and a character string computation subunit, configured to calculate the weight of each character in the current class from the character's frequency, the character's document frequency, and the total number of classified documents.
The device provided by this embodiment can establish a special segmenter for each class in the classified document library. From the many class-specific special segmenters, the user can select the one best suited to the query target, and that segmenter provides the search engine with the segmentation result best suited to that target, which improves the search precision of the search engine.
Taking the case where the characters are Chinese characters as an example, this embodiment further provides another target information search device, comprising the following modules:
(1) a Chinese character frequency collection module, (2) a Chinese character weight computing module, (3) a Chinese word weight generation module, (4) a special segmenter, (5) a segmenter selecting module, and (6) a retrieval request preprocessing module. The functions of the modules are as follows:
The Chinese character frequency collection module calculates, for each class, the frequency of occurrence of each Chinese character in that class.
The Chinese character weight computing module takes the frequency of occurrence of each Chinese character in the class as its basis, calculates the probability of occurrence of each Chinese character in the class, normalizes the frequencies, and derives the weight of each Chinese character in the class.
The Chinese character weight computing module can calculate the weights in the class of all the Chinese characters the class contains from their frequencies of occurrence.
The Chinese character frequency collection module and the Chinese character weight computing module together correspond to the character weight calculation unit described above. The Chinese character frequency collection module can collect the frequencies of occurrence of all the Chinese characters contained in the class.
The Chinese word weight generation module takes the Chinese character weights in the class as its basis and calculates a weight in the class for each word in the segmenter dictionary.
The Chinese word weight generation module can calculate the weight in the class of each Chinese word in the segmentation dictionary from the weights in the class of all the Chinese characters the class contains.
The special segmenter is obtained by creating a general segmenter for the class and passing the weights of all the Chinese words of the class into it, which turns the general segmenter into the special segmenter of the class. The special segmenter uses the segmenter dictionary and the weights of all the Chinese words of the class as its basis for segmentation.
It can be seen that the special segmenter of this embodiment is built on top of a general segmenter: inputting the weights of all the Chinese words of the class into the general segmenter turns it into the special segmenter of the class, which segments on the basis of the segmenter dictionary and those word weights.
The segmenter selecting module presents the special segmenters already established for the multiple classes to the user, who selects one of them to provide the Chinese word segmentation service for the search engine.
Through the segmenter selecting module, the user can select the special segmenter that best matches the search purpose.
The retrieval request preprocessing module receives the Chinese character string input by the user, feeds it into the special segmenter the user selected, obtains the segmentation result from that segmenter, assembles the segmentation result into a query condition, and inputs it into the search engine.
Taking Chinese character input as an example, this embodiment provides a target information search device that can be installed in a search engine server 40. Referring to Fig. 4, the device consists of the following modules:
(1) a weight generation module 41;
(2) a special segmenter 42, connected to the weight generation module 41;
(3) a segmenter selecting module 43, connected to the special segmenter 42;
(4) a retrieval request preprocessing module 44, connected to the segmenter selecting module 43 and to the network;
(5) a search engine 45, connected to the retrieval request preprocessing module 44.
The weight generation module 41 is responsible for generating the weight in a class of each word contained in that class. Referring to Fig. 5, the module comprises three submodules:
1. Chinese character frequency collection module 411: this module first removes the stop words from the documents, then counts the frequency of occurrence of each Chinese character contained in the classified document library (Chinese character frequency = number of occurrences of the individual Chinese character in the class / total number of Chinese characters in the class), and at the same time counts the number of documents in the class that contain the character (hereinafter, the document frequency).
2. Chinese character weight computing module 412: this module first calculates the weight of each Chinese character in the class from the Chinese character frequency computed by the Chinese character frequency collection module 411, the document frequency, and the total number of documents in the class; it then assigns a default weight to Chinese characters that are present in the segmentation dictionary but absent from the class.
3. Chinese word weight generation module 413: this module takes the Chinese words out of the segmenter dictionary one by one, obtains for each word the weights in the class, as calculated by the Chinese character weight computing module 412, of the Chinese characters the word contains, calculates from those character weights the weight of the word in the class, and finally writes the word weights for the class to the hard disk.
The special segmenter 42 is responsible for providing the user with a specialized Chinese word segmentation service: it can decompose the search condition input by the user into the Chinese words that best meet the user's expectation. The module is implemented as follows: first an ordinary segmenter is created; then the Chinese word weights for the class, calculated by the Chinese word weight generation module 413, are read from the hard disk and bound to the Chinese words in the segmentation dictionary; finally the special segmenter is registered in the segmenter selecting module 43. During segmentation, the combination of Chinese words that best fits the class is computed from the Chinese word weights.
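One way to picture "computing the combination of words that best fits the class" is a dynamic program that picks the split with the highest total word weight. This is a sketch under assumptions: the patent does not say how candidate splits are scored, the function `segment` and its `unknown_weight` parameter are hypothetical, and the Chinese string and all weights below are invented for illustration.

```python
from typing import Dict, List


def segment(
    text: str, word_weights: Dict[str, float], unknown_weight: float = 0.0
) -> List[str]:
    """Split `text` into words so that the summed class weight is maximal.

    Single characters missing from the dictionary are allowed as fallback
    words with `unknown_weight`, so every input can be segmented.
    """
    n = len(text)
    max_len = max((len(w) for w in word_weights), default=1)
    best = [float("-inf")] * (n + 1)  # best[i]: top score for text[:i]
    back = [0] * (n + 1)              # back[i]: start of the last word
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_weights:
                score = best[start] + word_weights[piece]
            elif end - start == 1:
                score = best[start] + unknown_weight
            else:
                continue
            if score > best[end]:
                best[end], back[end] = score, start
    # Walk the back-pointers to recover the chosen words.
    words: List[str] = []
    end = n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


# Invented weights for two classes over the same dictionary entries.
physics = {"离子云": 0.9, "离子": 0.3, "云集": 0.2, "集中": 0.5, "分布": 0.5}
general = {"离子": 0.5, "云集": 0.6, "集中": 0.3, "分布": 0.5}
physics_split = segment("离子云集中分布", physics)
general_split = segment("离子云集中分布", general)
```

With these invented weights the same string splits differently per class, which is the effect the special segmenter is meant to achieve by binding class-specific weights to dictionary words.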
The segmenter selecting module 43 is responsible for presenting the established special segmenters 42 to the user visually and for letting the user select, through this module, the special segmenter 42 that best fits the retrieval purpose. The module is implemented as follows: the established special segmenters 42 are first stored in a linked list; the segmenter selecting module 43 then provides a user interface that displays the special segmenters 42 in the linked list for the user to choose from. The user can select only one of the special segmenters 42 in the linked list. Once the user has made a selection, the segmenter selecting module 43 passes the selected special segmenter 42 to the retrieval request preprocessing module 44.
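The registration and single-selection behaviour of the segmenter selecting module can be sketched as a small registry. The class `SegmenterSelector` and its method names are hypothetical, and an insertion-ordered dictionary stands in for the linked list mentioned in the text; the user interface itself is out of scope here.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical type: a segmenter maps an input string to a list of words.
Segmenter = Callable[[str], List[str]]


class SegmenterSelector:
    """Keeps registered special segmenters and exactly one user selection."""

    def __init__(self) -> None:
        # An insertion-ordered dict replaces the linked list in this sketch.
        self._registered: Dict[str, Segmenter] = {}
        self._selected: Optional[str] = None

    def register(self, class_name: str, segmenter: Segmenter) -> None:
        self._registered[class_name] = segmenter

    def available(self) -> List[str]:
        # Names that a user interface would display for selection.
        return list(self._registered)

    def select(self, class_name: str) -> None:
        if class_name not in self._registered:
            raise KeyError(f"no segmenter registered for {class_name!r}")
        # Selecting again replaces the previous choice: only one is active.
        self._selected = class_name

    def selected(self) -> Segmenter:
        if self._selected is None:
            raise RuntimeError("the user has not selected a segmenter yet")
        return self._registered[self._selected]


selector = SegmenterSelector()
selector.register("physics", str.split)  # toy segmenter: split on spaces
selector.register("medicine", list)      # toy segmenter: one char per word
selector.select("medicine")
```

A retrieval request preprocessing step would then call `selector.selected()` to obtain the segmenter to apply to the user's request.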
The retrieval request preprocessing module 44 is responsible for receiving the search condition input by the user, calling the special segmenter 42 the user selected to segment it, and passing the segmentation result to the search engine. The retrieval request preprocessing module 44 first receives the user's retrieval request, then delivers it for segmentation to the special segmenter 42 that the user selected through the segmenter selecting module 43 and fetches the segmentation result from that segmenter, and finally passes the segmentation result to the search engine.
Based on the device shown in Figs. 4 and 5, this embodiment further provides a target information search method. Referring to the flowchart of the target information search method shown in Fig. 6, the method comprises the following steps:
Step S601: scan the classified documents;
Step S602: count the frequency of occurrence of each Chinese character in the class;
Step S603: calculate the weight of each Chinese character in the class;
Step S604: calculate the weight of each Chinese word in the class;
Step S605: generate the special segmenter;
Step S606: register the special segmenter in the segmenter selecting module;
Step S607: judge whether the user has selected a segmenter; if so, go to step S608; if not, go to step S609;
Step S608: deliver the segmenter selected by the user to the retrieval request preprocessing module;
Step S609: wait for the user to select a segmenter;
Step S610: judge whether the user has input a search condition (also called a retrieval request, equivalent to the character string above); if so, go to step S611; if not, go to step S612;
Step S611: call the segmenter selected by the user to segment the retrieval request, pass the result to the search engine as the query condition, and then go to step S613;
Step S612: wait for the user to input a retrieval request;
Step S613: return the retrieval result to the client.
Referring to the flow chart of the target information search method shown in Fig. 7, the method comprises the following steps:
Step S700: the Chinese character frequency collection module 411 scans the classified documents;
Step S701: the Chinese character frequency collection module 411 removes the stop words from the documents;
Step S702: the Chinese character frequency collection module 411 counts the frequency of occurrence of each Chinese character contained in the classified document library (Chinese character frequency = number of occurrences of the individual Chinese character in the classification / total number of Chinese characters in the classification);
Step S703: the Chinese character frequency collection module 411 counts the number of documents in the classification that contain the Chinese character (hereinafter referred to as the document frequency);
Step S704: the Chinese character weight computing module 412 calculates the weight of each Chinese character in the classification from the Chinese character frequency and document frequency obtained by the Chinese character frequency collection module 411 and the total number of documents in the classification;
Step S705: the Chinese character weight computing module 412 assigns a default weight to each Chinese character that is present in the segmentation dictionary but absent from the classification;
Step S706: the Chinese word weight generation module 413 assigns weights to the Chinese words in the segmenter dictionary according to the weights of the Chinese characters they contain, as calculated by the Chinese character weight computing module 412;
Step S707: create a common segmenter;
Step S708: read from the hard disk the Chinese word weights of the classification calculated by the Chinese word weight generation module 413, and bind these weights to the corresponding Chinese words in the segmentation dictionary;
Step S709: inject the segmentation dictionary carrying these weights into the common segmenter to produce the special segmenter 42;
Step S710: register the special segmenter in the segmenter selecting module;
Step S711: judge whether a special segmenter has been established for every classification; if not, repeat steps S700 to S710 until the special segmenters 42 of all classification libraries have been established; if so, end.
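Steps S700 to S705 can be illustrated with a minimal Python sketch. The character frequency follows the formula given in step S702. How the frequency, the document frequency and the total number of documents are combined into a weight is not stated at this point in the text, so the product used below is only an assumption. The function name `character_weights`, the use of single stop characters in place of stop words, and the default weight value are likewise hypothetical.

```python
from collections import Counter


def character_weights(docs, stop_chars, lexicon_chars, default_weight=0.0):
    """Return {character: weight} for one classification (hypothetical helper)."""
    # S700/S701: scan the classified documents and drop stop characters
    # (whitespace is dropped as well).
    cleaned = [
        [ch for ch in doc if ch not in stop_chars and not ch.isspace()]
        for doc in docs
    ]
    total_chars = sum(len(doc) for doc in cleaned)
    occurrences = Counter(ch for doc in cleaned for ch in doc)
    # S703: number of documents in the classification containing each character.
    doc_freq = Counter(ch for doc in cleaned for ch in set(doc))

    weights = {}
    for ch, count in occurrences.items():
        # S702: occurrences of the character / total characters in the classification.
        char_freq = count / total_chars
        # S704 (ASSUMED formula): frequency scaled by the share of documents
        # in the classification that contain the character.
        weights[ch] = char_freq * doc_freq[ch] / len(docs)

    # S705: characters known to the dictionary but absent from the classification
    # receive the default weight.
    for ch in lexicon_chars:
        weights.setdefault(ch, default_weight)
    return weights


# Two tiny "classified documents"; "电" occurs only in the dictionary.
weights = character_weights(["化工化", "化学"], stop_chars=set(), lexicon_chars={"化", "电"})
```

Any other monotone combination of the three quantities could be substituted in the line marked as assumed without changing the rest of the flow.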
Referring to the flow chart of the target information search method shown in Fig. 8, the method comprises the following steps:
Step S800: the segmenter selecting module 43 displays the special segmenters 42 in the user interface.
Step S801: the segmenter selecting module 43 waits for the user to select a special segmenter 42.
Step S802: the segmenter selecting module 43 receives the special segmenter 42 selected by the user and records the selection.
Step S803: the segmenter selecting module 43 sends the segmenter selected by the user to the retrieval request pretreatment module 44.
Step S804: the retrieval request pretreatment module 44 receives the user's retrieval request, calls the segmenter selected by the user to perform word segmentation on the retrieval request, and passes the result to the search engine 45 as the query condition.
Step S805: the search engine 45 performs retrieval according to the segmented search condition and returns the retrieval result.
Step S806: judge whether the user reselects a special segmenter 42; if so, repeat from step S802; if not, execute step S807.
Step S807: judge whether the user re-enters a retrieval request; if so, re-execute steps S804 and S805; if not, end. That is, if the user has no new activity, the processing flow ends automatically.
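The interaction in steps S800 to S805 can be sketched as follows. The class `SegmenterSelector`, the helper `make_segmenter` (a plain forward maximum matching segmenter, chosen here only for brevity) and the in-memory `search_engine` are hypothetical stand-ins for modules 43, 44 and 45; none of these names come from the patent.

```python
from typing import Callable, Dict, List, Optional

Segmenter = Callable[[str], List[str]]


class SegmenterSelector:
    """Hypothetical stand-in for the segmenter selecting module 43."""

    def __init__(self) -> None:
        self._registry: Dict[str, Segmenter] = {}
        self._selected: Optional[str] = None

    def register(self, name: str, segmenter: Segmenter) -> None:
        self._registry[name] = segmenter

    def available(self) -> List[str]:
        # S800: names that would be shown in the user interface.
        return sorted(self._registry)

    def select(self, name: str) -> None:
        # S802: receive and record the user's choice.
        if name not in self._registry:
            raise KeyError(f"unknown segmenter: {name}")
        self._selected = name

    def current(self) -> Segmenter:
        # S803: hand the chosen segmenter to the pre-processing step.
        if self._selected is None:
            raise RuntimeError("no segmenter selected yet")  # S801: keep waiting
        return self._registry[self._selected]


def make_segmenter(lexicon: set) -> Segmenter:
    """Forward maximum matching over a lexicon (illustrative only)."""
    max_len = max((len(word) for word in lexicon), default=1)

    def segment(text: str) -> List[str]:
        terms, i = [], 0
        while i < len(text):
            for size in range(min(max_len, len(text) - i), 0, -1):
                piece = text[i:i + size]
                if size == 1 or piece in lexicon:
                    terms.append(piece)
                    i += size
                    break
        return terms

    return segment


def handle_request(selector: SegmenterSelector, request: str, search_engine) -> List[str]:
    terms = selector.current()(request)  # S804: segment the retrieval request
    return search_engine(terms)          # S805: retrieve with the segmented query


DOCS = ["化工反应器设计", "电力工程"]


def search_engine(terms: List[str]) -> List[str]:
    # Toy engine: a document matches when it contains every search term.
    return [doc for doc in DOCS if all(term in doc for term in terms)]


selector = SegmenterSelector()
selector.register("chemical", make_segmenter({"化工", "反应器"}))
selector.select("chemical")
hits = handle_request(selector, "化工反应器", search_engine)
```

Reselecting a segmenter (step S806) amounts to calling `select` again before the next `handle_request` call.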
This embodiment can divide the documents into multiple classifications according to technical field. As shown in the schematic diagram of the target information search system in Fig. 9, each classification corresponds to one set of the above device, in which the segmenter selecting module, the retrieval request pretreatment module and the search engine are shared modules.
This embodiment can provide a special segmenter for each classification in the document library. Taking Chinese characters as an example, probability statistics are computed over the occurrences of the Chinese characters in the classified documents, the weight of each Chinese character in the classification is calculated, and the weight in the classification of each Chinese word in the segmenter dictionary is then calculated from the character weights, so that a special segmenter is established for each classification. In the segmenter selection interface, the user selects the special segmenter that best suits the purpose of the search and uses it to obtain the segmentation result best suited to that purpose, which improves the search accuracy of the search engine and the user's satisfaction with it.
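A minimal sketch of how word weights might be derived from character weights and used by a per-classification segmenter is given below. The text does not state the word-weight formula or how the weights steer segmentation at this point, so two assumptions are made: a word's weight is the sum of its characters' weights, and the segmenter picks the split whose dictionary words have the highest total weight. The names `word_weights` and `build_segmenter` and all numeric weights are hypothetical.

```python
def word_weights(lexicon, char_weights, default_weight=0.0):
    # ASSUMPTION: a word's weight is the sum of its characters' weights;
    # characters absent from the classification fall back to the default weight.
    return {
        word: sum(char_weights.get(ch, default_weight) for ch in word)
        for word in lexicon
    }


def build_segmenter(weighted_lexicon):
    """Bind a weighted dictionary to a segmenter for one classification."""
    max_len = max((len(word) for word in weighted_lexicon), default=1)

    def segment(text):
        # best[i] holds (score, terms) for the best split of text[:i].
        best = [(0.0, [])]
        for end in range(1, len(text) + 1):
            candidates = []
            for start in range(max(0, end - max_len), end):
                piece = text[start:end]
                if piece in weighted_lexicon:
                    gain = weighted_lexicon[piece]
                elif end - start == 1:
                    gain = 0.0  # single character outside the dictionary
                else:
                    continue
                score, terms = best[start]
                candidates.append((score + gain, terms + [piece]))
            best.append(max(candidates, key=lambda c: c[0]))
        return best[-1][1]

    return segment


lexicon = {"研究生", "生命"}
# Hypothetical character weights for two classifications.
biology = build_segmenter(word_weights(lexicon, {"研": 0.1, "究": 0.1, "生": 0.4, "命": 0.4}))
education = build_segmenter(word_weights(lexicon, {"研": 0.3, "究": 0.3, "生": 0.3}))

print(biology("研究生命"))    # ['研', '究', '生命']
print(education("研究生命"))  # ['研究生', '命']
```

The same input string is split differently by the two segmenters because the bound weights differ, which is the behaviour the paragraph above attributes to special segmenters.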
As can be seen from the above description, the present invention achieves the following technical effects:
1. Diversified special segmenters are provided to the user. By using the special segmenter that best matches the purpose of the search, the user can effectively improve the accuracy of word segmentation and, on that basis, the retrieval accuracy of the search engine.
2. The user can select multiple segmenters to segment the same search condition several times, and each segmentation result is submitted to the search engine separately for retrieval, so that the documents the user wants are retrieved accurately.
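The second effect can be sketched in a few lines of Python. The function `search_with_segmenters`, the two toy segmenters and the in-memory engine are hypothetical and serve only to show each segmentation result being submitted to the search engine separately.

```python
def search_with_segmenters(request, segmenters, search_engine):
    """Segment one request with several segmenters and query once per result."""
    results = {}
    for name, segment in segmenters.items():
        terms = segment(request)               # one segmentation per segmenter
        results[name] = search_engine(terms)   # each result is submitted on its own
    return results


DOCUMENTS = ["化工反应器设计", "化学与工程"]


def toy_engine(terms):
    # Toy engine: a document matches when it contains every search term.
    return [doc for doc in DOCUMENTS if all(term in doc for term in terms)]


toy_segmenters = {
    "per_character": list,             # splits the request into single characters
    "whole_string": lambda s: [s],     # keeps the request as one term
}

results = search_with_segmenters("化工", toy_segmenters, toy_engine)
```

Keeping the result sets separate per segmenter, rather than merging them, lets the user compare which segmentation best matches the search purpose.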
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from that given here, or the modules or steps can each be made into a separate integrated circuit module, or several of them can be made into a single integrated circuit module. Thus, the present invention is not restricted to any specific combination of hardware and software.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For those skilled in the art, the present invention can have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (6)
1. A target information search method, characterized by comprising the following steps:
receiving a segmenter selected by a user and a character string input by the user, wherein the segmenter is a segmenter matching the character string input by the user;
performing word segmentation on the character string using the segmenter to obtain search terms;
inputting the obtained search terms into a search engine and searching to obtain target information;
wherein, before receiving the segmenter selected by the user and the character string input by the user, the method further comprises: establishing the segmenter corresponding to a technical field using classified documents corresponding to the technical field, including: classifying technical fields and determining the classified documents corresponding to a current classification; calculating the weight of each character in the current classification according to the frequency with which each character occurs in the classified documents; determining the weights, in the current classification, of the characters in a designated character string of the current classification; calculating the weight of the designated character string in the current classification according to the weight of each character in the designated character string; and binding the designated character string to its weight in the current classification to obtain the segmenter of the current classification.
2. The method according to claim 1, characterized in that calculating the weight of each character in the current classification according to the frequency with which each character occurs in the classified documents comprises:
deleting the stop words in the classified documents;
counting the frequency with which each character occurs in the classified documents after the stop words are deleted;
counting the document frequency of the classified documents that contain the character;
calculating the weight of each character in the current classification according to the frequency of the character, the document frequency of the character and the total number of the classified documents.
3. The method according to claim 1, characterized in that determining the weights, in the current classification, of the characters in the designated character string of the current classification comprises:
when the designated character string of the current classification contains a character that is not included in the classified documents, setting the weight of the character not included in the classified documents to a default weight.
4. The method according to any one of claims 1 to 3, characterized in that the characters comprise one of the following: characters in Chinese form, characters in Korean form, or characters in Japanese form.
5. A target information search device, characterized by comprising the following modules:
a receiver module, configured to receive a segmenter selected by a user and a character string input by the user, wherein the segmenter is a segmenter matching the character string input by the user;
a word segmentation module, configured to perform word segmentation on the character string using the segmenter received by the receiver module to obtain search terms;
a search module, configured to input the search terms obtained by the word segmentation module into a search engine and search to obtain target information;
wherein the device further comprises: a segmenter establishing module, configured to establish the segmenter corresponding to a technical field using classified documents corresponding to the technical field, the segmenter establishing module comprising: a document determining unit, configured to classify technical fields and determine the classified documents corresponding to a current classification; a character weight calculation unit, configured to calculate the weight of each character in the current classification according to the frequency with which each character occurs in the classified documents determined by the document determining unit; a weight determining unit, configured to determine the weights, in the current classification, of the characters in a designated character string of the current classification; a character string weight calculation unit, configured to calculate the weight of the designated character string in the current classification according to the weight of each character in the designated character string; and a segmenter establishing unit, configured to bind the designated character string to its weight in the current classification to obtain the segmenter of the current classification.
6. The device according to claim 5, characterized in that the character weight calculation unit comprises:
a deletion subunit, configured to delete the stop words in the classified documents;
a statistics subunit, configured to count the frequency with which each character occurs in the classified documents after the deletion subunit deletes the stop words, and to count the document frequency of the classified documents that contain the character;
a character string calculation subunit, configured to calculate the weight of each character in the current classification according to the frequency of the character, the document frequency of the character and the total number of the classified documents.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201110207333.8A CN102890690B (en) | 2011-07-22 | 2011-07-22 | Target information search method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102890690A (en) | 2013-01-23 |
CN102890690B (en) | 2017-04-12 |
Family
ID=47534196
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201110207333.8A Active CN102890690B (en) | 2011-07-22 | 2011-07-22 | Target information search method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102890690B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106708798B (en) * | 2015-11-16 | 2020-03-31 | 阿里巴巴集团控股有限公司 | Character string segmentation method and device |
CN106021625A (en) * | 2016-07-26 | 2016-10-12 | 浪潮软件集团有限公司 | Mixed application method of two word segmenters based on SOLR search engine |
CN107798004B (en) * | 2016-08-29 | 2022-09-30 | 中兴通讯股份有限公司 | Keyword searching method and device and terminal |
CN109063046A (en) * | 2018-07-17 | 2018-12-21 | 广州资宝科技有限公司 | searching method, device and intelligent terminal |
CN109800326B (en) * | 2019-01-24 | 2021-07-02 | 广州虎牙信息科技有限公司 | Video processing method, device, equipment and storage medium |
CN111090668B (en) * | 2019-12-09 | 2023-09-26 | 京东科技信息技术有限公司 | Data retrieval method and device, electronic equipment and computer readable storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101561818A (en) * | 2009-05-13 | 2009-10-21 | 北京用友移动商务科技有限公司 | Method for word segmentation processing and method for full-text retrieval |
Non-Patent Citations (2)
Title |
---|
The Research of Chinese Automatic Word Segmentation In Hierarchical Model Dictionary Binary Tree;Luo XianGang等;《2009 First International Workshop on Database Technology and Applications》;20090426;第321-324页 * |
适用于化工专业搜索引擎的中文分词系统的研究与实现;王硕;《中国优秀硕士学位论文全文数据库信息科技辑》;20081115(第11期);第1-52页 * |
Also Published As
Publication number | Publication date |
---|---|
CN102890690A (en) | 2013-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102890690B (en) | Target information search method and device | |
CN100545847C (en) | A kind of method and system that blog articles is sorted | |
CN102043833B (en) | Search method and device based on query word | |
CN103577416B (en) | Expanding query method and system | |
JP4006239B2 (en) | Document search method and search system | |
CN108304444B (en) | Information query method and device | |
CN103838754B (en) | Information retrieval device and method | |
CN104199965B (en) | Semantic information retrieval method | |
CN104077407B (en) | A kind of intelligent data search system and method | |
US20080201297A1 (en) | Method and System for Determining Relation Between Search Terms in the Internet Search System | |
CN107729336A (en) | Data processing method, equipment and system | |
CN102004782A (en) | Search result sequencing method and search result sequencer | |
CN105653562A (en) | Calculation method and apparatus for correlation between text content and query request | |
CN103902597A (en) | Method and device for determining search relevant categories corresponding to target keywords | |
CN106202313B (en) | Search result towards academic Meta Search Engine synthesizes sort method | |
CN103942198B (en) | For excavating the method and apparatus being intended to | |
CN112035599A (en) | Query method and device based on vertical search, computer equipment and storage medium | |
CN106095738A (en) | Recommendation tables single slice | |
CN107085568A (en) | A kind of text similarity method of discrimination and device | |
CN104462347B (en) | The sorting technique and device of keyword | |
CN103226601B (en) | A kind of method and apparatus of picture searching | |
CN106021423B (en) | META Search Engine personalization results recommended method based on group division | |
JP2013054606A (en) | Document retrieval device, method and program | |
CN105975508B (en) | Personalized meta search engine search result synthesizes sort method | |
CN109918420B (en) | Competitor recommendation method and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||