Summary of the invention
To simplify search operations and improve the user experience, an embodiment of the invention provides a method for obtaining information, the method comprising:
Obtaining a keyword input by a user;
Obtaining, according to a preset keyword matching condition, a first information set whose content matches the keyword;
Judging whether the quantity of information in the first information set is greater than a preset information quantity and whether the first information set comprises at least two semantic classes, and if so, obtaining the preset information quantity of information, the obtained information covering at least two semantic classes;
Sending the information to the user.
Judging whether the quantity of information in the first information set is greater than the preset information quantity, and whether the first information set comprises at least two semantic classes, specifically comprises:
Obtaining the quantity of information in the first information set, and judging whether that quantity is greater than the preset information quantity;
Performing text clustering on the information in the first information set by semantic class;
Obtaining the number of semantic classes that the first information set comprises;
Judging whether the number of semantic classes is greater than or equal to two.
Obtaining the preset information quantity of information, the information covering at least two semantic classes, specifically comprises:
When the number of semantic classes that the first information set comprises is less than the preset information quantity, obtaining one piece of information from the information of each semantic class to form a first temporary information set;
Calculating the difference between the preset information quantity and the number of semantic classes;
Sorting the remaining information in the first information set from high to low by its matching degree with the keyword;
Obtaining the information whose position number after sorting is less than or equal to the difference to form a second temporary information set, and taking the union of the first temporary information set and the second temporary information set to obtain the preset information quantity of information;
When the number of semantic classes that the first information set comprises is greater than the preset information quantity, obtaining one piece of information from the information of each semantic class to form a fourth temporary information set;
Sorting the information in the fourth temporary information set from high to low by its matching degree with the keyword;
Obtaining the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, thereby obtaining the preset information quantity of information.
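The branch for more semantic classes than the preset quantity can be sketched as follows. This is a minimal illustration, not the claimed implementation. It assumes each piece of information is a `(text, semantic_class, match_score)` tuple and that the item taken from each class is its best-matching one; the text only requires one piece of information per class, so that choice and the names `pick_when_many_classes` and `preset_n` are assumptions.

```python
from collections import defaultdict

def pick_when_many_classes(infos, preset_n):
    """infos: list of (text, semantic_class, match_score) tuples."""
    by_class = defaultdict(list)
    for item in infos:
        by_class[item[1]].append(item)
    # One item per semantic class (assumed: the best match) forms the
    # "fourth temporary information set".
    fourth = [max(items, key=lambda it: it[2]) for items in by_class.values()]
    # Sort by matching degree, high to low, and keep positions 1..preset_n.
    fourth.sort(key=lambda it: it[2], reverse=True)
    return fourth[:preset_n]

infos = [("q1", "A", 0.9), ("q2", "A", 0.8), ("q3", "B", 0.7),
         ("q4", "C", 0.6), ("q5", "D", 0.5)]
result = pick_when_many_classes(infos, preset_n=3)
```

Here four classes compete for three slots, so the class with the weakest representative ("D") is dropped and every returned item belongs to a different class.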
Obtaining the preset information quantity of information, the information covering at least two semantic classes, specifically comprises:
Sorting the information in the first information set from low to high by its matching degree with the keyword;
When the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, where $m$ is the number of pieces of information in the first information set, obtaining the preset information quantity of information of at least two semantic classes according to $rq_x = sq_y$;
Wherein [formula not reproduced], $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information obtained from $SQ$ according to [formula not reproduced].
Sending the information to the user specifically comprises:
Sorting the information from high to low by its matching degree with the keyword;
Sending the sorted information to the user in order.
An embodiment of the invention provides a device for obtaining information, the device comprising:
A keyword acquisition module, configured to obtain a keyword input by a user;
A first information set acquisition module, configured to obtain, according to a preset keyword matching condition, a first information set whose content matches the keyword;
An information acquisition module, configured to judge whether the quantity of information in the first information set is greater than a preset information quantity and whether the first information set comprises at least two semantic classes, and if so, to obtain the preset information quantity of information, the obtained information covering at least two semantic classes;
An information sending module, configured to send the information to the user.
The information acquisition module specifically comprises:
An information quantity determining unit, configured to obtain the quantity of information in the first information set and judge whether that quantity is greater than the preset information quantity;
A text clustering unit, configured to perform text clustering on the information in the first information set by semantic class;
A semantic class number obtaining unit, configured to obtain the number of semantic classes that the first information set comprises;
A semantic class determining unit, configured to judge whether the number of semantic classes is greater than or equal to two;
An information acquisition unit, configured to obtain the preset information quantity of information, covering at least two semantic classes, when the quantity of information in the first information set is greater than the preset information quantity and the first information set comprises at least two semantic classes.
The information acquisition module specifically comprises:
A temporary information set generating unit, configured to obtain one piece of information from the information of each semantic class to form a first temporary information set when the number of semantic classes that the first information set comprises is less than the preset information quantity;
A quantity difference calculating unit, configured to calculate the difference between the preset information quantity and the number of semantic classes;
A preset information acquiring unit, configured to sort the remaining information in the first information set from high to low by its matching degree with the keyword, obtain the information whose position number after sorting is less than or equal to the difference to form a second temporary information set, and take the union of the first temporary information set and the second temporary information set to obtain the preset information quantity of information;
A first information acquiring unit, configured to, when the number of semantic classes that the first information set comprises is greater than the preset information quantity, obtain one piece of information from the information of each semantic class to form a fourth temporary information set, sort the information in the fourth temporary information set from high to low by its matching degree with the keyword, and obtain the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, thereby obtaining the preset information quantity of information.
The information acquisition module specifically comprises:
A second information acquisition unit, configured to sort the information in the first information set from low to high by its matching degree with the keyword and, when the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, where $m$ is the number of pieces of information in the first information set, to obtain the preset information quantity of information of at least two semantic classes according to $rq_x = sq_y$; wherein [formula not reproduced], $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information obtained from $SQ$ according to [formula not reproduced].
The information sending module specifically comprises:
A keyword sorting unit, configured to sort the information from high to low by its matching degree with the keyword;
An information sending unit, configured to send the sorted information to the user in order.
In the embodiments of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the embodiments of the invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
As shown in Figure 1, this embodiment of the invention provides a method for obtaining information, the method comprising:
S101: Obtain the keyword input by the user.
S102: According to the preset keyword matching condition, obtain the first information set whose content matches the keyword.
S103: When the first information set comprises at least two semantic classes and the quantity of information in the first information set is greater than the preset information quantity, obtain the preset information quantity of information covering at least two semantic classes, and send the information to the user.
It should be noted that each step of this embodiment may be executed by a search server or by any other entity capable of performing the steps.
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Embodiment 2
As shown in Figure 2, this embodiment of the invention provides a method for obtaining information, the method comprising:
S201: Obtain the keyword input by the user.
The keyword input by the user may be a question the user enters when asking, a query the user enters when searching, or an existing question the user is about to browse that reflects the user's information need.
For example, by obtaining the question input by the user, the user's question $q_i$ is acquired.
S202: According to the preset keyword matching condition, obtain the first information set whose content matches the keyword.
Optionally, existing information retrieval techniques may be used to retrieve, from a question database collected and/or recorded in the past by an existing question answering system, all questions semantically related to the user question $q_i$.
For example, retrieving on question $q_i$ in the database yields the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$.
S203: Obtain the quantity of information in the first information set and judge whether it is greater than the preset information quantity. If so, perform S204; if not, take the information in the candidate set as the information to be returned to the user, that is, perform S206.
For example, if the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ from S202 has $m = 20$ and the preset information quantity is 10, the quantity of information in the first information set is greater than the preset information quantity, so S204 is performed.
S204: Perform text clustering on the first information set.
Text clustering rests mainly on the cluster hypothesis: documents of the same class are more similar to one another, and documents of different classes are less similar.
Preferably, the results returned by the search engine are clustered so that the user can quickly locate the needed information. Specifically, the user enters a search keyword, the retrieved documents are clustered, and a brief description of each class is output. This narrows the scope of retrieval, so the user only needs to attend to the more promising topics. The method can also give the user hints for a follow-up search.
Optionally, the classes of algorithm for text clustering of the first information set include partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
Partitioning methods: given a data set of N tuples or records, a partitioning method constructs K groups, each representing one cluster, with K < N. The K groups satisfy two conditions: (1) each group contains at least one data record; (2) each data record belongs to exactly one group (this requirement can be relaxed in some fuzzy clustering algorithms). For a given K, the algorithm first produces an initial grouping and then changes the grouping iteratively so that each new grouping is better than the previous one. The criterion for "better" is that records in the same group should be as close together as possible and records in different groups as far apart as possible. Algorithms based on this idea include K-MEANS, K-MEDOIDS, and CLARANS.
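The iterative regrouping described above can be illustrated with a very small K-MEANS sketch. It works on plain numbers for brevity; applying it to text would first require turning each document into a numeric vector, which is assumed here and not shown. The function name `k_means_1d` and the starting centroids are illustrative choices.

```python
def k_means_1d(points, centroids, iterations=10):
    """Tiny K-MEANS on numbers; `centroids` is the initial grouping guess."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            # Each record joins exactly one group: the nearest centroid.
            nearest = min(range(len(centroids)),
                          key=lambda k: abs(p - centroids[k]))
            groups[nearest].append(p)
        # Improve the grouping: move each centroid to the mean of its group.
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:      # no change, so the grouping has settled
            break
        centroids = new
    return groups, centroids

groups, centers = k_means_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.1],
                             centroids=[0.0, 5.0])
```

With K = 2, the six records settle into one group near 1 and one group near 9.5 after two passes.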
Hierarchical methods: the given data set is decomposed hierarchically until some condition is met. These methods divide into "bottom-up" and "top-down" schemes. In the bottom-up scheme, for example, each data record initially forms its own group; in each subsequent iteration, neighbouring groups are merged, until all records form a single group or some condition is met. Representative algorithms include BIRCH, CURE, and CHAMELEON.
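A bottom-up scheme of this kind might look like the sketch below. It is a simplified single-linkage merge on one-dimensional numbers, not BIRCH, CURE, or CHAMELEON; the stopping rule (`max_gap`) and the function name are assumptions made for the example.

```python
def agglomerate(points, max_gap):
    """Bottom-up clustering of numbers with single linkage."""
    groups = [[p] for p in sorted(points)]   # every record starts alone
    while len(groups) > 1:
        # For sorted 1-D data, the gap between neighbouring groups is
        # (first element of the next group) - (last element of this one).
        gaps = [groups[i + 1][0] - groups[i][-1]
                for i in range(len(groups) - 1)]
        i = min(range(len(gaps)), key=gaps.__getitem__)
        if gaps[i] > max_gap:                # stopping condition reached
            break
        groups[i:i + 2] = [groups[i] + groups[i + 1]]   # merge closest pair
    return groups

clusters = agglomerate([1.0, 1.5, 2.0, 8.0, 8.4, 20.0], max_gap=1.0)
```

Merging stops once the closest pair of groups is more than `max_gap` apart, which leaves three groups in this example.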
Density-based methods: the fundamental difference from other methods is that they are based on density rather than on distance. This overcomes the limitation of distance-based algorithms, which can only find roughly circular clusters. The guiding idea is that as long as the density of points in a region exceeds some threshold, the region is added to the nearby cluster. Representative algorithms include DBSCAN, OPTICS, and DENCLUE.
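The density idea can be sketched with a minimal DBSCAN-style routine. This is a teaching sketch on one-dimensional numbers with a brute-force neighbour search; the parameter names `eps` and `min_pts` follow common DBSCAN usage and are not taken from the text.

```python
def dbscan_1d(points, eps, min_pts):
    """Minimal DBSCAN on numbers; returns one label per point, -1 = noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:          # region is not dense enough
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster       # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:        # dense point: keep expanding
                queue.extend(nb)
    return labels

labels = dbscan_1d([1.0, 1.1, 1.2, 5.0, 9.0, 9.1, 9.2], eps=0.25, min_pts=3)
```

The isolated value 5.0 has too few neighbours within `eps`, so it is labelled as noise rather than forced into a cluster.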
Grid-based methods: the data space is first divided into a grid structure with a finite number of cells, and all processing operates on individual cells. A notable advantage is speed: processing time is usually independent of the number of records in the target database and depends only on how many cells the data space is divided into. Representative algorithms include STING, CLIQUE, and WAVE-CLUSTER.
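As a rough illustration of working on cells rather than records, the sketch below bins two-dimensional points into square cells and keeps the dense ones. It is not STING, CLIQUE, or WAVE-CLUSTER; `cell_size` and `min_count` are hypothetical parameters.

```python
from collections import Counter

def grid_cells(points, cell_size, min_count):
    """Map 2-D points to grid cells and keep the dense cells."""
    counts = Counter((int(x // cell_size), int(y // cell_size))
                     for x, y in points)
    # Any later processing touches only cells, so its cost depends on the
    # number of cells rather than on the number of records.
    return {cell: n for cell, n in counts.items() if n >= min_count}

points = [(0.1, 0.2), (0.4, 0.3), (0.2, 0.9),
          (5.1, 5.2), (5.3, 5.8), (9.9, 0.1)]
dense = grid_cells(points, cell_size=1.0, min_count=2)
```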
Model-based methods: a model is assumed for each cluster, and the method then looks for the data that best fit that model. Such a model may be, for example, the density distribution function of the data points in space. The underlying assumption is that the target data set is generated by a series of probability distributions. There are usually two directions: statistical approaches and neural network approaches.
In this step, the data in the first information set may also be clustered by other algorithms; this embodiment does not limit the choice.
S205: Obtain the number of semantic classes that the first information set comprises.
For example, clustering the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ with $m = 20$ by semantic class yields 3 semantic classes.
S206: Judge whether the number of semantic classes is greater than or equal to two. If so, obtain the preset information quantity of information, the information covering at least two semantic classes.
For example, as in the example in S205, the first information set has 3 semantic classes, which is more than two, so the preset information quantity of information is obtained, covering at least two semantic classes.
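The checks in S203 to S206 can be condensed into a short sketch. The clustering step is passed in as a function because the embodiment leaves the algorithm open; `toy_cluster`, which labels a text by its first word, is only a stand-in for illustration, and all names here are assumptions.

```python
def should_diversify(infos, preset_n, cluster):
    """Return (decision, labels); `cluster` maps texts to class labels."""
    if len(infos) <= preset_n:          # S203: not more than the preset quantity
        return False, []
    labels = cluster(infos)             # S204: text clustering by semantic class
    num_classes = len(set(labels))      # S205: number of semantic classes
    return num_classes >= 2, labels     # S206: at least two classes?

# Stand-in clusterer for illustration only: class = first word of the text.
def toy_cluster(texts):
    return [t.split()[0] for t in texts]

texts = [f"{topic} question {i}"
         for i in range(7)
         for topic in ("price", "repair", "review")][:20]
decision, labels = should_diversify(texts, preset_n=10, cluster=toy_cluster)
```

With 20 candidates, a preset quantity of 10, and 3 classes found, the decision is to select a diversified subset, matching the running example.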
S207: Sort the information from high to low by its matching degree with the keyword.
S208: Send the sorted information to the user in order.
It should be noted that each step of this embodiment may be executed by a search server or by any other entity capable of performing the steps.
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Embodiment 3
As shown in Figure 3, this embodiment of the invention provides a method for obtaining information comprising steps S301 to S310. Steps S301 to S305 are the same as S201 to S205 in Embodiment 2 and are not repeated here. Unlike Embodiment 2, this embodiment further comprises the following steps:
S306: Judge whether the number of semantic classes that the first information set comprises is less than the preset information quantity. If so, obtain one piece of information from the information of each semantic class to form the first temporary information set.
For example, suppose the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ with $m = 20$ comprises 3 semantic classes and the preset information quantity is 10. One piece of information is obtained from each semantic class, giving 3 pieces of information of different semantic classes, which form the first temporary information set $LQ1 = \{lq1_0, lq1_1, lq1_2\}$.
S307: Calculate the difference between the preset information quantity and the number of semantic classes.
For example, after 3 pieces of information are obtained in S306, the difference is the preset information quantity 10 minus 3, that is, 7.
S308: Sort the remaining information in the first information set from high to low by its matching degree with the keyword.
For example, of the 20 pieces of information in the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ ($m = 20$), 17 remain once the 3 already obtained are excluded. These 17 pieces of information are sorted from high to low by their matching degree with the keyword.
S309: Obtain the information whose position number after sorting is less than or equal to the difference to form the second temporary information set, and take the union of the first temporary information set and the second temporary information set.
For example, the remaining information is numbered 1 to 17 after sorting, so the information whose position number is less than or equal to the difference 7, that is, numbers 1 to 7, is obtained. This gives the second temporary information set $LQ2 = \{lq2_0, lq2_1, lq2_2, lq2_3, lq2_4, lq2_5, lq2_6\}$, and the union of the first and second temporary information sets is taken.
S310: Send the merged information to the user.
For example, the union of the first and second temporary information sets gives the information $lq1_0, lq1_1, lq1_2, lq2_0, lq2_1, lq2_2, lq2_3, lq2_4, lq2_5, lq2_6$, which is sent to the user.
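Steps S306 to S310 can be traced with the numbers of the running example (20 candidates, 3 semantic classes, preset quantity 10). The sketch below assumes each candidate is a `(text, semantic_class, match_score)` tuple and picks the best-matching item from each class; the embodiment only asks for one piece of information per class, so that choice, the class sizes, and the scores are assumptions.

```python
def select_diverse(infos, preset_n):
    """infos: (text, semantic_class, match_score); fewer classes than preset_n."""
    first, seen = [], set()
    # S306: first temporary set, one item per class (assumed: best match).
    for item in sorted(infos, key=lambda it: it[2], reverse=True):
        if item[1] not in seen:
            seen.add(item[1])
            first.append(item)
    diff = preset_n - len(first)                     # S307: difference
    rest = [it for it in infos if it not in first]
    rest.sort(key=lambda it: it[2], reverse=True)    # S308: sort the remainder
    second = rest[:diff]                             # S309: positions 1..diff
    return first + second                            # union of both sets

# 20 candidates: 12 in class A, 5 in class B, 3 in class C; scores decrease.
infos = [(f"sq{i}", "A" if i < 12 else "B" if i < 17 else "C", 1.0 - i * 0.01)
         for i in range(20)]
chosen = select_diverse(infos, preset_n=10)
```

A plain top-10 cut would return only class A here; the per-class picks guarantee that classes B and C are also represented among the 10 results.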
Preferably, this information may also be sorted from high to low by its matching degree with the keyword, and the sorted information is then sent to the user in order.
It should be noted that this embodiment is only one way of obtaining information of different semantic classes. Such information can be obtained in many ways, and any method adopted for the purpose of making the obtained information belong to different semantic classes falls within the protection scope of this embodiment; details are not repeated here.
It should be noted that each step of this embodiment may be executed by a search server or by any other entity capable of performing the steps.
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Embodiment 4
As shown in Figure 4, this embodiment of the invention provides a method for obtaining information, the method comprising:
S401: Obtain the keyword input by the user.
The keyword input by the user may be a question the user enters when asking, a query the user enters when searching, or an existing question the user is about to browse that reflects the user's information need.
For example, by obtaining the question input by the user, the user's question $q_i$ is acquired.
S402: According to the preset keyword matching condition, obtain the first information set whose content matches the keyword.
Optionally, existing information retrieval techniques may be used to retrieve, from a question database collected and/or recorded in the past by an existing question answering system, all questions semantically related to the user question $q_i$.
S403: Obtain the quantity of information in the first information set and judge whether it is greater than the preset information quantity. If it is greater, perform S404; if it is less, perform S405.
In this embodiment, optionally, when the quantity of information is greater than the preset information quantity, the information in the first information set may be sorted from low to high by its matching degree with the keyword before S404 is performed.
For example, after all questions semantically related to the user question $q_i$ are retrieved, using existing information retrieval techniques, from the question database collected and/or recorded in the past by an existing question answering system, they are sorted by their similarity to $q_i$ to obtain the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$.
S404: When the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, where $m$ is the number of pieces of information in the first information set, obtain the preset information quantity of information of at least two semantic classes according to $rq_x = sq_y$.
Wherein [formula not reproduced], $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information obtained from $SQ$ according to [formula not reproduced].
Specifically, N progressively divergent, semantically related questions are taken from $SQ_i$. Let $N^a = m$, that is, $a = \log_N m$, and take the function [formula not reproduced]; then $sq_y$ is the $x$-th related question $rq_x$, which gives the obtained information set $RQ_i = \{rq_1, rq_2, \ldots\}$. The mapping from $x$ to $y$ is nonlinear and progressively divergent. This ensures that the queries in the sequence $SQ_i$ most related to $q_i$ are output preferentially, and also that questions further back in $SQ_i$, which are semantically related but more divergent, can appear among the output related questions.
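The exact function for $y$ is not reproduced in the text, so the sketch below assumes a power-law mapping $y = x^a$ rounded to the nearest position, which is one function consistent with $N^a = m$. It also assumes 1-based positions and a candidate list ranked with the most related question first. The names `divergent_positions` and `pick_divergent` are hypothetical.

```python
import math

def divergent_positions(m, n):
    """1-based positions y in SQ for x = 1..n, assuming y = round(x ** a)."""
    a = math.log(m, n)                 # a = log_N m, so that N ** a == m
    return [min(m, round(x ** a)) for x in range(1, n + 1)]

def pick_divergent(sq, n):
    """sq is assumed to be ranked with the most related item first."""
    return [sq[y - 1] for y in divergent_positions(len(sq), n)]

positions = divergent_positions(m=20, n=10)
rq = pick_divergent([f"sq{i}" for i in range(1, 21)], n=10)
```

With m = 20 and N = 10 the selected positions start out dense (1, 2, 4) and spread towards the tail of the list (17, 20), which is the progressively divergent behaviour the paragraph describes.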
Optionally, after $SQ_i$ is sorted, a mapping function $y = f(x)$ with $f(N) \le m$ may be constructed such that $rq_x = sq_y$, which yields the key questions $RQ_i = \{rq_1, rq_2, \ldots, rq_N\}$. Any suitable mapping function $f(x)$ may be used for this purpose, such as a power function or an exponential function.
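A generic version that accepts any mapping function might look like this sketch. The check $f(N) \le m$ comes from the text; the additional requirement that $f$ be strictly increasing and start at position 1 or later is an assumption added so that no question is selected twice. The exponential mapping $f(x) = 2^{x-1}$ is only an example.

```python
def pick_with_mapping(sq, n, f):
    """Return rq_1..rq_n with rq_x = sq_{f(x)}, using 1-based positions."""
    positions = [f(x) for x in range(1, n + 1)]
    if positions[-1] > len(sq):
        raise ValueError("mapping must satisfy f(N) <= m")
    # Assumed extra constraint: positions start at 1 or later and increase.
    if positions[0] < 1 or any(b <= a for a, b in zip(positions, positions[1:])):
        raise ValueError("mapping must be increasing and start at 1 or later")
    return [sq[y - 1] for y in positions]

sq = [f"sq{i}" for i in range(1, 21)]                 # m = 20
rq = pick_with_mapping(sq, n=5, f=lambda x: 2 ** (x - 1))
```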
S405: Send the information to the user.
Optionally, the related questions $RQ_i = \{rq_1, rq_2, \ldots\}$ of question $q_i$ are output, and each related question is shown to the user in turn on the question browsing page.
It should be noted that each step of this embodiment may be executed by a search server or by any other entity capable of performing the steps.
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Embodiment 5
As shown in Figure 5, this embodiment of the invention provides a device for obtaining information, the device comprising a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503, and an information sending module 504, wherein:
The keyword acquisition module 501 is configured to obtain the keyword input by the user;
The first information set acquisition module 502 is configured to obtain, according to the preset keyword matching condition, the first information set whose content matches the keyword;
The information acquisition module 503 is configured to judge whether the quantity of information in the first information set is greater than the preset information quantity and whether the first information set comprises at least two semantic classes, and if so, to obtain the preset information quantity of information, the obtained information covering at least two semantic classes;
The information sending module 504 is configured to send the information to the user.
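One way to picture how the four modules cooperate is the sketch below, in which each module is a plain callable wired into a small pipeline. The class name `InfoObtainingDevice`, the field names, and the lambda stand-ins are illustrative assumptions; the embodiment does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class InfoObtainingDevice:
    get_keyword: Callable[[], str]                 # module 501
    get_first_set: Callable[[str], List[str]]      # module 502
    select_info: Callable[[List[str]], List[str]]  # module 503
    send: Callable[[List[str]], None]              # module 504

    def run(self) -> None:
        keyword = self.get_keyword()
        first_set = self.get_first_set(keyword)
        self.send(self.select_info(first_set))

sent: List[str] = []
device = InfoObtainingDevice(
    get_keyword=lambda: "laptop",
    get_first_set=lambda kw: [f"{kw} price", f"{kw} repair", f"{kw} review"],
    select_info=lambda infos: infos[:2],           # placeholder selection
    send=sent.extend,
)
device.run()
```

Keeping module 503 as a replaceable function mirrors the later embodiments, which swap in different selection units without changing the other modules.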
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Embodiment 6
As shown in Figure 6, this embodiment of the invention provides a device for obtaining information. As in Embodiment 5, the device comprises a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503, and an information sending module 504.
Further, the information acquisition module 503 specifically comprises:
An information quantity determining unit 5031, configured to obtain the quantity of information in the first information set and judge whether that quantity is greater than the preset information quantity;
A text clustering unit 5032, configured to perform text clustering on the information in the first information set by semantic class;
A semantic class number obtaining unit 5033, configured to obtain the number of semantic classes that the first information set comprises;
A semantic class determining unit 5034, configured to judge whether the number of semantic classes is greater than or equal to two, and if so, to determine that the first information set comprises at least two semantic classes;
An information acquisition unit 5035, configured to obtain the preset information quantity of information, covering at least two semantic classes, when the quantity of information in the first information set is greater than the preset information quantity and the first information set comprises at least two semantic classes.
The information sending module 504 specifically comprises:
A keyword sorting unit 5041, configured to sort the information from high to low by its matching degree with the keyword;
An information sending unit 5042, configured to send the sorted information to the user in order.
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Embodiment 7
As shown in Figure 7, this embodiment of the invention provides a device for obtaining information. As in Embodiment 6, the device comprises a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503, and an information sending module 504, and the information sending module 504 comprises a keyword sorting unit 5041 and an information sending unit 5042. Unlike Embodiment 6, in this embodiment the information acquisition module 503 specifically comprises:
A temporary information set generating unit 5036, configured to obtain one piece of information from the information of each semantic class to form a first temporary information set when the number of semantic classes that the first information set comprises is less than the preset information quantity;
A quantity difference calculating unit 5037, configured to calculate the difference between the preset information quantity and the number of semantic classes;
A preset information acquiring unit 5038, configured to sort the remaining information in the first information set from high to low by its matching degree with the keyword, obtain the information whose position number after sorting is less than or equal to the difference to form a second temporary information set, and take the union of the first temporary information set and the second temporary information set to obtain the preset information quantity of information;
A first information acquiring unit 5039, configured to, when the number of semantic classes that the first information set comprises is greater than the preset information quantity, obtain one piece of information from the information of each semantic class to form a fourth temporary information set, sort the information in the fourth temporary information set from high to low by its matching degree with the keyword, and obtain the information in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, thereby obtaining the preset information quantity of information.
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
Embodiment 8
As shown in Figure 8, this embodiment of the invention provides a device for obtaining information. As in Embodiment 6, the device comprises a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503, and an information sending module 504, and the information sending module 504 comprises a keyword sorting unit 5041 and an information sending unit 5042. Unlike Embodiment 6, in this embodiment the information acquisition module 503 specifically comprises:
A second information acquisition unit 50310, configured to sort the information in the first information set from low to high by its matching degree with the keyword and, when the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, where $m$ is the number of pieces of information in the first information set, to obtain the preset information quantity of information of at least two semantic classes according to $rq_x = sq_y$; wherein [formula not reproduced], $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information obtained from $SQ$ according to [formula not reproduced].
In this embodiment of the invention, information of at least two semantic classes is obtained from the information that matches the keyword input by the user, so the user is provided with information of types related to the keyword. The user can thus obtain related information without entering further keywords related to this one, which reduces user operations and improves the user experience.
All or part of the technical solutions provided by the above embodiments may be implemented in software, with the software program stored in a readable storage medium such as a computer hard disk, an optical disc, or a floppy disk.
The above are merely preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its protection scope.