CN102750277A - Method and device for obtaining information - Google Patents

Publication number
CN102750277A
Authority
CN
China
Prior art keywords
information
presupposed
collection
key word
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011100964639A
Other languages
Chinese (zh)
Other versions
CN102750277B (en)
Inventor
焦峰
李亚楠
杨月奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shiji Guangsu Information Technology Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201110096463.9A (patent CN102750277B)
Priority claimed from CN201110096463.9A (patent CN102750277B)
Publication of CN102750277A
Application granted
Publication of CN102750277B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a method for obtaining information, comprising: obtaining a keyword input by a user; obtaining, according to a preset keyword matching condition, a first information set matching the content of the keyword; judging whether the number of information items in the first information set is greater than a preset information quantity and whether the first information set comprises at least two semantic categories, and if so, obtaining the preset quantity of information items, the items covering at least two semantic categories; and sending the information to the user. According to the embodiments of the invention, information of at least two semantic categories is obtained from the information matching the keyword input by the user, so that the user is provided with several types of keyword-related information and can obtain related information without entering further related keywords, which reduces user operations and improves the user experience.

Description

Method and device for obtaining information
Technical field
The present invention relates to the field of communication technology, and in particular to a method and device for obtaining information.
Background
A question answering system is a common tool with which Internet users obtain information; examples include Baidu Zhidao and Soso Wenwen. To satisfy users' information browsing needs, a question answering system can retrieve and push other questions or answers related to the question currently being browsed, referred to here as "related questions". Related questions can further satisfy the user's browsing needs. However, because display space is limited, only about 5 related questions can usually be shown for a question, and often not all related questions can be displayed. A method is therefore needed to select the few most representative related questions.
Existing related-question retrieval systems select the few questions semantically closest to the question currently being browsed and show them to the user in order. The technique is implemented as follows: first, the question Q that the user clicks or enters is obtained; then, using information retrieval or natural language processing techniques, the question set R(Q) related to Q is retrieved from a database of previously collected or recorded questions; next, the related questions in R(Q) are sorted by their semantic relevance to Q; finally, the N highest-ranked related questions in R(Q) are selected for display, where N is the maximum number of related questions that can be shown on the page.
The prior art provides the user with information related in content to the question the user asked. However, the related-question results it provides are all semantically identical or very close. When the user wants to browse loosely through other aspects of knowledge related to a certain type of question, the existing scheme cannot meet this need: the user has to re-enter information on the other aspects of that type of question and search again, which degrades the user experience.
For example, a user who wants to decorate his or her home browses or enters the question "What have been the most popular decoration styles in recent years?" to obtain related content. Since generally only about 5 related questions can be shown, the prior art gives the user 5 items about "decoration style". The user, however, may also want related questions and answers on other kinds of knowledge, such as decoration materials, decoration prices, and the reputation of nearby decoration contractors, and must then re-enter keywords for the information needed, which increases user operations.
Summary of the invention
To simplify search operations and improve the user experience, an embodiment of the present invention provides a method for obtaining information, the method comprising:
obtaining a keyword input by a user;
obtaining, according to a preset keyword matching condition, a first information set matching the content of the keyword;
judging whether the number of information items in the first information set is greater than a preset information quantity and whether the first information set comprises at least two semantic categories, and if so, obtaining the preset information quantity of information items, the information items covering at least two semantic categories;
sending the information to the user.
The judging whether the number of information items in the first information set is greater than the preset information quantity and whether the first information set comprises at least two semantic categories specifically comprises:
obtaining the number of information items in the first information set, and judging whether that number is greater than the preset information quantity;
performing text clustering on the information items in the first information set by semantic category;
obtaining the number of semantic categories that the first information set comprises;
judging whether the number of semantic categories is greater than or equal to two.
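The two-part judgment above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `should_diversify` and the representation of the clustering result as one category label per item are assumptions introduced here.

```python
from collections import Counter

def should_diversify(categories, preset_count):
    """Return True when both conditions of the judging step hold.

    categories: one semantic-category label per item in the first
        information set (assumed to come from a prior clustering step).
    preset_count: the preset information quantity N.
    """
    item_count = len(categories)               # number of information items
    category_count = len(Counter(categories))  # number of distinct categories
    return item_count > preset_count and category_count >= 2
```

With 20 candidates spread over three categories and a preset quantity of 10, both conditions hold; with 20 candidates that all fall into one category, the second condition fails.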
The obtaining the preset information quantity of information items, the information items covering at least two semantic categories, specifically comprises:
when the number of semantic categories in the first information set is less than the preset information quantity, obtaining one information item from the items of each semantic category to obtain a first temporary information set;
calculating the difference between the preset information quantity and the number of semantic categories;
sorting the remaining information items in the first information set from high to low by their matching degree with the keyword;
obtaining the information items whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and taking the union of the first temporary information set and the second temporary information set to obtain the preset information quantity of information items;
when the number of semantic categories in the first information set is greater than the preset information quantity, obtaining one information item from the items of each semantic category to obtain a fourth temporary information set;
sorting the information items in the fourth temporary information set from high to low by their matching degree with the keyword;
obtaining the information items in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, to obtain the preset information quantity of information items.
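The two branches above can be sketched in a few lines. This is a hedged illustration under stated assumptions: the tuple layout `(item_id, category, match_score)` and the function name `select_diverse` are hypothetical, and because the text only says that "one information item" is taken from each category, the sketch assumes the best-matching item of each category is the one taken.

```python
def select_diverse(items, preset_count):
    """items: list of (item_id, category, match_score) tuples."""
    # One representative per semantic category. Assumption: the
    # highest-scoring item of the category is chosen.
    best = {}
    for item in items:
        _, category, score = item
        if category not in best or score > best[category][2]:
            best[category] = item
    representatives = sorted(best.values(), key=lambda it: it[2], reverse=True)

    if len(representatives) >= preset_count:
        # At least as many categories as slots: keep the best-matching
        # representatives only.
        return representatives[:preset_count]

    # Fewer categories than slots: fill the difference with the
    # best-matching remaining items.
    chosen_ids = {it[0] for it in representatives}
    remaining = sorted(
        (it for it in items if it[0] not in chosen_ids),
        key=lambda it: it[2],
        reverse=True,
    )
    difference = preset_count - len(representatives)
    return representatives + remaining[:difference]
```

The case where the number of categories equals the preset quantity is not spelled out in the text; the sketch simply returns one item per category in that case.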
The obtaining the preset information quantity of information items, the information items covering at least two semantic categories, specifically comprises:
sorting the information items in the first information set from low to high by their matching degree with the keyword;
when the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, where $m$ is the number of information items in the first information set,
obtaining the preset information quantity of information items of at least two semantic categories according to $rq_x = sq_y$;
wherein the mapping from $x$ to $y$ is given by the formula in figure BDA0000055844560000031 (formula image not reproduced in this text), $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information item obtained from $SQ$ according to the formula in figure BDA0000055844560000032 (formula image not reproduced in this text).
The sending the information to the user specifically comprises:
sorting the information items from high to low by their matching degree with the keyword;
sending the sorted information items to the user in order.
An embodiment of the present invention provides a device for obtaining information, the device comprising:
a keyword acquisition module, configured to obtain a keyword input by a user;
a first information set acquisition module, configured to obtain, according to a preset keyword matching condition, a first information set matching the content of the keyword;
an information acquisition module, configured to judge whether the number of information items in the first information set is greater than a preset information quantity and whether the first information set comprises at least two semantic categories, and if so, to obtain the preset information quantity of information items, the information items covering at least two semantic categories;
an information sending module, configured to send the information to the user.
The information acquisition module specifically comprises:
an information quantity determining unit, configured to obtain the number of information items in the first information set and judge whether that number is greater than the preset information quantity;
a text clustering unit, configured to perform text clustering on the information items in the first information set by semantic category;
a semantic category number obtaining unit, configured to obtain the number of semantic categories that the first information set comprises;
a semantic category determining unit, configured to judge whether the number of semantic categories is greater than or equal to two;
an information obtaining unit, configured to obtain the preset information quantity of information items, covering at least two semantic categories, when the number of information items in the first information set is greater than the preset information quantity and the first information set comprises at least two semantic categories.
The information acquisition module specifically comprises:
a temporary information set generating unit, configured to obtain one information item from the items of each semantic category, yielding a first temporary information set, when the number of semantic categories in the first information set is less than the preset information quantity;
a difference calculating unit, configured to calculate the difference between the preset information quantity and the number of semantic categories;
a preset information obtaining unit, configured to sort the remaining information items in the first information set from high to low by their matching degree with the keyword, obtain the information items whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and take the union of the first temporary information set and the second temporary information set to obtain the preset information quantity of information items;
a first information obtaining unit, configured to, when the number of semantic categories in the first information set is greater than the preset information quantity, obtain one information item from the items of each semantic category to obtain a fourth temporary information set, sort the information items in the fourth temporary information set from high to low by their matching degree with the keyword, and obtain the information items in the sorted fourth temporary information set whose position number is less than or equal to the preset information quantity, to obtain the preset information quantity of information items.
The information acquisition module specifically comprises:
a second information obtaining unit, configured to sort the information items in the first information set from low to high by their matching degree with the keyword and, when the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, where $m$ is the number of information items in the first information set, obtain the preset information quantity of information items of at least two semantic categories according to $rq_x = sq_y$; wherein the mapping from $x$ to $y$ is given by the formula in figure BDA0000055844560000041 (formula image not reproduced in this text), $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information item obtained from $SQ$ according to the formula in figure BDA0000055844560000042 (formula image not reproduced in this text).
The information sending module specifically comprises:
a keyword sorting unit, configured to sort the information items from high to low by their matching degree with the keyword;
an information transmitting unit, configured to send the sorted information items to the user in order.
In the embodiments of the invention, information items of at least two semantic categories are obtained from the information matching the keyword input by the user, so that the user is provided with information of several types related to the keyword. The user can thus obtain related information without entering further keywords related to that keyword, which reduces user operations and improves the user experience.
Brief description of the drawings
Fig. 1 is a flowchart of the method for obtaining information provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the method for obtaining information provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the method for obtaining information provided by Embodiment 3 of the present invention;
Fig. 4 is a flowchart of the method for obtaining information provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic diagram of the device for obtaining information provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic diagram of the device for obtaining information provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic diagram of the device for obtaining information provided by Embodiment 7 of the present invention;
Fig. 8 is a schematic diagram of the device for obtaining information provided by Embodiment 8 of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Embodiment 1
As shown in Fig. 1, an embodiment of the present invention provides a method for obtaining information, the method comprising:
S101: obtaining a keyword input by a user;
S102: obtaining, according to a preset keyword matching condition, a first information set matching the content of the keyword;
S103: when the first information set comprises at least two semantic categories and the number of information items in the first information set is greater than a preset information quantity, obtaining the preset information quantity of information items of at least two semantic categories, and sending the information to the user.
It should be noted that each step of this embodiment may be executed by a search server, or by any other entity capable of performing the steps.
In this embodiment of the invention, information items of at least two semantic categories are obtained from the information matching the keyword input by the user, so that the user is provided with information of several types related to the keyword. The user can thus obtain related information without entering further keywords related to that keyword, which reduces user operations and improves the user experience.
Embodiment 2
As shown in Fig. 2, an embodiment of the present invention provides a method for obtaining information, the method comprising:
S201: obtaining a keyword input by a user;
The keyword input by the user may be a question entered when the user asks a question, a query entered when the user searches, or an existing question that the user is about to browse and that reflects the user's information need.
For example, by obtaining the question input by the user, the user question $q_i$ is obtained.
S202: obtaining, according to a preset keyword matching condition, a first information set matching the content of the keyword;
Optionally, existing information retrieval techniques may be used to retrieve, from the question database previously collected and/or recorded by an existing question answering system, all questions semantically related to the user question $q_i$.
For example, retrieving on the question $q_i$ in the database yields the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$.
S203: obtaining the number of information items in the first information set and judging whether that number is greater than the preset information quantity; if so, carrying out S204; if not, taking the information in the candidate set as the information to be returned to the user, that is, carrying out S206;
Optionally, for the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ of S202 with $m = 20$ and a preset information quantity of 10, the number of information items in the first information set is greater than the preset information quantity, so S204 is carried out.
S204: performing text clustering on the first information set;
Text clustering relies mainly on the cluster hypothesis: documents of the same category are more similar to one another, and documents of different categories are less similar.
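The cluster hypothesis presupposes some measure of document similarity. The text does not name one; the sketch below assumes a bag-of-words cosine similarity over whitespace-separated tokens, which is a common choice for English text but is only one possibility (Chinese questions would first need word segmentation).

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words vectors of two texts."""
    va = Counter(text_a.lower().split())
    vb = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
        math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

Under this measure two questions about decoration style score higher against each other than either does against a question about decoration price, which is the behaviour the cluster hypothesis relies on.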
Preferably, the results returned by a search engine are clustered so that the user can quickly locate the required information. Specifically, the user enters a search keyword, the retrieved documents are clustered, and a brief description of each category is output. This narrows the scope of retrieval, so the user only needs to focus on the more promising topics. The approach can also provide clues for the user's follow-up searches.
Optionally, the families of algorithms that can be used to perform text clustering on the first information set include: partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods.
Partitioning methods: given a data set of N tuples or records, a partitioning method constructs K groups, each representing one cluster, with K < N. The K groups satisfy the following conditions: (1) each group contains at least one data record; (2) each data record belongs to exactly one group (note: this requirement can be relaxed in some fuzzy clustering algorithms). For a given K, the algorithm first produces an initial grouping and then changes the grouping iteratively, so that each improved grouping is better than the previous one. The criterion for "better" is that records in the same group should be as close together as possible, and records in different groups as far apart as possible. Algorithms based on this idea include K-MEANS, K-MEDOIDS, and CLARANS.
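As an illustration of the partitioning idea, the following is a minimal K-MEANS sketch in plain Python. It assumes the records have already been turned into numeric feature vectors (tuples of floats) and that the caller supplies the initial centroids; both are assumptions made here for a deterministic example, and the patent does not prescribe either.

```python
def k_means(points, centroids, iterations=10):
    """Minimal k-means on tuples of floats; centroids are initial guesses."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    labels = []
    for _ in range(iterations):
        # Assignment step: each record joins the group of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: dist2(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its group.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty group keeps its centroid
        if new_centroids == centroids:
            break  # grouping no longer changes
        centroids = new_centroids
    return labels, centroids
```

Each pass regroups the records and then recomputes the group centres, which is the "initial grouping followed by iterative improvement" described above.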
Hierarchical methods: the given data set is decomposed level by level until some condition is satisfied. These methods divide into "bottom-up" and "top-down" schemes. In the "bottom-up" scheme, for example, each data record initially forms its own group; in the following iterations, groups that are close to one another are merged into one group, until all records form a single group or some condition is satisfied. Representative algorithms include BIRCH, CURE, and CHAMELEON.
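The "bottom-up" scheme can be sketched as follows. This is a simplified single-linkage example on one-dimensional values, not BIRCH, CURE, or CHAMELEON; the stopping condition (a maximum merge distance, `max_distance`) is an assumption chosen for illustration.

```python
def agglomerative(points, max_distance):
    """Bottom-up single-linkage clustering of 1-D values."""
    clusters = [[p] for p in points]  # every record starts as its own group
    while len(clusters) > 1:
        # Find the closest pair of groups (distance between nearest members).
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > max_distance:  # stopping condition: nothing close enough remains
            break
        clusters[i].extend(clusters.pop(j))  # merge group j into group i
    return clusters
```

Lowering `max_distance` stops merging earlier and leaves more, smaller groups; a very large value lets the loop run until all records form one group.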
Density-based methods: the fundamental difference between density-based methods and other methods is that they are based on density rather than on distance. This overcomes the shortcoming of distance-based algorithms, which can only find roughly spherical clusters. The guiding idea is that as long as the density of points in a region exceeds some threshold, the region is added to the cluster nearest to it. Representative algorithms include DBSCAN, OPTICS, and DENCLUE.
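A compact sketch of the density idea, in the style of DBSCAN, is given below for one-dimensional values. The parameter names `eps` (neighbourhood radius) and `min_points` (density threshold) follow common DBSCAN usage and are assumptions here; the patent gives no parameters.

```python
def dbscan_1d(values, eps, min_points):
    """DBSCAN-style clustering of 1-D values; returns one label per value.

    Label -1 marks noise; clusters are numbered 0, 1, 2, ...
    """
    labels = {}      # index -> cluster id, or -1 for noise
    cluster_id = 0

    def neighbors(i):
        return [j for j, v in enumerate(values) if abs(v - values[i]) <= eps]

    for i in range(len(values)):
        if i in labels:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_points:
            labels[i] = -1           # not dense enough: provisionally noise
            continue
        labels[i] = cluster_id       # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels.get(j, -1) != -1:
                continue             # already assigned to a cluster
            labels[j] = cluster_id   # unvisited or noise point joins the cluster
            nj = neighbors(j)
            if len(nj) >= min_points:
                queue.extend(nj)     # j is dense too: keep expanding
        cluster_id += 1
    return [labels[i] for i in range(len(values))]
```

A point joins a cluster when it lies within `eps` of a sufficiently dense point, so clusters grow along dense regions regardless of their shape, and isolated points stay labelled as noise.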
Grid-based methods: the data space is first divided into a grid structure of a finite number of cells, and all processing takes a single cell as its object. A prominent advantage is that processing is fast: the speed is usually independent of the number of records in the target database and depends only on how many cells the data space is divided into. Representative algorithms include STING, CLIQUE, and WAVE-CLUSTER.
Model-based methods: a model is assumed for each cluster, and the data that best fit that model are then sought. Such a model may be the density distribution function of the data points in space, or something else. The underlying assumption is that the target data set is determined by a series of probability distributions. There are usually two directions: statistical schemes and neural network schemes.
Other algorithms may also be used in this step to cluster the data in the first information set; this embodiment does not limit this.
S205: obtaining the number of semantic categories that the first information set comprises;
For example, the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ with $m = 20$ is clustered by semantic category, yielding 3 semantic categories.
S206: judging whether the number of semantic categories is greater than or equal to two; if so, obtaining the preset information quantity of information items, the information items covering at least two semantic categories.
For example, as in the example of S205, the first information set has 3 semantic categories, which is more than two, so the preset information quantity of information items covering at least two semantic categories is obtained.
S207: sorting the information items from high to low by their matching degree with the keyword;
S208: sending the sorted information items to the user in order.
It should be noted that each step of this embodiment may be executed by a search server, or by any other entity capable of performing the steps.
In this embodiment of the invention, information items of at least two semantic categories are obtained from the information matching the keyword input by the user, so that the user is provided with information of several types related to the keyword. The user can thus obtain related information without entering further keywords related to that keyword, which reduces user operations and improves the user experience.
Embodiment 3
As shown in Fig. 3, an embodiment of the present invention provides a method for obtaining information comprising steps S301 to S310, where S301 to S305 are identical to S201 to S205 in Embodiment 2 and are not repeated here. Unlike Embodiment 2, this embodiment further comprises the following steps:
S306: judging whether the number of semantic categories in the first information set is less than the preset information quantity; if it is, obtaining one information item from the items of each semantic category to obtain a first temporary information set;
For example, the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ with $m = 20$ comprises 3 semantic categories, and the preset information quantity is 10. One information item is obtained from each semantic category, giving 3 items of different semantic categories, which form the first temporary information set $LQ1 = \{lq1_0, lq1_1, lq1_2\}$.
S307: calculating the difference between the preset information quantity and the number of semantic categories;
For example, after the 3 information items are obtained in S306, the difference between the preset information quantity and the number of semantic categories is calculated: 10 minus 3 gives a difference of 7.
S308: sorting the remaining information items in the first information set from high to low by their matching degree with the keyword;
For example, of the 20 items in the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, 17 remain once the 3 items already obtained are excluded; these 17 items are sorted from high to low by their matching degree with the keyword.
S309: obtaining the information items whose position number after sorting is less than or equal to the difference to obtain a second temporary information set, and taking the union of the first temporary information set and the second temporary information set;
For example, the remaining items after sorting are numbered 1 to 17. The items whose position number is less than or equal to the difference of 7, that is, items 1 to 7, are obtained, giving the second temporary information set $LQ2 = \{lq2_0, lq2_1, lq2_2, lq2_3, lq2_4, lq2_5, lq2_6\}$, and the union of the first and second temporary information sets is taken.
S310: sending the merged information to the user.
For example, the union of the first and second temporary information sets gives the items $lq1_0, lq1_1, lq1_2, lq2_0, lq2_1, lq2_2, lq2_3, lq2_4, lq2_5, lq2_6$, which are sent to the user.
Preferably, these items may also be sorted from high to low by their matching degree with the keyword and sent to the user in that order.
It should be noted that this embodiment is only one way of obtaining information of different semantic categories. Information of different semantic categories can be obtained in many ways, and any method adopted for the purpose of obtaining information belonging to different semantic categories falls within the scope protected by this embodiment; the details are not repeated here.
It should be noted that each step of this embodiment may be executed by a search server, or by any other entity capable of performing the steps.
In this embodiment of the invention, information items of at least two semantic categories are obtained from the information matching the keyword input by the user, so that the user is provided with information of several types related to the keyword. The user can thus obtain related information without entering further keywords related to that keyword, which reduces user operations and improves the user experience.
Embodiment 4
As shown in Fig. 4, an embodiment of the present invention provides a method for obtaining information, the method comprising:
S401: obtaining a keyword input by a user;
The keyword input by the user may be a question entered when the user asks a question, a query entered when the user searches, or an existing question that the user is about to browse and that reflects the user's information need.
For example, by obtaining the question input by the user, the user question $q_i$ is obtained.
S402: obtaining, according to a preset keyword matching condition, a first information set matching the content of the keyword;
Optionally, existing information retrieval techniques may be used to retrieve, from the question database previously collected and/or recorded by an existing question answering system, all questions semantically related to the user question $q_i$.
S403: obtaining the number of information items in the first information set and judging whether that number is greater than the preset information quantity; if it is greater, carrying out S404; if it is less, carrying out S405;
In this embodiment, optionally, when the number of information items is greater than the preset information quantity, the information items in the first information set may be sorted from low to high by their matching degree with the keyword before S404 is carried out.
For example, after all questions semantically related to the user question $q_i$ are retrieved, using existing information retrieval techniques, from the question database previously collected and/or recorded by an existing question answering system, they are sorted by their similarity to $q_i$ to obtain the related-question candidate set $SQ_i = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$.
S404: when the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$, where $m$ is the number of information items in the first information set, obtaining the preset information quantity of information items of at least two semantic categories according to $rq_x = sq_y$.
Here the mapping from $x$ to $y$ is given by the formula in figure BDA0000055844560000101 (formula image not reproduced in this text), $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information item obtained from $SQ$ according to that mapping.
Specifically, $N$ semantically related questions that gradually diverge are taken from $SQ_i$. Let $N^a = m$, that is, $a = \log_N m$, and take the function shown in figure BDA0000055844560000103 (formula image not reproduced in this text). Then $sq_y$ is the $x$-th related question $rq_x$, which gives the resulting information set $RQ_i = \{rq_1, rq_2, \ldots\}$. The mapping from $x$ to $y$ is nonlinear and gradually diverging. This ensures that the queries in the sequence $SQ_i$ most relevant to $q_i$ are output first, while semantically related but more divergent questions further back in $SQ_i$ can also appear among the related questions.
Optionally, after $SQ_i$ is sorted, a mapping function $y = f(x)$ with $f(N) \le m$ may be constructed so that $rq_x = sq_y$, giving the key questions $RQ_i = \{rq_1, rq_2, \ldots, rq_N\}$. Any suitable mapping function $f(x)$, such as a power function or an exponential function, can be used for this purpose.
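The formula images are not reproduced in this text, so the exact mapping is unknown. The sketch below assumes a power function, $y = \lfloor x^a \rfloor$ with $a = \log_N m$, which is consistent with the stated conditions $N^a = m$ and $f(N) \le m$ but remains an assumption. It also assumes the candidates are ordered with the most relevant first and indexed from 1, following the remark that the most relevant queries are output first; the function name `divergent_pick` is hypothetical.

```python
import math

def divergent_pick(sorted_items, preset_count):
    """Pick preset_count items at positions y = floor(x ** a), x = 1..N.

    sorted_items: candidates sq_1..sq_m, assumed ordered most relevant first.
    preset_count: the preset information quantity N.
    """
    m = len(sorted_items)
    n = preset_count
    if n < 2 or m <= n:
        # Nothing to spread out: return the leading candidates as they are.
        return list(sorted_items[:n])
    a = math.log(m) / math.log(n)  # a = log_N(m), so that N ** a == m
    picked = []
    for x in range(1, n + 1):
        # Small epsilon guards against x ** a landing just below an integer.
        y = min(m, math.floor(x ** a + 1e-9))
        picked.append(sorted_items[y - 1])  # convert 1-based y to list index
    return picked
```

Because $a > 1$ whenever $m > N$, consecutive positions move further apart as $x$ grows: early picks stay close to the top of the ranking, while later picks reach toward the end of the candidate list, which matches the "gradually diverging" behaviour described above.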
S405: Send the information to the user.
Optionally, the related questions $RQ_i = \{rq_1, rq_2, \ldots\}$ of question $q_i$ are output, and each related question is displayed to the user in turn on the question browsing page.
It should be noted that each step of this embodiment may be executed by a search server, or by any other entity capable of performing the individual steps.
In this embodiment of the invention, information covering at least two semantic categories is obtained from the information that matches the keyword entered by the user, so that the user is provided with several types of information related to the keyword. The user can therefore obtain related information without entering further keywords related to the original keyword, which reduces user operations and improves the user experience.
Embodiment 5
As shown in Figure 5, this embodiment of the invention provides a device for obtaining information. The device comprises a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503 and an information sending module 504, wherein:
The keyword acquisition module 501 is configured to obtain the keyword entered by the user;
The first information set acquisition module 502 is configured to obtain, according to a preset keyword matching condition, a first information set matching the content of the keyword;
The information acquisition module 503 is configured to determine whether the number of items in the first information set is greater than the preset information quantity and whether the first information set comprises at least two semantic categories, and, if so, to obtain the preset quantity of information, the information comprising at least two semantic categories;
The information sending module 504 is configured to send the information to the user.
In this embodiment of the invention, information covering at least two semantic categories is obtained from the information that matches the keyword entered by the user, so that the user is provided with several types of information related to the keyword. The user can therefore obtain related information without entering further keywords related to the original keyword, which reduces user operations and improves the user experience.
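The way the four modules of this embodiment hand data to one another can be sketched as a small Python class. This is only an illustration under stated assumptions: the class name `InfoDevice`, its field names and the use of plain callables are hypothetical, and the patent does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InfoDevice:
    """Hypothetical wiring of modules 501-504; all names are illustrative."""
    match: Callable[[str], List[str]]              # role of module 502
    select: Callable[[List[str], int], List[str]]  # role of module 503
    send: Callable[[List[str]], None]              # role of module 504
    preset_n: int                                  # preset information quantity

    def handle(self, keyword: str) -> None:
        # The keyword argument stands in for the output of module 501.
        first_set = self.match(keyword)
        self.send(self.select(first_set, self.preset_n))
```

Keeping the selection step as a replaceable callable reflects the fact that later embodiments describe different ways of choosing the preset quantity of information.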
Embodiment 6
As shown in Figure 6, this embodiment of the invention provides a device for obtaining information. As in Embodiment 5, the device comprises a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503 and an information sending module 504.
Further, the information acquisition module 503 specifically comprises:
An information quantity determining unit 5031, configured to obtain the number of items in the first information set and determine whether that number is greater than the preset information quantity; if so, the number of items in the first information set is greater than the preset information quantity;
A text clustering unit 5032, configured to perform text clustering on the items in the first information set by semantic category;
A semantic category number obtaining unit 5033, configured to obtain the number of semantic categories that the first information set comprises;
A semantic category determining unit 5034, configured to determine whether the number of semantic categories is greater than or equal to two; if so, the first information set comprises at least two semantic categories;
An information acquisition unit 5035, configured to obtain the preset quantity of information when the number of items in the first information set is greater than the preset information quantity and the first information set comprises at least two semantic categories, the information comprising at least two semantic categories.
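The condition evaluated by units 5031 to 5035 can be written as a few lines of Python. The sketch assumes that a clustering step (the role of unit 5032) has already assigned one category label to each item; the function name `needs_diverse_selection` and the label representation are hypothetical.

```python
def needs_diverse_selection(labels, preset_n):
    """labels[i] is the semantic category assigned to item i by a clustering step."""
    count_ok = len(labels) > preset_n      # check made by unit 5031
    category_count = len(set(labels))      # value obtained by unit 5033
    categories_ok = category_count >= 2    # check made by unit 5034
    return count_ok and categories_ok      # condition under which unit 5035 acts
```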
The information sending module 504 specifically comprises:
A keyword sorting unit 5041, configured to sort the information by its degree of match with the keyword, from high to low;
An information sending unit 5042, configured to send the sorted information to the user in order.
In this embodiment of the invention, information covering at least two semantic categories is obtained from the information that matches the keyword entered by the user, so that the user is provided with several types of information related to the keyword. The user can therefore obtain related information without entering further keywords related to the original keyword, which reduces user operations and improves the user experience.
Embodiment 7
As shown in Figure 7, this embodiment of the invention provides a device for obtaining information. As in Embodiment 6, the device comprises a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503 and an information sending module 504, and the information sending module 504 comprises a keyword sorting unit 5041 and an information sending unit 5042. Unlike Embodiment 6, in this embodiment the information acquisition module 503 specifically comprises:
A temporary information set generating unit 5036, configured to obtain, when the number of semantic categories that the first information set comprises is less than the preset information quantity, one item from the information of each semantic category, yielding a first temporary information set;
A quantity difference calculating unit 5037, configured to calculate the difference between the preset information quantity and the number of semantic categories;
A preset information acquiring unit 5038, configured to sort the remaining items in the first information set by their degree of match with the keyword, from high to low; to obtain the items whose position after sorting is less than or equal to the difference, yielding a second temporary information set; and to take the union of the first temporary information set and the second temporary information set, yielding the preset quantity of information;
A first information acquiring unit 5039, configured to obtain, when the number of semantic categories that the first information set comprises is greater than the preset information quantity, one item from the information of each semantic category, yielding a fourth temporary information set; to sort the items in the fourth temporary information set by their degree of match with the keyword, from high to low; and to obtain the items whose position after sorting is less than or equal to the preset information quantity, yielding the preset quantity of information.
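The two branches handled by units 5036 to 5039 can be sketched in Python as follows. Several details are assumptions made only for illustration: each item is a `(text, score, category)` tuple, the one item taken from each category is its best-scoring item (the text does not say which item is taken), and the case where the number of categories equals the preset quantity is handled by the first branch with a difference of zero. The name `pick_diverse` is hypothetical.

```python
def pick_diverse(items, n):
    """Return n items covering as many semantic categories as possible."""
    best = {}  # category -> index of its best-scoring item (assumed choice)
    for i, (_, score, category) in enumerate(items):
        if category not in best or score > items[best[category]][1]:
            best[category] = i
    # One representative per category, ordered by match degree, high to low.
    reps = sorted(best.values(), key=lambda i: items[i][1], reverse=True)
    if len(reps) > n:
        # More categories than n: keep the n best representatives.
        chosen = reps[:n]
    else:
        # Fewer categories than n: fill up with the best remaining items.
        taken = set(reps)
        rest = sorted(
            (i for i in range(len(items)) if i not in taken),
            key=lambda i: items[i][1],
            reverse=True,
        )
        chosen = reps + rest[: n - len(reps)]
    return [items[i] for i in chosen]
```

In the first branch the representatives play the role of the fourth temporary information set; in the second branch they form the first temporary information set and the fill-up items form the second.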
In this embodiment of the invention, information covering at least two semantic categories is obtained from the information that matches the keyword entered by the user, so that the user is provided with several types of information related to the keyword. The user can therefore obtain related information without entering further keywords related to the original keyword, which reduces user operations and improves the user experience.
Embodiment 8
As shown in Figure 8, this embodiment of the invention provides a device for obtaining information. As in Embodiment 6, the device comprises a keyword acquisition module 501, a first information set acquisition module 502, an information acquisition module 503 and an information sending module 504, and the information sending module 504 comprises a keyword sorting unit 5041 and an information sending unit 5042. Unlike Embodiment 6, in this embodiment the information acquisition module 503 specifically comprises:
A second information acquisition unit 50310, configured to sort the items in the first information set by their degree of match with the keyword, from low to high, and, where the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ and $m$ is the number of items in the first information set, to obtain the preset quantity of information covering at least two semantic categories according to $rq_x = sq_y$; where
[Formula image BDA0000055844560000131: definition of $y$ in terms of $x$, not reproduced in the text]
$a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information obtained from $SQ$ according to this mapping from $x$ to $y$.
In this embodiment of the invention, information covering at least two semantic categories is obtained from the information that matches the keyword entered by the user, so that the user is provided with several types of information related to the keyword. The user can therefore obtain related information without entering further keywords related to the original keyword, which reduces user operations and improves the user experience.
All or part of the technical solutions provided by the above embodiments may be implemented in software, with the software program stored in a readable storage medium such as a computer hard disk, an optical disc or a floppy disk.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for obtaining information, characterized in that the method comprises:
obtaining a keyword entered by a user;
obtaining, according to a preset keyword matching condition, a first information set matching the content of the keyword;
determining whether the number of items in the first information set is greater than a preset information quantity and whether the first information set comprises at least two semantic categories, and, if so, obtaining the preset quantity of information, the information comprising at least two semantic categories;
sending the information to the user.
2. The method according to claim 1, characterized in that determining whether the number of items in the first information set is greater than the preset information quantity and whether the first information set comprises at least two semantic categories specifically comprises:
obtaining the number of items in the first information set and determining whether that number is greater than the preset information quantity;
performing text clustering on the items in the first information set by semantic category;
obtaining the number of semantic categories that the first information set comprises;
determining whether the number of semantic categories is greater than or equal to two.
3. The method according to claim 1, characterized in that obtaining the preset quantity of information, the information comprising at least two semantic categories, specifically comprises:
when the number of semantic categories that the first information set comprises is less than the preset information quantity, obtaining one item from the information of each semantic category, yielding a first temporary information set;
calculating the difference between the preset information quantity and the number of semantic categories;
sorting the remaining items in the first information set by their degree of match with the keyword, from high to low;
obtaining the items whose position after sorting is less than or equal to the difference, yielding a second temporary information set, and taking the union of the first temporary information set and the second temporary information set, yielding the preset quantity of information;
when the number of semantic categories that the first information set comprises is greater than the preset information quantity, obtaining one item from the information of each semantic category, yielding a fourth temporary information set;
sorting the items in the fourth temporary information set by their degree of match with the keyword, from high to low;
obtaining the items in the sorted fourth temporary information set whose position is less than or equal to the preset information quantity, yielding the preset quantity of information.
4. The method according to claim 1, characterized in that obtaining the preset quantity of information, the information comprising at least two semantic categories, specifically comprises:
sorting the items in the first information set by their degree of match with the keyword, from low to high;
where the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ and $m$ is the number of items in the first information set,
obtaining the preset quantity of information covering at least two semantic categories according to $rq_x = sq_y$;
where
[Formula image FDA0000055844550000021, not reproduced in the text]
$a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information obtained from $SQ$ according to
[Formula image FDA0000055844550000022, not reproduced in the text]
5. The method according to claim 1, characterized in that sending the information to the user specifically comprises:
sorting the information by its degree of match with the keyword, from high to low;
sending the sorted information to the user in order.
6. A device for obtaining information, characterized in that the device comprises:
a keyword acquisition module, configured to obtain a keyword entered by a user;
a first information set acquisition module, configured to obtain, according to a preset keyword matching condition, a first information set matching the content of the keyword;
an information acquisition module, configured to determine whether the number of items in the first information set is greater than a preset information quantity and whether the first information set comprises at least two semantic categories, and, if so, to obtain the preset quantity of information, the information comprising at least two semantic categories;
an information sending module, configured to send the information to the user.
7. The device according to claim 6, characterized in that the information acquisition module specifically comprises:
an information quantity determining unit, configured to obtain the number of items in the first information set and determine whether that number is greater than the preset information quantity;
a text clustering unit, configured to perform text clustering on the items in the first information set by semantic category;
a semantic category number obtaining unit, configured to obtain the number of semantic categories that the first information set comprises;
a semantic category determining unit, configured to determine whether the number of semantic categories is greater than or equal to two;
an information acquisition unit, configured to obtain the preset quantity of information when the number of items in the first information set is greater than the preset information quantity and the first information set comprises at least two semantic categories, the information comprising at least two semantic categories.
8. The device according to claim 6, characterized in that the information acquisition module specifically comprises:
a temporary information set generating unit, configured to obtain, when the number of semantic categories that the first information set comprises is less than the preset information quantity, one item from the information of each semantic category, yielding a first temporary information set;
a quantity difference calculating unit, configured to calculate the difference between the preset information quantity and the number of semantic categories;
a preset information acquiring unit, configured to sort the remaining items in the first information set by their degree of match with the keyword, from high to low; to obtain the items whose position after sorting is less than or equal to the difference, yielding a second temporary information set; and to take the union of the first temporary information set and the second temporary information set, yielding the preset quantity of information;
a first information acquiring unit, configured to obtain, when the number of semantic categories that the first information set comprises is greater than the preset information quantity, one item from the information of each semantic category, yielding a fourth temporary information set; to sort the items in the fourth temporary information set by their degree of match with the keyword, from high to low; and to obtain the items whose position after sorting is less than or equal to the preset information quantity, yielding the preset quantity of information.
9. The device according to claim 6, characterized in that the information acquisition module specifically comprises:
a second information acquisition unit, configured to sort the items in the first information set by their degree of match with the keyword, from low to high, and, where the first information set is $SQ = \{sq_0, sq_1, sq_2, \ldots, sq_m\}$ and $m$ is the number of items in the first information set, to obtain the preset quantity of information covering at least two semantic categories according to $rq_x = sq_y$; where $a = \log_N m$, $N$ is the preset information quantity, and $rq_x$ is the information obtained from $SQ$ according to
[Formula image FDA0000055844550000042, not reproduced in the text]
10. The device according to claim 6, characterized in that the information sending module specifically comprises:
a keyword sorting unit, configured to sort the information by its degree of match with the keyword, from high to low;
an information sending unit, configured to send the sorted information to the user in order.
CN201110096463.9A 2011-04-18 The method and apparatus of acquisition information Active CN102750277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110096463.9A CN102750277B (en) 2011-04-18 The method and apparatus of acquisition information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110096463.9A CN102750277B (en) 2011-04-18 The method and apparatus of acquisition information

Publications (2)

Publication Number Publication Date
CN102750277A true CN102750277A (en) 2012-10-24
CN102750277B CN102750277B (en) 2016-12-14

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1839386A (en) * 2003-08-21 2006-09-27 伊迪利亚公司 Internet searching using semantic disambiguation and expansion
CN101025753A (en) * 2007-03-28 2007-08-29 上海汉光知识产权数据科技有限公司 Patent search method
US20070294200A1 (en) * 1998-05-28 2007-12-20 Q-Phrase Llc Automatic data categorization with optimally spaced semantic seed terms
CN101169780A (en) * 2006-10-25 2008-04-30 华为技术有限公司 Semantic ontology retrieval system and method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104765726A (en) * 2015-04-27 2015-07-08 湘潭大学 Data classification method based on information density
CN104765726B (en) * 2015-04-27 2018-07-31 湘潭大学 A kind of data classification method based on information density
CN105608496A (en) * 2015-11-09 2016-05-25 国家电网公司 Reason analysis method for sharp increase of distribution rush-repair work orders based on k-means clustering algorithm
CN105608496B (en) * 2015-11-09 2021-07-27 国家电网公司 Reason analysis method for sudden increase of allocation and preemption work orders based on k-means clustering algorithm

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
ASS Succession or assignment of patent right

Owner name: SHENZHEN SHIJI LIGHT SPEED INFORMATION TECHNOLOGY

Free format text: FORMER OWNER: TENGXUN SCI-TECH (SHENZHEN) CO., LTD.

Effective date: 20131121

C41 Transfer of patent application or patent right or utility model
COR Change of bibliographic data

Free format text: CORRECT: ADDRESS; FROM: 518000 SHENZHEN, GUANGDONG PROVINCE TO: 518057 SHENZHEN, GUANGDONG PROVINCE

TA01 Transfer of patent application right

Effective date of registration: 20131121

Address after: A Tencent Building in Shenzhen Nanshan District City, Guangdong streets in Guangdong province science and technology 518057 16

Applicant after: Shenzhen Shiji Guangsu Information Technology Co., Ltd.

Address before: 518000 Guangdong city of Shenzhen province Futian District SEG Science Park 2 East Room 403

Applicant before: Tencent Technology (Shenzhen) Co., Ltd.

C14 Grant of patent or utility model
GR01 Patent grant