CN105653671A - Similar information recommendation method and system

Info

Publication number: CN105653671A
Application number: CN201511017551.XA
Authority: CN
Inventors: 沈磊
Original assignee: CHANJET INFORMATION TECHNOLOGY Co Ltd
Current assignee: CHANJET INFORMATION TECHNOLOGY Co Ltd
Priority date: 2015-12-29
Filing date: 2015-12-29
Publication date: 2016-06-08

Abstract

The present invention provides a similar information recommendation method and system. The method comprises: determining a preliminary candidate set according to keywords in the search content; determining, within the preliminary candidate set, similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the set; and presenting the similar information. The technical solution avoids the problem that a simple keyword search cannot determine the specific semantics of the search information, provides information similar to the search content more accurately, improves users' search efficiency, and keeps users from posting duplicates, thereby improving the user experience.

Description

Similar information recommendation method and system

Technical field

The present invention relates to the field of computer technology, and specifically to a similar information recommendation method and a similar information recommendation system.

Background

At present, in online communities, when a user posts a question or browses a post related to a question asked, the system can proactively recommend similar questions and their answers. For example, when the user types a question into the input box, the system offers a list of similar questions, and the list changes as the input changes. As another example, when the user browses a post related to the question asked, the system offers a list of questions similar to the user's question. In this way, identical or similar questions already stored in the network, together with their answers, are recommended to the user, who then need not repeat the same question-and-answer exchange. This both reduces redundant identical or similar posts and improves user satisfaction.

However, the above method usually searches on keywords from the subject of the user's question rather than on a semantic understanding of the question. As a result, many similar questions cannot be recommended by the system simply because different people phrase them differently.

A new technical solution is therefore needed that can provide information similar to the search content more accurately and improve the user's search efficiency.

Summary of the invention

In view of the above problems, the present invention proposes a new technical solution that can provide information similar to the search content more accurately and improve the user's search efficiency.

In view of this, one aspect of the present invention proposes a similar information recommendation method, comprising: determining a preliminary candidate set according to keywords in the search content; determining, within the preliminary candidate set, similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set; and presenting the similar information.

In this technical solution, after the preliminary candidate set is determined according to the keywords, the semantic similarity between the search content and each piece of information in the preliminary candidate set is calculated, and the similar information to recommend to the user is determined from that semantic similarity. This avoids the problem that a simple keyword search cannot determine the specific semantics of the search information, provides information similar to the search content more accurately, improves the user's search efficiency, and keeps users from posting duplicates, which is convenient for users.

In the above technical solution, preferably, the search content comprises a question being asked, and the information in the preliminary candidate set comprises: existing questions and existing answers.

In this technical solution, the search content comprises a question being asked, that is, a question the user raises on a social website such as a forum, and the information in the preliminary candidate set comprises existing questions and existing answers. When retrieving for the user's question, the semantics of both the existing questions and the existing answers are therefore covered, which makes it easier to provide information similar to the search content more accurately and to show the user a more accurate answer.

In any of the above technical solutions, preferably, determining, within the preliminary candidate set, the similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set comprises: training, by means of a language model, unit semantic vectors for the search content and for the information in the preliminary candidate set, wherein each unit semantic vector is a character vector or a word vector; and calculating, from the unit semantic vectors, the semantic similarity between the search content and the information in the preliminary candidate set, wherein the semantic similarity comprises: a character vector cumulative sum, a word vector cumulative sum, a character vector mean, or a word vector mean. Presenting the similar information comprises: sorting the information in the preliminary candidate set by semantic similarity from high to low and presenting it.

In this technical solution, the unit semantic vectors are trained by a language model. A sentence in the search content has two kinds of basic semantic unit: the character and the word. Both character meanings and word meanings can be used to compose sentence semantics. If the word is the basic unit, the sentence must be segmented into words; if the character is the basic unit, the sentence must be split character by character. The unit semantic vector is therefore a character vector or a word vector. Both approaches require training the character or word semantic vectors with a language model on a previously prepared text corpus. A language model is a probability model that computes the probability of a sentence. It rests on the Markov assumption, that is, the occurrence of the next word depends only on the one or several words before it. On this principle, character vectors or word vectors can be trained from the corpus. The relationship between two semantic vectors trained this way is reflected directly in their difference, where the difference is the usual mathematical one, element-wise subtraction. For example, semantic("king") - semantic("queen") ≈ semantic("man") - semantic("woman"), so the vector closest to semantic("king") - semantic("man") + semantic("woman") is semantic("queen").

In this technical solution, the key point is how to obtain a sentence vector from the character vectors or word vectors. The quality of the sentence vector determines how well sentence similarity is measured, which in turn affects the quality of the recommendation. A sentence vector can be calculated in two ways: one takes the cumulative sum of the character (or word) semantic vectors as the sentence vector; the other takes the mean of the character (or word) semantic vectors as the sentence vector.
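The two sentence-vector constructions can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names and the toy 3-dimensional vectors are hypothetical, and cosine similarity is an assumed way of comparing two sentence vectors, since the text does not name a comparison function.

```python
import math


def sentence_vector(unit_vectors, mode="mean"):
    """Combine unit (character or word) vectors into one sentence vector."""
    if not unit_vectors:
        raise ValueError("need at least one unit vector")
    dim = len(unit_vectors[0])
    total = [sum(v[i] for v in unit_vectors) for i in range(dim)]
    if mode == "sum":      # cumulative sum of the unit vectors
        return total
    if mode == "mean":     # mean of the unit vectors
        return [x / len(unit_vectors) for x in total]
    raise ValueError("mode must be 'sum' or 'mean'")


def cosine(a, b):
    """Cosine similarity (an assumption; the source names no comparison)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical unit vectors for two short sentences.
question_1 = [[1.0, 0.0, 1.0], [0.0, 2.0, 1.0]]
question_2 = [[1.0, 2.0, 2.0]]
similarity = cosine(sentence_vector(question_1, "sum"),
                    sentence_vector(question_2, "sum"))
```

Under cosine similarity the sum and the mean of the same unit vectors point in the same direction, so the choice between them matters only when the comparison is sensitive to vector length.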

The above technical solution avoids the problem that a simple keyword search cannot determine the specific semantics of the search information, provides information semantically similar to the search content more accurately, improves the user's search efficiency, and keeps users from posting duplicates, which is convenient for users.

In any of the above technical solutions, preferably, before the similar information corresponding to the search content is determined within the preliminary candidate set, the method further comprises: determining other similarities between the search content and the information in the preliminary candidate set, wherein the other similarities comprise one or a combination of the following: keyword similarity, keyword deduplicated similarity, keyword dispersion similarity, and product word similarity. Determining the similar information corresponding to the search content within the preliminary candidate set then specifically comprises: determining the similar information according to the semantic similarity and the other similarities.

In this technical solution, one or more of keyword similarity, keyword deduplicated similarity, keyword dispersion similarity, and product word similarity can be used together with semantic similarity as the recommendation criterion. Keyword similarity is calculated from the weight of each keyword in the question and from the degree of keyword overlap between the user input and the candidate question (including its answers). Keyword deduplicated similarity removes the influence of repeated keywords and counts only how many distinct keywords the two have in common. Keyword dispersion similarity indicates whether the keywords are distributed in the same way in the search content entered by the user and in the information in the preliminary candidate set, that is, whether they are spread evenly or concentrated in one place. Typically the search content and the candidate information are cut into clauses, and the number of clauses containing common keywords is taken as the keyword dispersion similarity score. In addition, many user questions concern products such as software, and questions raised about different products should not be regarded as similar. For example, if two questions contain the same product word, the product word similarity is 1; if they do not, it is 0. By using semantic similarity together with one or more of the other similarities, similar information can be recommended more accurately and the user experience improved.

In any of the above technical solutions, preferably, before the preliminary candidate set is determined according to the keywords in the search content, the method further comprises: removing emoticon words and stop words from the keywords.

In this technical solution, because emoticon words and stop words usually carry no useful information and can make the recommendation differ from what is expected, they can be removed from the keywords before the preliminary candidate set is formed, which improves the validity of the recommended content.

Another aspect of the present invention proposes a similar information recommendation system, comprising: a candidate set determining unit, which determines a preliminary candidate set according to keywords in the search content; a similar information determining unit, which determines, within the preliminary candidate set, similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set; and a similar information display unit, which presents the similar information.

In any of the above technical solutions, preferably, the similar information determining unit comprises: a vector training unit, which trains, by means of a language model, unit semantic vectors for the search content and for the information in the preliminary candidate set, wherein each unit semantic vector is a character vector or a word vector; and a semantic similarity calculation unit, which calculates, from the unit semantic vectors, the semantic similarity between the search content and the information in the preliminary candidate set, wherein the semantic similarity comprises: a character vector cumulative sum, a word vector cumulative sum, a character vector mean, or a word vector mean. The similar information display unit is specifically configured to: sort the information in the preliminary candidate set by semantic similarity from high to low and present it.

In any of the above technical solutions, preferably, the system further comprises: an other-similarity determining unit, which, before the similar information corresponding to the search content is determined within the preliminary candidate set, determines other similarities between the search content and the information in the preliminary candidate set, wherein the other similarities comprise one or a combination of the following: keyword similarity, keyword deduplicated similarity, keyword dispersion similarity, and product word similarity. The similar information determining unit is configured to: determine the similar information according to the semantic similarity and the other similarities.

In any of the above technical solutions, preferably, the system further comprises: a removal unit, which removes emoticon words and stop words from the keywords before the preliminary candidate set is determined according to the keywords in the search content.

The above technical solutions avoid the problem that a simple keyword search cannot determine the specific semantics of the search information, provide information similar to the search content more accurately, improve the user's search efficiency, and keep users from posting duplicates, thereby improving the user experience.

Brief description of the drawings

Fig. 1 shows a flowchart of a similar information recommendation method according to one embodiment of the present invention;

Fig. 2 shows a block diagram of a similar information recommendation system according to one embodiment of the present invention;

Fig. 3 shows a schematic diagram of similar information recommendation according to one embodiment of the present invention;

Fig. 4 shows a schematic diagram of determining semantic similarity according to one embodiment of the present invention;

Fig. 5 shows a schematic diagram of a similar information recommendation interface according to one embodiment of the present invention;

Fig. 6 shows a schematic diagram of a similar information recommendation interface according to another embodiment of the present invention.

Detailed description of the embodiments

So that the above objects, features, and advantages of the present invention can be understood more clearly, the present invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where they do not conflict, the embodiments of the present application and the features in the embodiments can be combined with one another.

Many specific details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention can also be implemented in ways other than those described here, and the scope of protection of the present invention is therefore not limited by the specific embodiments disclosed below.

Fig. 1 shows a flowchart of a similar information recommendation method according to one embodiment of the present invention.

As shown in Fig. 1, a similar information recommendation method according to one embodiment of the present invention comprises:

Step 102: determining a preliminary candidate set according to keywords in the search content;

Step 104: determining, within the preliminary candidate set, similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set;

Step 106: presenting the similar information.

In any of the above technical solutions, preferably, step 104 comprises: training, by means of a language model, unit semantic vectors for the search content and for the information in the preliminary candidate set, wherein each unit semantic vector is a character vector or a word vector; and calculating, from the unit semantic vectors, the semantic similarity between the search content and the information in the preliminary candidate set, wherein the semantic similarity comprises: a character vector cumulative sum, a word vector cumulative sum, a character vector mean, or a word vector mean. Step 106 comprises: sorting the information in the preliminary candidate set by semantic similarity from high to low and presenting it.

In any of the above technical solutions, preferably, before step 104, the method further comprises: determining other similarities between the search content and the information in the preliminary candidate set, wherein the other similarities comprise one or a combination of the following: keyword similarity, keyword deduplicated similarity, keyword dispersion similarity, and product word similarity. Step 104 then specifically comprises: determining the similar information according to the semantic similarity and the other similarities.

In any of the above technical solutions, preferably, before step 102, the method further comprises: removing emoticon words and stop words from the keywords.

Fig. 2 shows a block diagram of a similar information recommendation system according to one embodiment of the present invention.

As shown in Fig. 2, a similar information recommendation system 200 according to one embodiment of the present invention comprises: a candidate set determining unit 202, a similar information determining unit 204, and a similar information display unit 206.

The candidate set determining unit 202 is configured to determine a preliminary candidate set according to keywords in the search content. The similar information determining unit 204 is configured to determine, within the preliminary candidate set, similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set. The similar information display unit 206 is configured to present the similar information.

In any of the above technical solutions, preferably, the similar information determining unit 204 comprises: a vector training unit 2042 and a semantic similarity calculation unit 2044.

The vector training unit 2042 is configured to train, by means of a language model, unit semantic vectors for the search content and for the information in the preliminary candidate set, wherein each unit semantic vector is a character vector or a word vector. The semantic similarity calculation unit 2044 is configured to calculate, from the unit semantic vectors, the semantic similarity between the search content and the information in the preliminary candidate set, wherein the semantic similarity comprises: a character vector cumulative sum, a word vector cumulative sum, a character vector mean, or a word vector mean. The similar information display unit 206 is specifically configured to: sort the information in the preliminary candidate set by semantic similarity from high to low and present it.

In any of the above technical solutions, preferably, the system further comprises: an other-similarity determining unit 208, which, before the similar information corresponding to the search content is determined within the preliminary candidate set, determines other similarities between the search content and the information in the preliminary candidate set, wherein the other similarities comprise one or a combination of the following: keyword similarity, keyword deduplicated similarity, keyword dispersion similarity, and product word similarity. The similar information determining unit 204 is configured to: determine the similar information according to the semantic similarity and the other similarities.

In any of the above technical solutions, preferably, the system further comprises: a removal unit 210, which removes emoticon words and stop words from the keywords before the preliminary candidate set is determined according to the keywords in the search content.

Fig. 3 shows a schematic diagram of the framework of a similar information recommendation system according to another embodiment of the present invention.

As shown in Fig. 3, when recommending similar information, the system first searches the search system with the keywords and key phrases of the user's question to obtain a preliminary candidate question set. It then calculates the similarity between each candidate question in the set and the user's question and sorts by similarity to obtain a ranked candidate set. Finally, the ranked candidate set is filtered and the recommendation results are given.

The main features and the design and implementation of the system are explained in detail below.

The main features of the system include:

(1) Fast recommendation.

(2) Several methods are used to calculate question similarity, measuring it from several angles and combining several factors to give more effective recommendation results.

(3) Dynamic data updates are supported, keeping the system synchronized in real time with the posting system.

The design and implementation of the system are as follows:

(1) Use a search system to screen the preliminary candidate set quickly.

Because the number of posts in an online community is at least on the order of millions, the system uses a search system to provide the preliminary candidate set. The search system records the questions asked and their answers, along with the corresponding keywords and phrases. Searching the search system for the keywords and phrases in the user input yields a preliminary candidate question set. This candidate question set is N times the size of the recommendation result set, where N can be set as required. Preliminary screening can thus be done quickly, meeting the system's real-time requirements. Using a search system also supports adding, deleting, and modifying posts at any time, keeping the system synchronized in real time with the posting system.
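The screening step can be pictured with a small in-memory inverted index. This is only a sketch under stated assumptions: the class `KeywordIndex`, its method names, and the ranking of candidates by number of matched keywords are all hypothetical, and a production system would presumably use a dedicated search engine rather than Python dictionaries.

```python
from collections import Counter, defaultdict


class KeywordIndex:
    """Toy inverted index: keyword -> ids of posts containing it."""

    def __init__(self):
        self._postings = defaultdict(set)  # keyword -> post ids
        self._keywords = {}                # post id -> its keywords

    def add(self, post_id, keywords):
        """Add a post, or replace it if the id already exists."""
        self.remove(post_id)
        self._keywords[post_id] = set(keywords)
        for keyword in self._keywords[post_id]:
            self._postings[keyword].add(post_id)

    def remove(self, post_id):
        """Delete a post; unknown ids are ignored."""
        for keyword in self._keywords.pop(post_id, ()):
            self._postings[keyword].discard(post_id)

    def candidates(self, query_keywords, result_size, n=5):
        """Return up to result_size * n post ids, most keyword hits first."""
        hits = Counter()
        for keyword in set(query_keywords):
            for post_id in self._postings.get(keyword, ()):
                hits[post_id] += 1
        ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
        return [post_id for post_id, _ in ranked[: result_size * n]]


index = KeywordIndex()
index.add(1, ["tax", "profit"])
index.add(2, ["tax"])
index.add(3, ["invoice"])
preliminary = index.candidates(["tax", "profit"], result_size=1, n=2)
```

Because `add` and `remove` update the postings immediately, a new, edited, or deleted post is reflected in the very next query, which is the synchronization property the text describes.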

(2) Measure similarity.

The system uses keyword similarity, keyword Jaccard similarity (Jaccard coefficient), keyword dispersion similarity, product name similarity, and semantic similarity as its measures. The score from each measure is multiplied by its weight, and the sum of these products is the final question similarity score. Sorting by this score yields the ranked candidate question set.
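The weighted combination can be sketched as below. The weight values, score values, and dictionary keys are hypothetical placeholders chosen for illustration; the text does not disclose the actual weights.

```python
def final_score(scores, weights):
    """Weighted sum of the per-measure similarity scores."""
    return sum(weights[name] * scores[name] for name in weights)


def rank(candidates, weights):
    """Order candidate ids by final score, highest first."""
    return sorted(candidates,
                  key=lambda cid: final_score(candidates[cid], weights),
                  reverse=True)


# Hypothetical weights; the source does not give concrete values.
weights = {"keyword": 0.2, "jaccard": 0.2, "dispersion": 0.1,
           "product": 0.2, "semantic": 0.3}

candidates = {
    "a": {"keyword": 0.5, "jaccard": 0.5, "dispersion": 1.0,
          "product": 1.0, "semantic": 0.8},
    "b": {"keyword": 0.0, "jaccard": 0.0, "dispersion": 0.0,
          "product": 0.0, "semantic": 1.0},
}
ordering = rank(candidates, weights)
```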

A. Calculate keyword similarity.

Keyword similarity is calculated from the weight of each keyword in the question and from the degree of keyword overlap between the user input and the candidate question (including its answers).
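One possible reading of this measure is a weighted overlap ratio. The text does not give the formula, so the normalization below (weight of shared keywords divided by total weight of the query keywords), the function name, and the example weights are all assumptions.

```python
def keyword_similarity(query_keywords, candidate_keywords, weight):
    """Share of the query's keyword weight that the candidate also covers.

    `weight` maps keyword -> importance (for example a TF-IDF value).
    The normalization by total query weight is an assumption.
    """
    query = set(query_keywords)
    total = sum(weight.get(k, 0.0) for k in query)
    if total == 0:
        return 0.0
    shared = query & set(candidate_keywords)
    return sum(weight.get(k, 0.0) for k in shared) / total


# Hypothetical keyword weights.
weight = {"tax": 3.0, "form": 1.0, "profit": 2.0}
score = keyword_similarity(["tax", "form"], ["tax", "profit"], weight)
```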

To calculate the keyword similarity of a question, the keywords and phrases it contains must first be extracted. The quality of the keyword library is critical to the similarity measure.

The keyword library has two sources. First, posts from the website are collected to form a corpus, the corpus is segmented into words, the TF-IDF value (a weighting commonly used in information retrieval and data mining) of each word is calculated, the words are sorted by that value, and the top N are added to the keyword library. In the process, common stop words and meaningless words must be removed from these N words. Second, keywords from the same domain are collected from the web and added to the dictionary. The key phrase library is built in a similar way.
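The first source can be sketched as follows. This is a minimal illustration assuming the posts are already segmented into non-empty token lists; summing TF-IDF across documents as the ranking score is an assumption, as are the function name and the toy corpus.

```python
import math
from collections import Counter


def build_keyword_library(docs, top_n, stop_words=frozenset()):
    """Rank tokens by summed TF-IDF, keep the top_n, then drop stop words.

    `docs` is a list of non-empty token lists (segmentation already done).
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))       # count each token once per doc

    scores = Counter()
    for tokens in docs:
        for token, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(len(docs) / doc_freq[token])
            scores[token] += tf * idf

    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return [t for t in ranked[:top_n] if t not in stop_words]


# Hypothetical segmented posts.
docs = [
    ["tax", "form", "may_i_ask"],
    ["tax", "profit", "may_i_ask"],
    ["invoice", "invoice", "may_i_ask"],
]
library = build_keyword_library(docs, top_n=3)
```

A token that occurs in every post, such as the filler `may_i_ask` here, gets an IDF of zero and falls to the bottom of the ranking on its own; the explicit stop word filter covers common words that are frequent but not universal.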

B. Calculate keyword Jaccard similarity.

The question entered by the user and a candidate question share some keywords, and some keywords are repeated within a question. Keyword Jaccard similarity removes the influence of these repeated keywords and counts only how many distinct keywords the two have in common.
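The Jaccard coefficient of two keyword sets is the size of their intersection divided by the size of their union. A short sketch follows; the function name and example keywords are illustrative.

```python
def keyword_jaccard(keywords_a, keywords_b):
    """Jaccard coefficient of two keyword lists, ignoring repeats."""
    set_a, set_b = set(keywords_a), set(keywords_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# "tax" is repeated in the first list but counts only once.
score = keyword_jaccard(["tax", "tax", "form"], ["tax", "profit"])
```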

C. Calculate keyword dispersion similarity.

Keyword dispersion similarity indicates whether the keywords are distributed in the same way in the search content entered by the user and in the information in the preliminary candidate set, that is, whether they are spread evenly or concentrated in one place. Typically the search content and the candidate information are cut into clauses, and the number of clauses containing common keywords is taken as the keyword dispersion similarity score.
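A rough sketch of the clause-counting idea is given below. Several details are assumptions not stated in the text: clauses are split on common ASCII and Chinese punctuation, keywords are matched as substrings, and the count is normalized by the total number of clauses so the score lies between 0 and 1.

```python
import re

# Assumed clause delimiters (ASCII and full-width punctuation).
CLAUSE_BREAK = re.compile(r"[,.;?!，。；？！]+")


def clauses(text):
    """Split text into non-empty, stripped clauses."""
    return [c.strip() for c in CLAUSE_BREAK.split(text) if c.strip()]


def dispersion_similarity(query, candidate, keywords):
    """Fraction of clauses (from both texts) containing a common keyword."""
    common = [k for k in keywords if k in query and k in candidate]
    parts = clauses(query) + clauses(candidate)
    if not parts or not common:
        return 0.0
    hits = sum(1 for clause in parts if any(k in clause for k in common))
    return hits / len(parts)


score = dispersion_similarity(
    "tax form, how to fill profit?",
    "tax form. profit statement, cost",
    ["tax", "profit", "cost"],
)
```

Here "cost" appears only in the candidate, so it is not a common keyword and the clause "cost" does not count as a hit.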

D. Calculate product word similarity.

Many user questions concern products such as software, and questions raised about different products should not be regarded as similar. For example, if two questions contain the same product word, the product word similarity is 1; if they do not, it is 0.
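The binary rule can be written directly. The product names in the example are hypothetical placeholders, and the questions are assumed to be tokenized already.

```python
def product_word_similarity(tokens_a, tokens_b, product_words):
    """Return 1 if both questions mention a common product word, else 0."""
    products_a = set(tokens_a) & product_words
    products_b = set(tokens_b) & product_words
    return 1 if products_a & products_b else 0


# Hypothetical product names.
product_words = {"ProductA", "ProductB"}
same = product_word_similarity(["ProductA", "login"],
                               ["ProductA", "backup"], product_words)
different = product_word_similarity(["ProductA", "login"],
                                    ["ProductB", "login"], product_words)
```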

E. Calculate semantic similarity.

The sentence of a question (or the set of sentences of a question, which also includes the answers to the question; hereinafter simply "sentence") can be cut into smaller basic units. The system composes the semantics of a question from the semantics of its basic units, and then uses the question semantic vectors to calculate the similarity between questions.

As shown in Fig. 4, the unit semantic vectors are trained by a language model. A sentence in the search content has two kinds of basic semantic unit: the character and the word. Both character meanings and word meanings can be used to compose sentence semantics. If the word is the basic unit, the sentence must be segmented into words, giving multiple basic semantic units; if the character is the basic unit, the sentence must be split character by character, likewise giving multiple basic semantic units. The unit semantic vector is therefore a character vector or a word vector.

For question 1 and question 2, the unit semantic vectors are trained by the language model for each question separately: each sentence is cut into multiple basic semantic units, the semantics of each question is calculated, and the semantic similarity between question 1 and question 2 is then calculated.

Both approaches require training the character or word semantic vectors with a language model on a previously prepared text corpus. A language model is a probability model that computes the probability of a sentence. It rests on the Markov assumption, that is, the occurrence of the next word depends only on the one or several words before it. On this principle, character vectors or word vectors can be trained from the corpus. The relationship between two semantic vectors trained this way is reflected directly in their difference, where the difference is the usual mathematical one, element-wise subtraction. For example, semantic("king") - semantic("queen") ≈ semantic("man") - semantic("woman"), so the vector closest to semantic("king") - semantic("man") + semantic("woman") is semantic("queen").
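The vector arithmetic in the example can be reproduced with hand-made toy vectors. The 2-dimensional values below are invented purely to make the relationship exact (axis 0 loosely standing for royalty, axis 1 for gender); real trained vectors have many more dimensions and satisfy the relation only approximately.

```python
def add(a, b):
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    """Element-wise vector subtraction."""
    return [x - y for x, y in zip(a, b)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Invented toy vectors, not trained ones.
vectors = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}


def nearest(target, exclude=()):
    """Word whose vector is closest to target, skipping excluded words."""
    return min((w for w in vectors if w not in exclude),
               key=lambda w: squared_distance(vectors[w], target))


target = add(sub(vectors["king"], vectors["man"]), vectors["woman"])
answer = nearest(target, exclude={"king", "man", "woman"})
```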

Finally, the recommendation results can be filtered and presented. If a question in the Web community has not been answered, recommending it is meaningless, so unanswered questions are filtered out when the recommendation results are produced. If the answer to a question has been recommended by an expert of the Web community, the answer has the expert's approval, and such questions are ranked first in the recommendation results. Finally, the first N items of the sorted candidate set are selected as the recommendation results and presented to the user.
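The filtering and ranking described above might look like the following Python sketch. The dictionary keys and the function name are hypothetical, and the tie-breaking rule (expert-recommended questions first, then descending similarity score) is an assumption, since the text does not state how expert recommendation is combined with the similarity score.

```python
def recommend(candidates, top_n):
    """Drop unanswered questions, rank expert-recommended ones first, keep the top N.

    Each candidate is a dict with hypothetical keys: "question", "answer",
    "expert_recommended", and "score" (the similarity computed earlier).
    """
    # Unanswered questions are not worth recommending.
    answered = [c for c in candidates if c.get("answer")]
    # Expert-recommended answers come first; within each group, higher score first.
    ranked = sorted(
        answered,
        key=lambda c: (bool(c.get("expert_recommended")), c["score"]),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up candidates for illustration only.
candidates = [
    {"question": "A", "answer": "text", "expert_recommended": False, "score": 0.90},
    {"question": "B", "answer": "", "expert_recommended": False, "score": 0.95},
    {"question": "C", "answer": "text", "expert_recommended": True, "score": 0.70},
    {"question": "D", "answer": "text", "expert_recommended": False, "score": 0.80},
]
top = recommend(candidates, top_n=2)
```

Here question B is dropped because it has no answer, and question C is placed ahead of A and D because its answer is expert-recommended.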

In addition, because the keyword library is extracted from users' questions and answers, some emoticon words (expression words) are also included in the keyword library owing to their high TF-IDF values. Emoticon words influence the recommendation results. For example, if a user's question contains an emoticon, questions containing the same emoticon word can enter the preliminary candidate set and then the recommendation set, which is contrary to what the recommendation is expected to achieve. The emoticon words therefore need to be removed from the keyword library.

Likewise, common stop words such as "consult" and "may I ask" carry no meaning, so words of this kind also need to be removed from the keyword library.
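Cleaning the keyword library can be sketched in Python as follows. The function name, the bracketed emoticon token, and the sample keywords are hypothetical; how the system actually identifies emoticon words is not described in the text.

```python
def clean_keyword_library(keywords, emoticon_words, stop_words):
    """Return the keywords with emoticon words and stop words removed.

    The original order of the remaining keywords is preserved.
    """
    blocked = set(emoticon_words) | set(stop_words)
    return [k for k in keywords if k not in blocked]


# Made-up keyword library for illustration only.
keywords = ["income tax", "[smile]", "may I ask", "operating cost", "consult"]
emoticon_words = {"[smile]"}
stop_words = {"consult", "may I ask"}
cleaned = clean_keyword_library(keywords, emoticon_words, stop_words)
```

Because the preliminary candidate set is built from keyword matches, removing these words before matching keeps questions that merely share an emoticon or a polite phrase out of the candidate set.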

Fig. 5 shows a schematic diagram of an information recommendation interface according to one embodiment of the present invention.

As shown in Figure 5, the information recommendation system of the present invention is applied to the finance and accounting website "accounting home". When a user browses a post on the website, the system provides recommendations of similar questions. The user asks: "On the quarterly income tax return, how should operating revenue, operating cost and total profit be filled in?" Through the calculation of semantic similarity, the system then shows a similar question: "On the quarterly income tax return, is operating cost equal to operating revenue minus total profit, or is it the operating cost from the profit statement ...?" The system also shows the answers to such similar questions, thereby resolving the user's question.

Fig. 6 shows a schematic diagram of an information recommendation interface according to another embodiment of the present invention.

As shown in Figure 6, in the information recommendation interface according to another embodiment of the present invention, the user asks: "Income tax was paid at this year's final settlement and was later refunded by the tax bureau; how should the accounting entries be made?" Through the calculation of semantic similarity, the system then shows multiple similar questions and their answers. The semantics of the user's question are analyzed correctly, which improves the user experience.

The technical solution of the present invention has been described above with reference to the accompanying drawings. With this technical solution, information similar to the search content can be provided more accurately, the user's search efficiency is improved, and repeated posting by users is avoided, thereby improving the user experience.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. For a person skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims

1. A similar information recommendation method, characterized by comprising:

determining a preliminary candidate set according to keywords in search content;

determining, in the preliminary candidate set, similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set; and

displaying the similar information.

2. The similar information recommendation method according to claim 1, characterized in that the search content comprises a question being asked, and the information in the preliminary candidate set comprises existing questions and existing answers to questions.

3. The similar information recommendation method according to claim 1 or 2, characterized in that determining, in the preliminary candidate set, the similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set comprises:

training, by a language model, unit semantic vectors of the search content and of the information in the preliminary candidate set, wherein the unit semantic vector is a character vector or a word vector;

calculating, according to the unit semantic vectors, the semantic similarity between the search content and the information in the preliminary candidate set, wherein the semantic similarity comprises: a character vector cumulative sum, a word vector cumulative sum, a character vector mean value, or a word vector mean value; and

displaying the similar information comprises:

sorting the information in the preliminary candidate set from high to low according to the semantic similarity and displaying it.

4. The similar information recommendation method according to claim 3, characterized in that, before determining, in the preliminary candidate set, the similar information corresponding to the search content, the method further comprises:

determining other similarities between the search content and the information in the preliminary candidate set, wherein the other similarities comprise one or a combination of the following: keyword similarity, deduplicated keyword similarity, keyword discrete similarity, and product word similarity; and

determining, in the preliminary candidate set, the similar information corresponding to the search content specifically comprises:

determining the similar information according to the semantic similarity and the other similarities.

5. The similar information recommendation method according to claim 3, characterized in that, before determining the preliminary candidate set according to the keywords in the search content, the method further comprises:

removing emoticon words and stop words from the keywords.

6. A similar information recommendation system, characterized by comprising:

a candidate set determining unit, configured to determine a preliminary candidate set according to keywords in search content;

a similar information determining unit, configured to determine, in the preliminary candidate set, similar information corresponding to the search content according to the semantic similarity between the search content and each piece of information in the preliminary candidate set; and

a similar information display unit, configured to display the similar information.

7. The similar information recommendation system according to claim 6, characterized in that the search content comprises a question being asked, and the information in the preliminary candidate set comprises existing questions and existing answers to questions.

8. The similar information recommendation system according to claim 6 or 7, characterized in that the similar information determining unit comprises:

a vector training unit, configured to train, by a language model, unit semantic vectors of the search content and of the information in the preliminary candidate set, wherein the unit semantic vector is a character vector or a word vector;

a semantic similarity calculation unit, configured to calculate, according to the unit semantic vectors, the semantic similarity between the search content and the information in the preliminary candidate set, wherein the semantic similarity comprises: a character vector cumulative sum, a word vector cumulative sum, a character vector mean value, or a word vector mean value; and

the similar information display unit is specifically configured to:

9. The similar information recommendation system according to claim 8, characterized by further comprising:

an other-similarity determining unit, configured to determine, before the similar information corresponding to the search content is determined in the preliminary candidate set, other similarities between the search content and the information in the preliminary candidate set, wherein the other similarities comprise one or a combination of the following: keyword similarity, deduplicated keyword similarity, keyword discrete similarity, and product word similarity; and

the similar information determining unit is configured to:

10. The similar information recommendation system according to claim 9, characterized by further comprising:

a removal unit, configured to remove emoticon words and stop words from the keywords before the preliminary candidate set is determined according to the keywords in the search content.