CN104408036A - Correlated topic recognition method and device - Google Patents


Publication number
CN104408036A
Authority
CN
China
Prior art date
Application number
CN201410779602.1A
Other languages
Chinese (zh)
Other versions
CN104408036B (en)
Inventor
刘粉香
Original Assignee
北京国双科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京国双科技有限公司
Priority to CN201410779602.1A
Publication of CN104408036A
Application granted
Publication of CN104408036B


Abstract

The invention discloses a method and a device for recognizing associated topics. The method comprises: obtaining a target keyword; determining the multidimensional array corresponding to the target keyword, wherein the number in each dimension of the multidimensional array represents one attribute of the target keyword; calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to target topics, wherein the correlation index represents the relevance between the target keyword and each target topic, and the target topics are multiple pre-marked topics that have multidimensional arrays; and determining the topics associated with the target keyword according to the calculated correlation index. The method and device solve the problem of low topic recognition accuracy in the prior art and improve the accuracy of topic recognition.

Description

Method and device for recognizing associated topics

Technical field

The present invention relates to the field of topic detection, and in particular to a method and a device for recognizing associated topics.

Background

Topic detection mainly refers to identifying, from a large amount of text, the topics related to a given keyword; for example, given the keyword "college entrance examination", identifying the topics associated with it in the text. Here a topic can be a topic on the Internet, such as a news topic or a microblog topic, and is mainly embodied in text form.

At present, topic detection mainly matches the given keyword against the topics in the text: if the given keyword appears in a topic, the topic is considered related to the keyword. However, because language is highly flexible, a topic may be highly relevant to the given keyword without containing it, in which case this matching approach cannot accurately identify the topics related to the keyword.
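The limitation of literal matching described above can be illustrated with a minimal Python sketch. The function name and the two example topics are hypothetical and are not taken from the patent; "gaokao" is the common Chinese name for the college entrance examination.

```python
from typing import List


def matches_by_keyword(keyword: str, topic: str) -> bool:
    """Baseline: a topic counts as related only if it literally contains the keyword."""
    return keyword in topic


# Hypothetical example topics.
topics: List[str] = [
    "college entrance examination scores released",
    "gaokao candidates choose their majors",  # related, but the keyword is absent
]

# Only the first topic is found; the second is missed by literal matching.
related = [t for t in topics if matches_by_keyword("college entrance examination", t)]
```

The second topic is about the same subject but is never returned, which is the failure mode the background section describes.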

No effective solution has yet been proposed for the low accuracy of topic detection in the prior art.

Summary of the invention

The main purpose of the present invention is to provide a method and a device for recognizing associated topics, so as to solve the problem of low topic detection accuracy in the prior art.

To achieve the above purpose, according to one aspect of the embodiments of the present invention, a method for recognizing associated topics is provided. The method comprises: obtaining a target keyword; determining, by a machine learning method, the multidimensional array corresponding to the target keyword, wherein the number in each dimension of the multidimensional array represents one attribute of the target keyword; calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to target topics, wherein the correlation index represents the relevance between the target keyword and each target topic, and the target topics are multiple pre-marked topics that have multidimensional arrays; and determining the topics associated with the target keyword according to the calculated correlation index.

Further, calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises: calculating the Euclidean distance between the two arrays and using the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between them.

Further, calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises: obtaining the multidimensional array corresponding to the target topic, and directly calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic; or obtaining the multidimensional array corresponding to each word in the target topic, calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word, and calculating, from these per-word correlation indices, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.

Further, determining the topics associated with the target keyword according to the calculated correlation index comprises: judging whether the calculated correlation index satisfies a preset condition; if it does, determining that the target topic whose correlation index satisfies the preset condition is associated with the target keyword; and if it does not, determining that the target topic whose correlation index does not satisfy the preset condition is not associated with the target keyword.

Further, before the target keyword is obtained, the method also comprises: obtaining a target text that contains the target topics; segmenting the target text into words with a word segmentation tool and tagging the part of speech of each word in the target text; determining the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and marking the target topics; and determining, by a machine learning method, the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to each target topic.

To achieve the above purpose, according to another aspect of the embodiments of the present invention, a device for recognizing associated topics is provided. The device comprises: a first acquiring unit, configured to obtain a target keyword; a first determining unit, configured to determine, by a machine learning method, the multidimensional array corresponding to the target keyword, wherein the number in each dimension of the multidimensional array represents one attribute of the target keyword; a computing unit, configured to calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to target topics, wherein the correlation index represents the relevance between the target keyword and each target topic, and the target topics are multiple pre-marked topics that have multidimensional arrays; and a second determining unit, configured to determine the topics associated with the target keyword according to the calculated correlation index.

Further, the computing unit comprises: a first computing module, configured to calculate the Euclidean distance between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic and to use the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between them.

Further, the computing unit comprises: a first acquisition module, configured to obtain the multidimensional array corresponding to the target topic; and a second computing module, configured to directly calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic. Alternatively, the computing unit comprises: a second acquisition module, configured to obtain the multidimensional array corresponding to each word in the target topic; a third computing module, configured to calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word in the target topic; and a fourth computing module, configured to calculate, from these per-word correlation indices, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.

Further, the second determining unit comprises: a judging module, configured to judge whether the calculated correlation index satisfies a preset condition; and a determining module, configured to determine, if the calculated correlation index satisfies the preset condition, that the target topic whose correlation index satisfies the preset condition is associated with the target keyword, and to determine, if the calculated correlation index does not satisfy the preset condition, that the target topic whose correlation index does not satisfy the preset condition is not associated with the target keyword.

Further, the device also comprises: a second acquiring unit, configured to obtain, before the target keyword is obtained, a target text that contains the target topics; a word segmentation unit, configured to segment the target text into words with a word segmentation tool and to tag the part of speech of each word in the target text; a third determining unit, configured to determine the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model and to mark the target topics; and a fourth determining unit, configured to determine the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to each target topic.

In the embodiments of the present invention, a target keyword is obtained, the multidimensional array corresponding to it is determined, the correlation index between that array and the multidimensional arrays corresponding to the target topics is calculated, and the topics associated with the target keyword are determined according to the calculated correlation index. Judging the relevance between the target keyword and a target topic is thereby converted into calculating a correlation index between a multidimensional array representing the attributes of the target keyword and a multidimensional array representing the attributes of the target topic. This avoids the failure of keyword matching to identify a topic accurately when the keyword does not appear in the topic, solves the problem of low topic detection accuracy in the prior art, and improves the accuracy of topic detection.

Brief description of the drawings

The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a flowchart of the method for recognizing associated topics according to an embodiment of the present invention; and

Fig. 2 is a schematic diagram of the device for recognizing associated topics according to an embodiment of the present invention.

Embodiments

It should be noted that, provided they do not conflict, the embodiments in this application and the features in the embodiments may be combined with one another. The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments.

To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.

It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented. In addition, the terms "comprise" and "have", and any variations of them, are intended to cover non-exclusive inclusion: for example, a process, method, system, product, or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.

An embodiment of the present invention provides a method for recognizing associated topics.

Fig. 1 is a flowchart of the method for recognizing associated topics according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:

Step S102: obtain a target keyword.

There may be one or more target keywords, for example "2014", "college entrance examination", and so on.

Step S104: determine, by a machine learning method, the multidimensional array corresponding to the target keyword, wherein the number in each dimension of the multidimensional array represents one attribute of the target keyword.

Because the number in each dimension of the multidimensional array represents one attribute of the target keyword, the multidimensional array corresponding to a target keyword is unique; that is, the target keyword is represented by its multidimensional array. For example, if target keywords are represented by 500-dimensional arrays, then once a target keyword is obtained, the unique 500-dimensional array corresponding to it is obtained by a machine learning method.
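The patent does not say which machine learning method produces the array, so the following Python sketch only illustrates the interface of step S104: a lookup from a keyword to a fixed-length array. The table `WORD_VECTORS`, its values, and the function name are hypothetical, and 4 dimensions are used instead of 500 for readability.

```python
from typing import Dict, List

# Hypothetical learned table. In practice a trained model would supply these
# arrays; the values here are made up for illustration.
WORD_VECTORS: Dict[str, List[float]] = {
    "college entrance examination": [0.9, 0.1, 0.3, 0.7],
    "2014": [0.1, 0.8, 0.2, 0.1],
}


def keyword_to_array(keyword: str) -> List[float]:
    """Return the unique multidimensional array that represents a keyword."""
    if keyword not in WORD_VECTORS:
        raise KeyError(f"no array learned for keyword: {keyword!r}")
    return WORD_VECTORS[keyword]
```

Every keyword maps to exactly one array of the same length, which is the property the later distance calculations rely on.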

Step S106: calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to the target topics, wherein the correlation index represents the relevance between the target keyword and each target topic, and the target topics are multiple pre-marked topics that have multidimensional arrays.

Step S108: determine the topics associated with the target keyword according to the calculated correlation index.

Each target topic likewise corresponds to a unique multidimensional array; that is, each topic is represented by a unique multidimensional array in which the number in each dimension represents one attribute of the target topic. It should be noted that the array representing a topic has the same number of dimensions as the array representing the target keyword (the same applies below), so as to avoid calculation errors.

After the multidimensional array corresponding to the target keyword is determined, the correlation index between this array and the multidimensional array corresponding to a target topic is calculated. Because a text may contain multiple topics, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each of the topics is calculated separately, which yields the relevance between the target keyword and each topic. Finally, the topics associated with the target keyword are determined according to the calculated correlation indices. Specifically, a threshold can be set: when the correlation index exceeds the threshold, the target topic is considered associated with the target keyword; otherwise, it is considered not associated.

In the embodiments of the present invention, a target keyword is obtained, the multidimensional array corresponding to it is determined, the correlation index between that array and the multidimensional arrays corresponding to the target topics is calculated, and the topics associated with the target keyword are determined according to the calculated correlation index. Judging the relevance between the target keyword and a target topic is thereby converted into calculating a correlation index between a multidimensional array representing the attributes of the target keyword and a multidimensional array representing the attributes of the target topic. This avoids the failure of keyword matching to identify a topic accurately when the keyword does not appear in the topic, solves the problem of low topic detection accuracy in the prior art, and improves the accuracy of topic detection.

In the embodiments of the present invention, after the topics associated with the target keyword are determined, the topics can be sorted by relevance. For example, if a larger correlation index indicates a higher relevance between the target keyword and a topic, the target topics can be sorted in descending order of the calculated correlation index to obtain a topic attention ranking list. If a smaller correlation index indicates a higher relevance between the target keyword and a topic, the target topics can be sorted in ascending order of the correlation index.
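The two sort directions can be sketched in Python as follows. The function name, the flag `larger_is_closer`, and the example scores are hypothetical; the patent only states that the order depends on how the correlation index is defined.

```python
from typing import Dict, List


def rank_topics(indices: Dict[str, float], larger_is_closer: bool) -> List[str]:
    """Order topics from most to least relevant.

    `larger_is_closer` selects descending order (a larger index means higher
    relevance) or ascending order (a smaller index means higher relevance).
    """
    return sorted(indices, key=lambda topic: indices[topic], reverse=larger_is_closer)


# Hypothetical correlation indices for three topics.
scores = {"topic a": 0.2, "topic b": 0.9, "topic c": 0.5}
descending = rank_topics(scores, larger_is_closer=True)
ascending = rank_topics(scores, larger_is_closer=False)
```

Keeping the direction as an explicit parameter avoids silently ranking in the wrong order when the index is later swapped for a distance.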

Optionally, calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises: calculating the Euclidean distance between the two arrays and using the Euclidean distance as the correlation index.

In the embodiments of the present invention, the relevance between the target keyword and a topic is represented by the Euclidean distance between their arrays: the smaller the Euclidean distance, the higher the relevance between the target keyword and the topic; the larger the Euclidean distance, the lower the relevance. Accordingly, when topics are sorted by their relevance to the keyword in this embodiment, the target topics are sorted in ascending order of Euclidean distance to obtain the attention ranking list.
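As a minimal Python sketch of this variant, the Euclidean distance between two arrays of equal length can be computed and used to rank topics in ascending order. The function names and the 2-dimensional example arrays are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two arrays with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(keyword_array: Sequence[float],
                     topic_arrays: Dict[str, Sequence[float]]) -> List[str]:
    """Sort topics by ascending distance, so the most relevant topic comes first."""
    return sorted(topic_arrays,
                  key=lambda name: euclidean_distance(keyword_array, topic_arrays[name]))


# Hypothetical 2-dimensional arrays.
ranking = rank_by_distance([1.0, 0.0], {"far": [4.0, 4.0], "near": [1.0, 1.0]})
```

The dimension check mirrors the requirement that the topic array and the keyword array have the same number of dimensions.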

In the embodiments of the present invention, judging the relevance between the target keyword and the target topic by calculating the Euclidean distance between arrays improves the speed of topic detection.

Optionally, calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises: obtaining the multidimensional array corresponding to each word in the target topic; calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word in the target topic; and calculating, from these per-word correlation indices, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.

A topic is composed of words arranged according to a certain grammar and therefore contains multiple words. When the multidimensional arrays corresponding to the target keyword and the target topic are calculated by a machine learning method, the multidimensional array of each word in the target topic is calculated first. The correlation index with respect to the target topic can then be obtained by calculating, for each word in the target topic, the correlation index between that word's array and the array corresponding to the target keyword, and deriving the relevance between the target keyword and the target topic from these per-word indices. For example, the Euclidean distance between the array corresponding to each word in the target topic and the array corresponding to the target keyword is calculated, and the correlation index between the target keyword and the target topic is calculated from these Euclidean distances. Determining the relevance between the target topic and the target keyword from the relevance between each word in the topic and the target keyword further improves the accuracy of the calculation for the topic and thus helps ensure the accuracy of topic detection.
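The per-word variant can be sketched in Python as below. The patent does not specify how the per-word distances are combined into a topic-level index, so taking their arithmetic mean is an assumption made only for this illustration; the function names are hypothetical.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two arrays with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def topic_index_from_words(keyword_array: Sequence[float],
                           word_arrays: Sequence[Sequence[float]]) -> float:
    """Combine per-word distances into one topic-level correlation index.

    Assumption: the combination is the arithmetic mean of the distances.
    """
    if not word_arrays:
        raise ValueError("a topic must contain at least one word")
    distances = [euclidean_distance(keyword_array, w) for w in word_arrays]
    return sum(distances) / len(distances)
```

Other combinations, such as the minimum distance over all words, would fit the same description; which one works best would have to be determined experimentally.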

Optionally, calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises: obtaining the multidimensional array corresponding to the target topic; and directly calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.

Because a topic is composed of multiple words, the multidimensional array corresponding to the topic can first be obtained by machine learning from the multidimensional arrays corresponding to the words in the topic. When the correlation index is calculated, the unique multidimensional array that machine learning produced for the target topic can then be fetched in advance, and the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic is calculated directly. Compared with calculating a correlation index between the target keyword and every word in the topic, this greatly increases the speed of calculating the correlation index between the target keyword and the topic.
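A minimal Python sketch of the direct variant follows. The patent says only that the topic array is obtained from the word arrays by machine learning; averaging the word arrays dimension by dimension is an assumption used here as a stand-in, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def topic_array(word_arrays: Sequence[Sequence[float]]) -> List[float]:
    """Build one array for a topic from the arrays of its words.

    Assumption: the topic array is the dimension-wise mean of the word arrays.
    """
    if not word_arrays:
        raise ValueError("a topic must contain at least one word")
    count = len(word_arrays)
    return [sum(column) / count for column in zip(*word_arrays)]


def direct_index(keyword_array: Sequence[float], topic: Sequence[float]) -> float:
    """Euclidean distance between the keyword array and the precomputed topic array."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(keyword_array, topic)))


# The topic array is computed once, ahead of time, and reused for every keyword.
precomputed = topic_array([[1.0, 2.0], [3.0, 4.0]])
```

Because the topic array is precomputed, each keyword needs one distance calculation per topic rather than one per word, which is the speed advantage the text describes.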

Preferably, determining the topics associated with the target keyword according to the calculated correlation index comprises: judging whether the calculated correlation index satisfies a preset condition; if it does, determining that the target topic whose correlation index satisfies the preset condition is associated with the target keyword; and if it does not, determining that the target topic whose correlation index does not satisfy the preset condition is not associated with the target keyword.

In this embodiment, the preset condition may be a preset threshold. For example, when a larger correlation index indicates a higher relevance between the target topic and the target keyword, judging whether the calculated correlation index satisfies the preset condition may consist of judging whether it exceeds the preset threshold: if it does, the topic is determined to be associated with the target keyword; otherwise, it is not associated.

If the correlation index is the Euclidean distance between arrays, judging whether the calculated correlation index satisfies the preset condition may consist of judging whether the Euclidean distance is less than the preset threshold: if so, the topic is determined to be associated with the target keyword; otherwise, it is not associated.
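The two threshold tests can be sketched in Python as a single predicate. The function name, the flag `smaller_is_closer`, and the example values are hypothetical; the threshold itself would be chosen for the application.

```python
from typing import Dict, List


def is_associated(index: float, threshold: float, smaller_is_closer: bool) -> bool:
    """Apply the preset condition to one correlation index.

    For a distance-like index (smaller means more relevant) the index must be
    below the threshold; otherwise it must exceed the threshold.
    """
    return index < threshold if smaller_is_closer else index > threshold


def associated_topics(indices: Dict[str, float], threshold: float,
                      smaller_is_closer: bool) -> List[str]:
    """Keep only the topics whose correlation index satisfies the preset condition."""
    return [topic for topic, index in indices.items()
            if is_associated(index, threshold, smaller_is_closer)]


# Hypothetical Euclidean distances from one keyword to two topics.
distances = {"topic a": 0.4, "topic b": 2.5}
selected = associated_topics(distances, threshold=1.0, smaller_is_closer=True)
```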

By setting a preset condition, the topics related to the target keyword are determined quickly from the calculated results, thereby improving the accuracy of topic detection.

Preferably, before the target keyword is obtained, the method also comprises: obtaining a target text that contains the target topics; segmenting the target text into words with a word segmentation tool and tagging the part of speech of each word in the target text; determining the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and marking the target topics; and determining the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to each target topic.

A target text containing topics is obtained, a text training set is built, and text segmentation rules are set as needed. A part-of-speech rule model of topics is constructed by semantic analysis (for example, noun + verb, or noun + verb + object). A word segmentation tool (incorporating the segmentation rules that were set) is used to analyze the text, tag the part of speech of every word, and mark the topics. All words (including topics) are each represented by a multidimensional array, for example of 500 dimensions, and the unique multidimensional array corresponding to each word is obtained by a machine learning method. In this way, once a target keyword is obtained and its multidimensional array is determined, a correlation index such as the Euclidean distance can be calculated directly against the multidimensional arrays corresponding to the topics.
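The rule-based marking of topics can be sketched in Python on text that has already been segmented and tagged. The tag names ("n" for noun, "v" for verb), the rule table, and the example tokens are hypothetical, and treating the "object" slot as a noun is a simplifying assumption; the patent does not name a specific segmentation tool or tag set.

```python
from typing import List, Sequence, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag)

# Hypothetical part-of-speech rule model: noun + verb + object (approximated
# as a noun) and noun + verb. Longer rules are listed first so they win.
RULES: List[Tuple[str, ...]] = [("n", "v", "n"), ("n", "v")]


def find_topics(tokens: Sequence[Token]) -> List[str]:
    """Mark every span of tokens whose tag sequence matches a rule as a topic."""
    topics: List[str] = []
    i = 0
    while i < len(tokens):
        for rule in RULES:
            window = tokens[i:i + len(rule)]
            if tuple(tag for _, tag in window) == rule:
                topics.append(" ".join(word for word, _ in window))
                i += len(rule)  # continue after the matched span
                break
        else:
            i += 1  # no rule matched at this position
    return topics


# Hypothetical output of a segmentation and tagging tool.
tagged = [("students", "n"), ("take", "v"), ("exam", "n"),
          ("today", "t"), ("prices", "n"), ("rise", "v")]
marked = find_topics(tagged)
```

In a full pipeline the marked topics and all segmented words would then be passed to the machine learning step that assigns each of them a multidimensional array.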

In the embodiments of the present invention, topics are defined by a part-of-speech rule model, and the arrays corresponding to each word and each topic are obtained by a machine learning method, so that judging topic relevance is converted into calculating a correlation index between arrays, which greatly improves the speed and accuracy of associated-topic recognition.

An embodiment of the present invention also provides a device for recognizing associated topics. The device can realize its functions through computer equipment. It should be noted that the device of the embodiments of the present invention may be used to perform the method for recognizing associated topics provided by the embodiments, and the method of the embodiments may likewise be performed by the device provided by the embodiments.

Fig. 2 is a schematic diagram of the device for recognizing associated topics according to an embodiment of the present invention. As shown in Fig. 2, the device comprises a first acquiring unit 10, a first determining unit 20, a computing unit 30, and a second determining unit 40.

First acquiring unit 10 is for obtaining target keyword.

Target keyword can be one or more, such as:, college entrance examination etc. in 2014.

First determining unit 20 is for by Multidimensional numerical corresponding to machine learning method determination target keyword, and wherein, in Multidimensional numerical, each dimension numeral is for representing an attribute of target keyword.

Because each dimension numeral in Multidimensional numerical is for representing an attribute of target keyword, then the Multidimensional numerical that target keyword is corresponding unique, that is to say and represent target keyword by Multidimensional numerical.Such as, for 500 dimension array representation target keyword, after getting target keyword, 500 unique dimension groups corresponding to target keyword can be obtained by machine learning method.

Computing unit 30 is for calculating the correlation index between Multidimensional numerical corresponding to the target keyword Multidimensional numerical corresponding with target topic, wherein, correlation index is for representing the relevance between target keyword and target topic, and target topic is the multiple topics with Multidimensional numerical marked in advance.

Second determining unit 40 is for determining according to the correlation index calculated the topic be associated with target keyword.

Each target topic Multidimensional numerical that also correspondence one is unique, that is to say and the unique Multidimensional numerical of each topic represented, wherein, in this Multidimensional numerical, each dimension word represents an attribute in target topic.It should be noted that, for representing the number of dimensions of the Multidimensional numerical of topic and the number of dimensions identical (hereafter in like manner) representing target keyword, makeing mistakes to avoid calculating.

After determining the Multidimensional numerical that target keyword is corresponding, calculate the correlation index between this Multidimensional numerical Multidimensional numerical corresponding with target topic.Due to multiple topic can be there is in text, then calculate the correlation index between Multidimensional numerical corresponding to the target keyword multidimensional data corresponding respectively with multiple topic respectively, thus obtain the relevance between target keyword and multiple topic.Finally, the topic be associated for target keyword is determined according to the correlation index calculated, particularly, corresponding threshold value can be set, when correlation index exceedes this threshold value, then think that this target topic is associated with target keyword, otherwise, then think uncorrelated.

In the embodiment of the present invention, by obtaining target keyword, determine the Multidimensional numerical that target keyword is corresponding, calculate the correlation index between Multidimensional numerical corresponding to the target keyword Multidimensional numerical corresponding with target topic, the topic be associated with target keyword is determined according to the correlation index calculated, correlativity between target keyword and target topic is judged the Multidimensional numerical be converted into for representing target keyword attribute and the calculating being used for the correlation index represented between the Multidimensional numerical of target topic attribute, avoid owing to not occurring in topic that keyword causes the mode of Keywords matching cannot identify the problem of topic exactly, solve the problem that in prior art, the accuracy of topic detection is low, reach the effect of the accuracy improving topic detection.

In this embodiment, after the topics associated with the target keyword are determined, the topics can be sorted by relevance. For example, if a larger correlation index indicates a higher relevance between the target keyword and a topic, the target topics can be sorted in descending order of correlation index to obtain a topic attention ranking table. If a smaller correlation index indicates a higher relevance, the target topics can be sorted in ascending order of correlation index.
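A minimal sketch of the two sort directions, assuming the correlation indexes have already been calculated and stored per topic. The function name, the flag name, and the sample values are hypothetical.

```python
def rank_topics(indices, larger_is_more_related=True):
    """Return (topic, index) pairs ordered from most to least related.

    indices maps each topic to its calculated correlation index.
    """
    # Descending order when a larger index means higher relevance,
    # ascending order when a smaller index (e.g. a distance) does.
    return sorted(indices.items(), key=lambda kv: kv[1],
                  reverse=larger_is_more_related)

scores = {"a": 0.9, "b": 0.2, "c": 0.5}  # made-up values
print(rank_topics(scores))                                # larger is better
print(rank_topics(scores, larger_is_more_related=False))  # smaller is better
```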

Preferably, the computing unit comprises a first computing module, configured to calculate the Euclidean distance between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic, and to use the Euclidean distance as the correlation index.

In this embodiment, the relevance between the target keyword and a topic is represented by the Euclidean distance between their arrays: the smaller the Euclidean distance, the higher the relevance; the larger the Euclidean distance, the lower the relevance. Accordingly, when topics are sorted by their relevance to the keyword, this embodiment sorts the target topics in ascending order of Euclidean distance to obtain the attention ranking table.
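As an illustration of using the Euclidean distance as the correlation index, the following sketch computes the distance between two arrays and orders topics from nearest to farthest. The function names and the sample arrays are hypothetical; only the distance formula and the ascending order come from the description.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two arrays of equal dimensionality."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def topics_by_distance(keyword_vec, topic_vecs):
    """Return (topic, distance) pairs, smallest distance (most related) first."""
    pairs = [(t, euclidean_distance(keyword_vec, v)) for t, v in topic_vecs.items()]
    return sorted(pairs, key=lambda p: p[1])

# Made-up 2-dimensional arrays for readability.
print(topics_by_distance([0.0, 0.0], {"far": [6.0, 8.0], "near": [3.0, 4.0]}))
```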

In this embodiment, the relevance between the target keyword and the target topic is judged by calculating the Euclidean distance between the arrays, which improves the speed of topic recognition.

Preferably, the computing unit comprises: a second acquisition module, configured to obtain the multidimensional array corresponding to each word in the target topic; a third computing module, configured to calculate the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic; and a fourth computing module, configured to calculate, from the correlation indexes between the array corresponding to the target keyword and the arrays corresponding to the individual words, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.

Because a topic is composed of words arranged according to a certain grammar, a topic contains multiple words. The multidimensional array of each word in the target topic is first obtained by a machine learning method. The correlation index between the array corresponding to the target keyword and the array corresponding to the target topic can then be obtained by calculating, for each word in the target topic, the correlation index between that word's array and the keyword's array, and deriving the relevance between the target keyword and the target topic from these indexes. For example, the Euclidean distance between the array of each word in the target topic and the array of the target keyword is calculated, and the correlation index between the keyword's array and the topic's array is calculated from these distances. In this way, the relevance between the target topic and the target keyword is determined from the relevance between each word in the topic and the target keyword, which further improves the accuracy of the calculation for the topic and thus helps ensure the accuracy of topic recognition.
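The word-by-word variant can be sketched as below. The description does not say how the per-word distances are combined into a topic-level index, so the arithmetic mean used here is purely an assumption, and the function names and sample arrays are hypothetical.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def topic_index_from_words(keyword_vec, word_vecs):
    """Combine per-word distances into one index for the whole topic.

    word_vecs holds the array of each word in the topic. Averaging is an
    assumed aggregation; the source only says the topic-level index is
    calculated from the per-word indexes.
    """
    if not word_vecs:
        raise ValueError("a topic must contain at least one word")
    distances = [euclidean_distance(keyword_vec, w) for w in word_vecs]
    return sum(distances) / len(distances)

# Two made-up word arrays with distances 5.0 and 1.0 from the keyword.
print(topic_index_from_words([0.0, 0.0], [[3.0, 4.0], [0.0, 1.0]]))
```

Other aggregations, such as the minimum or a weighted mean, would fit the same description equally well.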

Alternatively, the computing unit comprises: a first acquisition module, configured to obtain the multidimensional array corresponding to the target topic; and a second computing module, configured to directly calculate the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.

Because a topic is composed of multiple words, the multidimensional array corresponding to a topic can first be obtained by machine learning from the arrays corresponding to the words in the topic. When the correlation index is calculated, the unique multidimensional array of the target topic obtained in advance by machine learning can then be retrieved, and the correlation index between the keyword's array and the topic's array is calculated directly. Compared with calculating a correlation index between the target keyword and every word in the topic, this greatly increases the speed of calculating the correlation index between the target keyword and the topic.
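The direct variant needs one array per topic, computed ahead of time. The description obtains that array by machine learning without giving details, so the sketch below substitutes a simple element-wise mean of the word arrays as an assumed placeholder. All names and values are hypothetical.

```python
import math

def topic_vector(word_vecs):
    """Build one array for a topic from the arrays of its words.

    The element-wise mean is an assumed placeholder for the learned
    topic array mentioned in the description.
    """
    if not word_vecs:
        raise ValueError("a topic must contain at least one word")
    n = len(word_vecs)
    return [sum(column) / n for column in zip(*word_vecs)]

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Computed once in advance, then reused for every keyword query.
precomputed = topic_vector([[1.0, 2.0], [3.0, 4.0]])
print(precomputed)
print(euclidean_distance([2.0, 0.0], precomputed))
```

With the topic array cached, each query costs one distance calculation per topic instead of one per word.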

Preferably, the second determining unit comprises: a judging module, configured to judge whether a calculated correlation index meets a preset condition; and a determining module, configured to determine that a target topic whose calculated correlation index meets the preset condition is associated with the target keyword, and that a target topic whose calculated correlation index does not meet the preset condition is unrelated to the target keyword.

In this embodiment, the preset condition can be a preset threshold. For example, when a larger correlation index indicates a higher relevance between the target topic and the target keyword, judging whether the calculated correlation index meets the preset condition can consist of judging whether it exceeds the preset threshold. If it does, the topic is determined to be associated with the target keyword; otherwise, it is unrelated.

If the correlation index is the Euclidean distance between the arrays, judging whether the calculated correlation index meets the preset condition can consist of judging whether the Euclidean distance is less than the preset threshold. If so, the topic is determined to be associated with the target keyword; otherwise, it is unrelated.
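The two forms of the preset condition (index above a threshold, or distance below a threshold) can be captured in one small predicate. This is a sketch with hypothetical names and made-up values; the threshold itself would be chosen for the application.

```python
def meets_condition(index, threshold, smaller_is_more_related):
    """Apply the preset condition to one calculated correlation index."""
    if smaller_is_more_related:   # e.g. Euclidean distance
        return index < threshold
    return index > threshold      # e.g. a similarity score

def split_topics(indices, threshold, smaller_is_more_related=True):
    """Partition topics into (associated, unrelated) lists."""
    associated, unrelated = [], []
    for topic, index in indices.items():
        if meets_condition(index, threshold, smaller_is_more_related):
            associated.append(topic)
        else:
            unrelated.append(topic)
    return associated, unrelated

# Distances as the index: anything closer than 1.0 counts as associated.
print(split_topics({"a": 0.4, "b": 1.7}, threshold=1.0))
```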

By setting a preset condition, the topics related to the target keyword are determined quickly from the calculated results, which improves the accuracy of topic recognition.

Preferably, the recognition device further comprises: a second acquiring unit, configured to obtain a target text containing the target topics before the target keyword is obtained; a word segmentation unit, configured to segment the target text with a word segmentation tool and to tag the part of speech of each word in the target text; a third determining unit, configured to determine the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and to mark the target topics; and a fourth determining unit, configured to determine the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to each target topic.

A target text containing topics is obtained, a text training set is built, and text segmentation rules are set as required. A part-of-speech rule model for topics (for example noun + verb, or noun + verb + object) is constructed through semantic analysis. The text is analyzed with a word segmentation tool (incorporating the configured segmentation rules), the part of speech of every word is tagged, and the topics are marked at the same time. All words (including topics) are each represented by a multidimensional array, for example of 500 dimensions, and the unique array corresponding to each word is obtained by a machine learning method. Thus, once the target keyword is obtained and its multidimensional array determined, a correlation index such as the Euclidean distance can be calculated directly against the array corresponding to a topic.
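The topic-marking step can be illustrated with a small rule matcher over already segmented and tagged text. Everything here is an assumption for illustration: the tag names ("n", "v", "d"), the treatment of the object as a noun, the longest-rule-first strategy, and the sample sentence are not taken from the source, and a real system would get its tags from a word segmentation tool.

```python
# Part-of-speech rules for a topic, longest first: noun + verb + object
# (the object is approximated as a noun) and noun + verb.
RULES = [("n", "v", "n"), ("n", "v")]

def find_topics(tagged_words):
    """Mark topics in a list of (word, part_of_speech) pairs."""
    tags = [tag for _, tag in tagged_words]
    topics = []
    i = 0
    while i < len(tagged_words):
        for rule in RULES:
            if tuple(tags[i:i + len(rule)]) == rule:
                words = [w for w, _ in tagged_words[i:i + len(rule)]]
                topics.append(" ".join(words))
                i += len(rule)   # skip past the matched topic
                break
        else:
            i += 1               # no rule matched at this position
    return topics

tagged = [("price", "n"), ("rises", "v"), ("quickly", "d"),
          ("team", "n"), ("wins", "v"), ("title", "n")]
print(find_topics(tagged))
```

Each marked topic would then receive its own multidimensional array alongside the arrays of the individual words.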

In this embodiment, topics are defined by the part-of-speech rule model, and the arrays corresponding to each word and each topic are obtained by a machine learning method, so that judging topic relevance is converted into calculating a correlation index between arrays, which greatly improves the speed and accuracy of correlated topic recognition.

It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by the invention.

In the above embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, refer to the related description of the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, mobile terminal, server, network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the invention. The storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The foregoing describes only the preferred embodiments of the invention and is not intended to limit it. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A method for recognizing correlated topics, characterized by comprising:
obtaining a target keyword;
determining, by a machine learning method, a multidimensional array corresponding to the target keyword, wherein the number in each dimension of the multidimensional array represents one attribute of the target keyword;
calculating a correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each target topic, wherein the correlation index represents the relevance between the target keyword and each target topic, and the target topics are multiple pre-marked topics that have multidimensional arrays; and
determining, according to the calculated correlation index, the topics associated with the target keyword.
2. The recognition method according to claim 1, characterized in that calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises:
calculating the Euclidean distance between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic, and using the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between the target keyword and the topic.
3. The recognition method according to claim 1, characterized in that calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises:
obtaining the multidimensional array corresponding to the target topic; and directly calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic,
or,
obtaining the multidimensional array corresponding to each word in the target topic; calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word in the target topic; and calculating, from the correlation indexes between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to the individual words, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.
4. The recognition method according to claim 1, characterized in that determining, according to the calculated correlation index, the topics associated with the target keyword comprises:
judging whether the calculated correlation index meets a preset condition;
if the calculated correlation index meets the preset condition, determining that the target topic whose calculated correlation index meets the preset condition is associated with the target keyword; and
if the calculated correlation index does not meet the preset condition, determining that the target topic whose calculated correlation index does not meet the preset condition is unrelated to the target keyword.
5. The recognition method according to claim 1, characterized in that, before the target keyword is obtained, the recognition method further comprises:
obtaining a target text, wherein the target text contains the target topics;
segmenting the target text with a word segmentation tool, and tagging the part of speech of each word in the target text;
determining the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and marking the target topics; and
determining the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to each target topic.
6. A device for recognizing correlated topics, characterized by comprising:
a first acquiring unit, configured to obtain a target keyword;
a first determining unit, configured to determine, by a machine learning method, a multidimensional array corresponding to the target keyword, wherein the number in each dimension of the multidimensional array represents one attribute of the target keyword;
a computing unit, configured to calculate a correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each target topic, wherein the correlation index represents the relevance between the target keyword and each target topic, and the target topics are multiple pre-marked topics that have multidimensional arrays; and
a second determining unit, configured to determine, according to the calculated correlation index, the topics associated with the target keyword.
7. The recognition device according to claim 6, characterized in that the computing unit comprises:
a first computing module, configured to calculate the Euclidean distance between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic, and to use the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between the target keyword and the topic.
8. The recognition device according to claim 6, characterized in that the computing unit comprises:
a first acquisition module, configured to obtain the multidimensional array corresponding to the target topic; and a second computing module, configured to directly calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic,
or the computing unit comprises:
a second acquisition module, configured to obtain the multidimensional array corresponding to each word in the target topic; a third computing module, configured to calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word in the target topic; and a fourth computing module, configured to calculate, from the correlation indexes between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to the individual words, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.
9. The recognition device according to claim 6, characterized in that the second determining unit comprises:
a judging module, configured to judge whether the calculated correlation index meets a preset condition; and
a determining module, configured to determine, if the calculated correlation index meets the preset condition, that the target topic whose calculated correlation index meets the preset condition is associated with the target keyword, and to determine, if the calculated correlation index does not meet the preset condition, that the target topic whose calculated correlation index does not meet the preset condition is unrelated to the target keyword.
10. The recognition device according to claim 6, characterized in that the recognition device further comprises:
a second acquiring unit, configured to obtain a target text before the target keyword is obtained, wherein the target text contains the target topics;
a word segmentation unit, configured to segment the target text with a word segmentation tool and to tag the part of speech of each word in the target text;
a third determining unit, configured to determine the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and to mark the target topics; and
a fourth determining unit, configured to determine the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to each target topic.
CN201410779602.1A 2014-12-15 2014-12-15 Correlated topic recognition method and device CN104408036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410779602.1A CN104408036B (en) 2014-12-15 2014-12-15 Correlated topic recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410779602.1A CN104408036B (en) 2014-12-15 2014-12-15 Correlated topic recognition method and device

Publications (2)

Publication Number Publication Date
CN104408036A true CN104408036A (en) 2015-03-11
CN104408036B CN104408036B (en) 2019-01-08

Family

ID=52645668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410779602.1A CN104408036B (en) 2014-12-15 2014-12-15 Correlated topic recognition method and device

Country Status (1)

Country Link
CN (1) CN104408036B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326392A (en) * 2016-08-17 2017-01-11 合网络技术(北京)有限公司 Participating method and participating device for multimedia resource topic
CN107545039A (en) * 2017-07-31 2018-01-05 腾讯科技(深圳)有限公司 The index acquisition methods and device of keyword, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6741959B1 (en) * 1999-11-02 2004-05-25 Sap Aktiengesellschaft System and method to retrieving information with natural language queries
CN102063469A (en) * 2010-12-03 2011-05-18 百度在线网络技术(北京)有限公司 Method and device for acquiring relevant keyword message and computer equipment
CN102073671A (en) * 2009-11-19 2011-05-25 索尼公司 Topic identification system, topic identification device, topic identification method, client terminal, and information processing method
CN103020164A (en) * 2012-11-26 2013-04-03 华北电力大学 Semantic search method based on multi-semantic analysis and personalized sequencing
CN103207899A (en) * 2013-03-19 2013-07-17 新浪网技术(中国)有限公司 Method and system for recommending text files

Also Published As

Publication number Publication date
CN104408036B (en) 2019-01-08

Similar Documents

Publication Publication Date Title
US9846748B2 (en) Searching for information based on generic attributes of the query
US9910886B2 (en) Visual representation of question quality
CN104102626B (en) A kind of method for short text Semantic Similarity Measurement
Fang et al. From captions to visual concepts and back
Bergsma et al. Stylometric analysis of scientific articles
Shen et al. Linden: linking named entities with knowledge base via semantic knowledge
CN105005589B (en) A kind of method and apparatus of text classification
Gu et al. " What Parts of Your Apps are Loved by Users?"(T)
US10482117B2 (en) Systems and methods for categorizing and moderating user-generated content in an online environment
US9348900B2 (en) Generating an answer from multiple pipelines using clustering
JP6381002B2 (en) Search recommendation method and apparatus
CA2777520C (en) System and method for phrase identification
WO2013125286A1 (en) Non-factoid question answering system and computer program
US10423648B2 (en) Method, system, and computer readable medium for interest tag recommendation
US20180107945A1 (en) Emoji recommendation method and device thereof
WO2017024884A1 (en) Search intention identification method and device
CN104199972B (en) A kind of name entity relation extraction and construction method based on deep learning
US20150088794A1 (en) Methods and systems of supervised learning of semantic relatedness
JP5744228B2 (en) Method and apparatus for blocking harmful information on the Internet
Liao et al. Evaluating the effectiveness of search task trails
US10296640B1 (en) Video segments for a video related to a task
JP3882048B2 (en) Question answering system and question answering processing method
US10496928B2 (en) Non-factoid question-answering system and method
JP2005222532A (en) Machine learning approach for determining document relevance for searching large-scale collection of electronic document
US20130060769A1 (en) System and method for identifying social media interactions

Legal Events

Date Code Title Description
PB01 Publication
C06 Publication
SE01 Entry into force of request for substantive examination
C10 Entry into substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Correlated topic recognition method and device

Effective date of registration: 20190531

Granted publication date: 20190108

Pledgee: Shenzhen Black Horse World Investment Consulting Co., Ltd.

Pledgor: Beijing Guoshuang Technology Co.,Ltd.

Registration number: 2019990000503

CP02 Change in the address of a patent holder

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Patentee after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Beijing city Haidian District Shuangyushu Area No. 76 Zhichun Road cuigongfandian 8 layer A

Patentee before: Beijing Guoshuang Technology Co.,Ltd.
