CN104408036B - Method and device for identifying associated topics - Google Patents

Method and device for identifying associated topics

Info

Publication number
CN104408036B
Authority
CN
China
Prior art keywords
target
topic
multi-dimensional array
target keyword
correlation index
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410779602.1A
Other languages
Chinese (zh)
Other versions
CN104408036A (en
Inventor
刘粉香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gridsum Technology Co Ltd
Original Assignee
Beijing Gridsum Technology Co Ltd
Application filed by Beijing Gridsum Technology Co Ltd
Priority to CN201410779602.1A
Publication of CN104408036A
Application granted
Publication of CN104408036B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for identifying associated topics. The method includes: obtaining a target keyword; determining the multi-dimensional array corresponding to the target keyword, where each dimension's value in the array represents one attribute of the target keyword; calculating a correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to a target topic, where the correlation index indicates the relevance between the target keyword and each target topic, and the target topics are multiple pre-labelled topics that each have a multi-dimensional array; and determining the topics associated with the target keyword according to the calculated correlation index. The invention addresses the low accuracy of topic detection in the prior art and achieves the effect of improving that accuracy.

Description

Method and device for identifying associated topics
Technical field
The present invention relates to the field of topic detection, and in particular to a method and a device for identifying associated topics.
Background art
Topic detection mainly refers to identifying, from a large amount of text, the topics related to a given keyword. For example, given the keyword "college entrance examination", the task is to identify the associated topics in the text. Here, a topic may be a topic on the Internet, such as a news topic or a microblog topic, and is mainly embodied in text form.
At present, topic detection mainly matches the given keyword against the topics in the text: if the given keyword appears in a topic, the topic is considered related to the keyword. However, because language is flexible, a topic may be highly relevant to the given keyword without containing it, in which case this matching approach cannot accurately identify the topics related to the keyword.
No effective solution has yet been proposed for the low accuracy of topic detection in the prior art.
Summary of the invention
The main purpose of the present invention is to provide a method and a device for identifying associated topics, so as to solve the problem of low topic-detection accuracy in the prior art.
To achieve the above purpose, according to one aspect of the embodiments of the present invention, a method for identifying associated topics is provided. The method includes: obtaining a target keyword; determining, by a machine learning method, the multi-dimensional array corresponding to the target keyword, where each dimension's value in the array represents one attribute of the target keyword; calculating a correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to a target topic, where the correlation index indicates the relevance between the target keyword and each target topic, and the target topics are multiple pre-labelled topics that each have a multi-dimensional array; and determining the topics associated with the target keyword according to the calculated correlation index.
Further, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: calculating the Euclidean distance between the two arrays and using the Euclidean distance as the correlation index, where a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between them.
Further, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: obtaining the multi-dimensional array corresponding to the target topic, and directly calculating the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic; or, alternatively: obtaining the multi-dimensional array corresponding to each word in the target topic; calculating the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic; and calculating, from these per-word correlation indices, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Further, determining the topics associated with the target keyword according to the calculated correlation index includes: judging whether the calculated correlation index satisfies a preset condition; if it does, determining that the target topic whose correlation index satisfies the preset condition is associated with the target keyword; and if it does not, determining that the target topic whose correlation index fails the preset condition is not associated with the target keyword.
Further, before the target keyword is obtained, the method also includes: obtaining a target text that contains the target topics; segmenting the target text with a word-segmentation tool and tagging the part of speech of each word in the target text; determining the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and labelling the target topics; and determining, by a machine learning method, the multi-dimensional array corresponding to each segmented word and the multi-dimensional array corresponding to each target topic.
To achieve the above purpose, according to another aspect of the embodiments of the present invention, a device for identifying associated topics is provided. The device includes: a first acquisition unit, configured to obtain a target keyword; a first determination unit, configured to determine, by a machine learning method, the multi-dimensional array corresponding to the target keyword, where each dimension's value in the array represents one attribute of the target keyword; a calculation unit, configured to calculate a correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to a target topic, where the correlation index indicates the relevance between the target keyword and each target topic, and the target topics are multiple pre-labelled topics that each have a multi-dimensional array; and a second determination unit, configured to determine the topics associated with the target keyword according to the calculated correlation index.
Further, the calculation unit includes a first calculation module, configured to calculate the Euclidean distance between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic, and to use the Euclidean distance as the correlation index, where a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between them.
Further, the calculation unit includes: a first acquisition module, configured to obtain the multi-dimensional array corresponding to the target topic; and a second calculation module, configured to directly calculate the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic. Alternatively, the calculation unit includes: a second acquisition module, configured to obtain the multi-dimensional array corresponding to each word in the target topic; a third calculation module, configured to calculate the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic; and a fourth calculation module, configured to calculate, from these per-word correlation indices, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Further, the second determination unit includes: a judgment module, configured to judge whether the calculated correlation index satisfies a preset condition; and a determination module, configured to determine, if the calculated correlation index satisfies the preset condition, that the target topic whose correlation index satisfies the preset condition is associated with the target keyword, and to determine, if the calculated correlation index does not satisfy the preset condition, that the target topic whose correlation index fails the preset condition is not associated with the target keyword.
Further, the device also includes: a second acquisition unit, configured to obtain, before the target keyword is obtained, a target text that contains the target topics; a segmentation unit, configured to segment the target text with a word-segmentation tool and tag the part of speech of each word in the target text; a third determination unit, configured to determine the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and to label the target topics; and a fourth determination unit, configured to determine the multi-dimensional array corresponding to each segmented word and the multi-dimensional array corresponding to each target topic.
In the embodiment of the present invention, a target keyword is obtained, the multi-dimensional array corresponding to the target keyword is determined, the correlation index between that array and the multi-dimensional array corresponding to a target topic is calculated, and the topics associated with the target keyword are determined according to the calculated correlation index. Judging the relevance between the target keyword and a target topic is thereby converted into calculating a correlation index between the array that represents the attributes of the target keyword and the array that represents the attributes of the target topic. This avoids the situation in which keyword matching fails to identify a topic because the keyword does not appear in it, solves the problem of low topic-detection accuracy in the prior art, and achieves the effect of improving that accuracy.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the present invention. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the method for identifying associated topics according to an embodiment of the present invention; and
Fig. 2 is a schematic diagram of the device for identifying associated topics according to an embodiment of the present invention.
Detailed description of the embodiments
It should be noted that, provided there is no conflict, the embodiments of this application and the features in them may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in combination with the embodiments.
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and so on in the description, the claims, and the above drawings are used to distinguish similar objects, and do not describe a particular order or sequence. It should be understood that data used in this way are interchangeable where appropriate for the embodiments described herein. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
An embodiment of the present invention provides a method for identifying associated topics.
Fig. 1 is a flowchart of the method for identifying associated topics according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain a target keyword.
There may be one or more target keywords, for example "2014", "college entrance examination", and so on.
Step S104: determine, by a machine learning method, the multi-dimensional array corresponding to the target keyword, where each dimension's value in the array represents one attribute of the target keyword.
Because each dimension's value in the multi-dimensional array represents one attribute of the target keyword, the target keyword corresponds to one unique multi-dimensional array; in other words, the target keyword is represented by that array. For example, if target keywords are represented by 500-dimensional arrays, then once the target keyword has been obtained, its unique 500-dimensional array is obtained by the machine learning method.
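The mapping from keyword to array can be pictured as a lookup into a table produced by the learning step. The sketch below is only an illustration: the table `KEYWORD_ARRAYS`, its values, and the function `array_for_keyword` are hypothetical, and four dimensions stand in for the 500 mentioned above. The patent does not say which machine learning method produces the arrays.

```python
# Hypothetical table standing in for the output of the machine-learning step.
# Real arrays might have 500 dimensions; 4 are used here for readability.
KEYWORD_ARRAYS = {
    "college entrance examination": [0.9, 0.1, 0.4, 0.0],
    "2014": [0.2, 0.8, 0.1, 0.3],
}


def array_for_keyword(keyword):
    """Return the unique multi-dimensional array assigned to a keyword."""
    if keyword not in KEYWORD_ARRAYS:
        raise KeyError(f"no array learned for keyword: {keyword!r}")
    return KEYWORD_ARRAYS[keyword]


vec = array_for_keyword("college entrance examination")
print(len(vec), vec)
```

Each position in the returned list plays the role of one attribute of the keyword.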
Step S106: calculate the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to a target topic. The correlation index indicates the relevance between the target keyword and each target topic, and the target topics are multiple pre-labelled topics that each have a multi-dimensional array.
Step S108: determine the topics associated with the target keyword according to the calculated correlation index.
Each target topic likewise corresponds to one unique multi-dimensional array; that is, each topic is represented by a unique multi-dimensional array in which each dimension's value represents one attribute of the target topic. Note that the array representing a topic has the same number of dimensions as the array representing the target keyword (the same applies below), so as to avoid calculation errors.
After the multi-dimensional array corresponding to the target keyword has been determined, the correlation index between that array and the array corresponding to a target topic is calculated. Because a text may contain multiple topics, the correlation index is calculated separately between the array corresponding to the target keyword and the array corresponding to each topic, which gives the relevance between the target keyword and each of the topics. Finally, the topics associated with the target keyword are determined according to the calculated correlation indices. Specifically, a threshold can be set: when the correlation index exceeds the threshold, the target topic is considered associated with the target keyword; otherwise it is considered unassociated.
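Steps S106 and S108 can be sketched as a loop over the pre-labelled topics. This is a minimal sketch under stated assumptions: the function names, the toy 3-dimensional arrays, and the threshold value are all hypothetical. The index function and the threshold test are passed in as parameters because the direction of the comparison depends on the index chosen; the usage below picks Euclidean distance, for which a smaller value means a closer association.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two arrays with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify_associated_topics(keyword_array, topic_arrays, index_fn, is_associated):
    """Apply steps S106 and S108 to every pre-labelled topic.

    index_fn      -- computes the correlation index for two arrays (S106)
    is_associated -- the threshold test applied to that index (S108)
    """
    result = {}
    for topic, topic_array in topic_arrays.items():
        index = index_fn(keyword_array, topic_array)
        if is_associated(index):
            result[topic] = index
    return result


# Toy 3-dimensional arrays with hypothetical values.
keyword = [1.0, 0.0, 0.0]
topics = {
    "exam results released": [1.0, 0.0, 1.0],   # distance 1.0
    "football final tonight": [4.0, 4.0, 0.0],  # distance 5.0
}

# With Euclidean distance the test is "below the threshold" (2.0 is an assumed value).
hits = identify_associated_topics(keyword, topics, euclidean_distance, lambda d: d < 2.0)
print(hits)
```

The dimension check in `euclidean_distance` mirrors the requirement that keyword and topic arrays have the same number of dimensions.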
In the embodiment of the present invention, a target keyword is obtained, the multi-dimensional array corresponding to the target keyword is determined, the correlation index between that array and the multi-dimensional array corresponding to a target topic is calculated, and the topics associated with the target keyword are determined according to the calculated correlation index. Judging the relevance between the target keyword and a target topic is thereby converted into calculating a correlation index between the array that represents the attributes of the target keyword and the array that represents the attributes of the target topic. This avoids the situation in which keyword matching fails to identify a topic because the keyword does not appear in it, solves the problem of low topic-detection accuracy in the prior art, and achieves the effect of improving that accuracy.
In the embodiment of the present invention, after the topics associated with the target keyword have been determined, the topics can be ranked by relevance. For example, if a larger correlation index indicates a higher relevance between the target keyword and a topic, the target topics can be sorted by correlation index in descending order, which yields a topic attention ranking table. If a smaller correlation index indicates a higher relevance, the target topics can instead be sorted by correlation index in ascending order.
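Both sort directions can be covered by one small helper. The sketch below assumes the correlation indices are already available as a dictionary; the name `rank_topics` and the example scores are hypothetical.

```python
def rank_topics(indices, larger_is_more_relevant):
    """Return (topic, index) pairs ordered from most to least relevant."""
    return sorted(
        indices.items(),
        key=lambda item: item[1],
        reverse=larger_is_more_relevant,  # descending when larger means more relevant
    )


scores = {"topic A": 0.7, "topic B": 0.2, "topic C": 0.9}  # hypothetical correlation indices

print(rank_topics(scores, larger_is_more_relevant=True))   # descending order
print(rank_topics(scores, larger_is_more_relevant=False))  # ascending order
```

The first element of either result is the topic treated as most relevant under the chosen convention.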
Optionally, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: calculating the Euclidean distance between the two arrays, and using the Euclidean distance as the correlation index.
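Written out, the Euclidean distance takes the standard form below. The symbols are notation assumed here rather than taken from the patent: k and t denote the arrays of the target keyword and of a target topic, and n is their common number of dimensions.

```latex
d(k, t) = \sqrt{\sum_{i=1}^{n} \left( k_i - t_i \right)^{2}}
```

The distance is zero when the two arrays are identical and grows as the attribute values diverge.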
In the embodiment of the present invention, the relevance between the target keyword and a topic is expressed by the Euclidean distance between their arrays: the smaller the Euclidean distance, the higher the relevance; the larger the Euclidean distance, the lower the relevance. Accordingly, when topics are ranked by their relevance to the keyword in this embodiment, the target topics are sorted by Euclidean distance in ascending order to obtain the attention ranking table.
In the embodiment of the present invention, judging the relevance between the target keyword and a target topic by calculating the Euclidean distance between their arrays improves the speed of topic detection.
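As a small illustration of this embodiment, the following sketch computes the distance from a keyword array to several topic arrays and sorts them in ascending order. The topic names and the 2-dimensional values are hypothetical and chosen so that the distances are easy to check by hand.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two arrays with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


keyword = [0.0, 0.0]
topic_arrays = {
    "far topic": [6.0, 8.0],   # distance 10.0
    "near topic": [3.0, 4.0],  # distance 5.0
}

# Ascending order: the smallest distance (highest relevance) comes first.
ranking = sorted(
    ((topic, euclidean_distance(keyword, arr)) for topic, arr in topic_arrays.items()),
    key=lambda pair: pair[1],
)
print(ranking)
```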
Optionally, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: obtaining the multi-dimensional array corresponding to each word in the target topic; calculating the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic; and calculating, from these per-word correlation indices, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Because a topic is made up of words arranged according to a certain grammar, a topic contains multiple words. When the machine learning method is used to compute the arrays corresponding to the target keyword and the target topic, the array of each word in the target topic is computed first. The correlation index with the array corresponding to the target topic can then be obtained by separately calculating the correlation index between the array corresponding to each word in the target topic and the array corresponding to the target keyword, and deriving the relevance between the target keyword and the target topic from those indices. For example, the Euclidean distance between the array of each word in the target topic and the array of the target keyword is calculated, and the correlation index between the keyword's array and the topic's array is computed from those distances. Determining the relevance between the target topic and the target keyword from the relevance between each word in the topic and the keyword further improves the accuracy of the calculation for the topic's array, and in turn helps ensure the accuracy of topic detection.
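The text does not state how the per-word indices are combined into a topic-level index. The sketch below assumes the arithmetic mean as the default and allows another rule (for example `min`, which keeps only the closest word) to be supplied. Both choices are assumptions made for illustration, as are the function names and toy arrays.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two arrays with the same number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def topic_index_from_words(keyword_array, word_arrays, combine=None):
    """Combine per-word distances into one topic-level correlation index.

    The combination rule is an assumption; the default is the arithmetic mean.
    """
    if not word_arrays:
        raise ValueError("topic has no words")
    distances = [euclidean_distance(keyword_array, w) for w in word_arrays]
    if combine is None:
        return sum(distances) / len(distances)
    return combine(distances)


keyword = [0.0, 0.0]
topic_words = [[3.0, 4.0], [0.0, 1.0]]  # toy arrays for the two words of one topic

mean_index = topic_index_from_words(keyword, topic_words)          # mean of 5.0 and 1.0
closest_index = topic_index_from_words(keyword, topic_words, min)  # closest word only
print(mean_index, closest_index)
```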
Optionally, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: obtaining the multi-dimensional array corresponding to the target topic; and directly calculating the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Because a topic is made up of multiple words, the array corresponding to the topic can first be obtained by machine learning from the arrays corresponding to the words in the topic. When the correlation index is calculated, the unique array that was obtained in advance for the target topic by machine learning is retrieved, and the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic is calculated directly. Compared with calculating the correlation index between the target keyword and each word in the topic, this substantially increases the speed of calculating the correlation index between the target keyword and the topic.
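The direct variant needs only one distance computation per topic once the topic array exists. In the sketch below, the element-wise mean of the word arrays stands in for the learned topic array. That substitution is an assumption made purely for illustration, since the text says only that the topic array is obtained by machine learning; the names and values are hypothetical.

```python
import math


def mean_array(word_arrays):
    """Element-wise mean of word arrays (a stand-in for the learned topic array)."""
    n = len(word_arrays)
    return [sum(column) / n for column in zip(*word_arrays)]


def euclidean_distance(a, b):
    """Euclidean distance between two arrays with the same number of dimensions."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Computed once, ahead of time, and stored with the pre-labelled topic.
topic_array = mean_array([[1.0, 2.0], [3.0, 4.0]])

# At query time a single distance per topic is enough.
index = euclidean_distance([2.0, 0.0], topic_array)
print(topic_array, index)
```

The speed gain comes from replacing one distance computation per word with one per topic.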
Preferably, determining the topics associated with the target keyword according to the calculated correlation index includes: judging whether the calculated correlation index satisfies a preset condition; if it does, determining that the target topic whose correlation index satisfies the preset condition is associated with the target keyword; and if it does not, determining that the target topic whose correlation index fails the preset condition is not associated with the target keyword.
In this embodiment, the preset condition can be a preset threshold. For example, when a larger correlation index indicates a higher relevance between the target topic and the target keyword, judging whether the calculated correlation index satisfies the preset condition can consist of judging whether the calculated correlation index exceeds the preset threshold. If it does, the topic is determined to be associated with the target keyword; otherwise, it is not.
If the correlation index is the Euclidean distance between arrays, judging whether the calculated correlation index satisfies the preset condition can consist of judging whether the Euclidean distance is less than the preset threshold. If it is, the topic is determined to be associated with the target keyword; otherwise, it is not.
By setting a preset condition, the topics related to the target keyword can be quickly determined from the calculated results, which improves the accuracy of topic detection.
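The two forms of the preset condition differ only in the direction of the comparison. A minimal sketch, with a hypothetical function name and hypothetical threshold values:

```python
def meets_preset_condition(index, threshold, smaller_is_closer):
    """Threshold test for a correlation index.

    smaller_is_closer=True  -- distance-like index: associated when below the threshold
    smaller_is_closer=False -- score-like index: associated when above the threshold
    """
    if smaller_is_closer:
        return index < threshold
    return index > threshold


# Euclidean distance of 1.5 against a threshold of 2.0: associated.
print(meets_preset_condition(1.5, 2.0, smaller_is_closer=True))
# A score of 1.5 against a threshold of 2.0: not associated.
print(meets_preset_condition(1.5, 2.0, smaller_is_closer=False))
```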
Preferably, before the target keyword is obtained, the method also includes: obtaining a target text that contains the target topics; segmenting the target text with a word-segmentation tool and tagging the part of speech of each word in the target text; determining the target topics from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and labelling the target topics; and determining the multi-dimensional array corresponding to each segmented word and the multi-dimensional array corresponding to each target topic.
A target text containing topics is obtained, a text training set is built, and text word-segmentation rules are set as needed. A part-of-speech rule model for topics (for example noun + verb, or noun + verb + object) is constructed through semantic analysis. The word-segmentation tool (including the configured segmentation rules) is used to analyze the text, tag all parts of speech of each word, and label the topics at the same time. All words (including topics) are each represented by a multi-dimensional array, for example of 500 dimensions, and the unique multi-dimensional array corresponding to each word is obtained by the machine learning method. In this way, once the target keyword has been obtained and its array determined, a correlation index such as the Euclidean distance can be calculated directly against the array corresponding to each topic.
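The part-of-speech rule model can be pictured as a list of tag sequences that are matched against the tagged output of the segmentation tool. The sketch below starts from already-tagged words, so no particular segmentation library is assumed. The tag names (`n`, `v`, `t`), the example sentence, and the treatment of the object as a noun tag are all simplifying assumptions.

```python
# Hypothetical output of a word-segmentation tool: (word, part_of_speech) pairs.
tagged = [("students", "n"), ("sit", "v"), ("exam", "n"), ("today", "t")]

# Part-of-speech rule model: tag sequences that count as a topic.
# Longer patterns come first so they win over their own prefixes.
TOPIC_PATTERNS = [("n", "v", "n"), ("n", "v")]


def extract_topics(tagged_words, patterns=TOPIC_PATTERNS):
    """Scan left to right and collect word spans whose tags match a pattern."""
    topics = []
    i = 0
    while i < len(tagged_words):
        for pattern in patterns:
            window = tagged_words[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                topics.append(" ".join(word for word, _ in window))
                i += len(pattern)  # continue after the matched span
                break
        else:
            i += 1  # no pattern starts here; move on by one word
    return topics


print(extract_topics(tagged))
```

Each extracted span would then be labelled as a topic and given its own array by the learning step.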
In the embodiment of the present invention, topics are defined by a part-of-speech rule model, and the arrays corresponding to each word and each topic are obtained by the machine learning method, so that judging topic relevance becomes a calculation of the correlation index between arrays. This greatly improves the speed and accuracy of identifying related topics.
An embodiment of the present invention also provides a device for identifying associated topics. The device can implement its functions through computer equipment. It should be noted that the device of the embodiment can be used to execute the method for identifying associated topics provided by the embodiment, and the method of the embodiment can likewise be executed by the device provided by the embodiment.
Fig. 2 is a schematic diagram of the device for identifying associated topics according to an embodiment of the present invention. As shown in Fig. 2, the device includes a first acquisition unit 10, a first determination unit 20, a calculation unit 30, and a second determination unit 40.
The first acquisition unit 10 is configured to obtain a target keyword.
There may be one or more target keywords, for example "2014", "college entrance examination", and so on.
The first determination unit 20 is configured to determine, by a machine learning method, the multi-dimensional array corresponding to the target keyword, where each dimension's value in the array represents one attribute of the target keyword.
Because each dimension's value in the multi-dimensional array represents one attribute of the target keyword, the target keyword corresponds to one unique multi-dimensional array; in other words, the target keyword is represented by that array. For example, if target keywords are represented by 500-dimensional arrays, then once the target keyword has been obtained, its unique 500-dimensional array can be obtained by the machine learning method.
The calculation unit 30 is configured to calculate the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to a target topic, where the correlation index indicates the relevance between the target keyword and the target topic, and the target topics are multiple pre-labelled topics that each have a multi-dimensional array.
The second determination unit 40 is configured to determine the topics associated with the target keyword according to the calculated correlation index.
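One way to picture the four units is as methods of a single object, each consuming the output of the previous one. The layout below is only an illustrative sketch: the class name, the method names, the toy data, and the choice of Euclidean distance with a "below threshold" test are assumptions, not the device's actual structure.

```python
import math


class AssociatedTopicIdentifier:
    """Groups the four units of Fig. 2 as methods of one object (illustrative layout)."""

    def __init__(self, word_arrays, topic_arrays, max_distance):
        self.word_arrays = word_arrays    # learned keyword arrays (hypothetical data)
        self.topic_arrays = topic_arrays  # arrays of the pre-labelled topics
        self.max_distance = max_distance  # preset threshold (assumed value)

    def acquire_keyword(self, raw):            # first acquisition unit 10
        return raw.strip()

    def determine_array(self, keyword):        # first determination unit 20
        return self.word_arrays[keyword]

    def calculate_indices(self, keyword_array):  # calculation unit 30
        return {
            topic: math.sqrt(sum((x - y) ** 2 for x, y in zip(keyword_array, arr)))
            for topic, arr in self.topic_arrays.items()
        }

    def determine_topics(self, indices):       # second determination unit 40
        return [topic for topic, d in indices.items() if d < self.max_distance]

    def run(self, raw_keyword):
        keyword = self.acquire_keyword(raw_keyword)
        array = self.determine_array(keyword)
        return self.determine_topics(self.calculate_indices(array))


device = AssociatedTopicIdentifier(
    word_arrays={"exam": [0.0, 0.0]},
    topic_arrays={
        "exam schedule announced": [0.0, 1.0],  # distance 1.0
        "weather warning issued": [3.0, 4.0],   # distance 5.0
    },
    max_distance=2.0,
)
print(device.run(" exam "))
```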
Each target topic also corresponds to a unique Multidimensional numerical, that is to say the unique Multidimensional numerical of each topic To indicate, wherein each dimension word indicates an attribute in target topic in the Multidimensional numerical.It should be noted that being used for Indicate that number of dimensions and the expression number of dimensions of target keyword of the Multidimensional numerical of topic are identical (hereafter similarly), to avoid meter It calculates wrong.
After determining the corresponding Multidimensional numerical of target keyword, it is corresponding with target topic more to calculate the Multidimensional numerical Correlation index between dimension group.Due to that there can be multiple topics in text, then the corresponding multidimensional of target keyword is calculated separately Correlation index between array and the corresponding multidimensional data of multiple topics, thus obtain target keyword and multiple topics it Between relevance.Finally, determined according to the correlation index being calculated for the associated topic of target keyword, specifically, Corresponding threshold value can be set, when correlation index is more than the threshold value, then it is assumed that the target topic is associated with target keyword, Otherwise, then it is assumed that uncorrelated.
In this embodiment of the invention, a target keyword is obtained, the multidimensional array corresponding to the target keyword is determined, the correlation index between that array and the multidimensional array corresponding to a target topic is calculated, and the topics associated with the target keyword are determined from the calculated correlation index. Judging the relevance between the target keyword and a target topic is thereby converted into calculating a correlation index between the array that represents the keyword's attributes and the array that represents the topic's attributes. This avoids the problem that keyword matching cannot accurately identify a topic when the keyword does not appear in the topic, solves the low accuracy of topic identification in the prior art, and improves the accuracy of topic identification.
In this embodiment of the invention, after the topics associated with the target keyword are determined, the topics can be sorted by relevance. For example, if a larger correlation index indicates higher relevance between the target keyword and a topic, the target topics can be sorted by correlation index in descending order, which gives a topic attention ranking table. If a smaller correlation index indicates higher relevance, the target topics can instead be sorted by correlation index in ascending order.
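The two sort directions can be sketched with a single helper. The function name `rank_topics`, its flag, and the sample index values are hypothetical and serve only to illustrate the ordering rule.

```python
def rank_topics(indices, larger_is_more_relevant):
    """Sort topics from most to least relevant.

    indices maps each topic to its calculated correlation index.
    """
    # Descending order when a larger index means higher relevance,
    # ascending order when a smaller index (e.g. a distance) does.
    return sorted(indices, key=indices.get, reverse=larger_is_more_relevant)


scores = {"a": 0.2, "b": 0.9, "c": 0.5}
print(rank_topics(scores, larger_is_more_relevant=True))
print(rank_topics(scores, larger_is_more_relevant=False))
```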
Preferably, the computing unit includes a first computing module, configured to calculate the Euclidean distance between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic, and to use the Euclidean distance as the correlation index.
In this embodiment of the invention, the relevance between the target keyword and a topic is expressed by the Euclidean distance between their arrays: the smaller the Euclidean distance, the higher the relevance between the target keyword and the topic; the larger the Euclidean distance, the lower the relevance. Accordingly, when topics are sorted by their relevance to the keyword in this embodiment, the target topics are sorted by Euclidean distance in ascending order to obtain the attention ranking table.
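A short sketch of the Euclidean distance used as the correlation index, together with the ascending sort. The function names and the sample arrays are hypothetical; only the distance formula and the "smaller is more relevant" rule come from the text.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two arrays of equal dimension."""
    if len(u) != len(v):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_distance(keyword_array, topic_arrays):
    """Return (topic, distance) pairs, closest (most relevant) first."""
    pairs = [(t, euclidean_distance(keyword_array, a)) for t, a in topic_arrays.items()]
    return sorted(pairs, key=lambda pair: pair[1])


topics = {"far": [3.0, 4.0], "near": [1.0, 0.0]}
print(rank_by_distance([0.0, 0.0], topics))
```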
In this embodiment of the invention, the relevance between the target keyword and a target topic is judged from the Euclidean distance between their arrays, which improves the speed of topic identification.
Preferably, the computing unit includes: a second acquisition module, configured to obtain the multidimensional array corresponding to each word in the target topic; a third computing module, configured to calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word in the target topic; and a fourth computing module, configured to calculate, from the correlation indices between the keyword's array and the arrays of the individual words, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.
A topic is composed of words arranged according to a certain grammar, so a topic contains multiple words. When machine learning is used to obtain the arrays for the target keyword and the target topic, the array of each word in the target topic is calculated first. The correlation index with respect to the target topic can then be obtained by calculating, for each word in the target topic, the correlation index between that word's array and the array of the target keyword, and deriving the relevance between the target keyword and the target topic from those indices. For example, the Euclidean distance between the array of each word in the target topic and the array of the target keyword is calculated, and the correlation index between the keyword's array and the topic's array is calculated from those distances. Determining the relevance between the target topic and the target keyword from the relevance between each word in the topic and the keyword further improves the accuracy of the calculation for the topic and, in turn, the accuracy of topic identification.
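The word-by-word variant can be sketched as follows. The text does not say how the per-word distances are combined into one index for the topic, so the arithmetic mean used here is an assumption; the function names and sample arrays are hypothetical as well.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two arrays of equal dimension."""
    if len(u) != len(v):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def topic_index_from_words(keyword_array, word_arrays):
    """Combine per-word distances into one correlation index for the topic.

    Assumption: the combination is the arithmetic mean of the distances.
    """
    if not word_arrays:
        raise ValueError("a topic must contain at least one word")
    distances = [euclidean_distance(keyword_array, w) for w in word_arrays]
    return sum(distances) / len(distances)


# A hypothetical topic made of two words, each with its own array.
print(topic_index_from_words([0.0, 0.0], [[3.0, 4.0], [0.0, 0.0]]))
```

Other combinations, such as the minimum distance or a weighted mean, would fit the same description.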
Optionally, the computing unit includes: a first acquisition module, configured to obtain the multidimensional array corresponding to the target topic; and a second computing module, configured to directly calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.
Because a topic is composed of multiple words, the multidimensional array corresponding to the topic can first be obtained by machine learning from the arrays corresponding to the words in the topic. Then, when the correlation index is calculated, the unique array of the target topic obtained in advance by machine learning can be fetched, and the correlation index between the keyword's array and the topic's array can be calculated directly. Compared with calculating the correlation indices between the target keyword and each word in the topic, this greatly increases the speed of calculating the correlation index between the target keyword and the topic.
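The direct variant can be sketched as below. The text only says the topic's array is obtained by machine learning from its words' arrays; the element-wise mean here is a simple stand-in assumed for illustration, and the function names are hypothetical.

```python
import math


def topic_array_from_words(word_arrays):
    """Stand-in for the learned topic array: element-wise mean of word arrays."""
    if not word_arrays:
        raise ValueError("a topic must contain at least one word")
    count = len(word_arrays)
    return [sum(values) / count for values in zip(*word_arrays)]


def euclidean_distance(u, v):
    """Euclidean distance between two arrays of equal dimension."""
    if len(u) != len(v):
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# The topic array is computed once in advance and reused for every keyword.
topic_array = topic_array_from_words([[1.0, 3.0], [3.0, 5.0]])
print(topic_array)
print(euclidean_distance([2.0, 0.0], topic_array))
```

With a precomputed topic array, each keyword needs one distance calculation per topic rather than one per word, which is the speed advantage the paragraph describes.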
Preferably, the second determination unit includes: a judgment module, configured to judge whether the calculated correlation index meets a preset condition; and a determining module, configured to determine, if the calculated correlation index is judged to meet the preset condition, that the target topic whose correlation index meets the preset condition is associated with the target keyword, and to determine, if the calculated correlation index is judged not to meet the preset condition, that the target topic whose correlation index does not meet the preset condition is unrelated to the target keyword.
In this embodiment, the preset condition can be a preset threshold. For example, when a larger correlation index indicates higher relevance between the target topic and the target keyword, judging whether the calculated correlation index meets the preset condition can mean judging whether the calculated correlation index exceeds the preset threshold. If it does, the topic is determined to be associated with the target keyword; otherwise, it is unrelated.
If the correlation index is the Euclidean distance between arrays, judging whether the calculated correlation index meets the preset condition can mean judging whether the Euclidean distance is less than the preset threshold. If it is, the topic is determined to be associated with the target keyword; otherwise, it is unrelated.
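The two forms of the preset condition can be sketched with one predicate. The function names, the flag, and the sample values are hypothetical; the comparison directions follow the two cases described above.

```python
def meets_preset_condition(index, threshold, smaller_is_more_relevant):
    """Judge a correlation index against a preset threshold.

    For a distance-like index the condition is index < threshold;
    for a similarity-like index it is index > threshold.
    """
    if smaller_is_more_relevant:
        return index < threshold
    return index > threshold


def split_topics(indices, threshold, smaller_is_more_relevant):
    """Partition topics into (associated, unrelated) lists."""
    associated, unrelated = [], []
    for topic, index in indices.items():
        if meets_preset_condition(index, threshold, smaller_is_more_relevant):
            associated.append(topic)
        else:
            unrelated.append(topic)
    return associated, unrelated


# Hypothetical Euclidean distances with a preset threshold of 2.0.
print(split_topics({"t1": 0.8, "t2": 3.5}, 2.0, smaller_is_more_relevant=True))
```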
By setting a preset condition, the topics related to the target keyword can be quickly determined from the calculated results, which improves the accuracy of topic identification.
Preferably, the identification device further includes: a second acquisition unit, configured to obtain a target text before the target keyword is obtained, where the target text contains the target topic; a word segmentation unit, configured to segment the target text with a word segmentation tool and to label the part of speech of each word in the target text; a third determination unit, configured to determine the target topic from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and to label the target topic; and a fourth determination unit, configured to determine the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to the target topic.
Target texts containing topics are collected to build a text training set, and text segmentation rules are set as needed. A part-of-speech rule model for topics (for example, noun + verb, or noun + verb + object) is constructed by semantic analysis. A word segmentation tool (incorporating the configured segmentation rules) is used to analyze the text and to label all parts of speech of each word, and the topics are labeled at the same time. All words (including topics) are each represented by a multidimensional array, for example of 500 dimensions, and the unique multidimensional array corresponding to each word is obtained by a machine learning method. In this way, once the target keyword is obtained and its multidimensional array is determined, a correlation index such as the Euclidean distance can be calculated directly against the multidimensional array corresponding to each topic.
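The part-of-speech rule step can be sketched as a pattern match over already segmented and tagged words. This sketch assumes the segmentation tool has produced (word, tag) pairs; the tag names `n`, `v`, and `d`, the treatment of the object as a noun, and the function name are all assumptions made for illustration.

```python
# Assumed rule model: noun + verb + object (taken as a noun), then noun + verb.
# Longer rules are tried first so the longest match wins.
RULES = [("n", "v", "n"), ("n", "v")]


def extract_topics(tagged_words):
    """Return the word spans whose tag sequence matches a part-of-speech rule."""
    topics = []
    i = 0
    while i < len(tagged_words):
        for rule in RULES:
            window = tagged_words[i:i + len(rule)]
            if tuple(tag for _, tag in window) == rule:
                topics.append(" ".join(word for word, _ in window))
                i += len(rule)  # skip past the matched span
                break
        else:
            i += 1  # no rule matched at this position
    return topics


tagged = [("prices", "n"), ("rise", "v"), ("sharply", "d"),
          ("company", "n"), ("releases", "v"), ("phone", "n")]
print(extract_topics(tagged))
```

The extracted spans would then be labeled as topics and passed, together with the individual words, to the step that learns their multidimensional arrays.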
In this embodiment of the invention, topics are defined by the part-of-speech rule model, and the arrays corresponding to each word and each topic are obtained by a machine learning method, so that judging topic relevance is converted into calculating a correlation index between arrays. This greatly improves the speed and accuracy of identifying related topics.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the invention is not limited by the described order of actions, because according to the invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the invention in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, mobile terminal, server, network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the invention and is not intended to limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the invention.

Claims (8)

1. A method for identifying associated topics, characterized by comprising:
obtaining a target keyword;
determining, by a machine learning method, the multidimensional array corresponding to the target keyword, wherein the value in each dimension of the multidimensional array represents one attribute of the target keyword;
calculating a correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to a target topic, wherein the correlation index indicates the relevance between the target keyword and each target topic, the target topics are multiple pre-labeled topics that each have a multidimensional array, and the multidimensional array representing a topic has the same number of dimensions as the multidimensional array representing the target keyword; and
determining, according to the calculated correlation index, the topics associated with the target keyword;
wherein determining, according to the calculated correlation index, the topics associated with the target keyword comprises:
judging whether the calculated correlation index meets a preset condition;
if the calculated correlation index is judged to meet the preset condition, determining that the target topic whose calculated correlation index meets the preset condition is associated with the target keyword;
if the calculated correlation index is judged not to meet the preset condition, determining that the target topic whose calculated correlation index does not meet the preset condition is unrelated to the target keyword;
wherein each target topic corresponds to a unique multidimensional array.
2. The identification method according to claim 1, characterized in that calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises:
calculating the Euclidean distance between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic, and using the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates higher relevance between the target keyword and the topic.
3. The identification method according to claim 1, characterized in that calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic comprises:
obtaining the multidimensional array corresponding to the target topic; and directly calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic,
or,
obtaining the multidimensional array corresponding to each word in the target topic; calculating the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word in the target topic; and calculating, from the correlation indices between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to the individual words, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.
4. The identification method according to claim 1, characterized in that, before the target keyword is obtained, the identification method further comprises:
obtaining a target text, wherein the target text contains the target topic;
segmenting the target text with a word segmentation tool, and labeling the part of speech of each word in the target text;
determining the target topic from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and labeling the target topic; and
determining the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to the target topic.
5. A device for identifying associated topics, characterized by comprising:
a first acquisition unit, configured to obtain a target keyword;
a first determination unit, configured to determine, by a machine learning method, the multidimensional array corresponding to the target keyword, wherein the value in each dimension of the multidimensional array represents one attribute of the target keyword;
a computing unit, configured to calculate a correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to a target topic, wherein the correlation index indicates the relevance between the target keyword and each target topic, the target topics are multiple pre-labeled topics that each have a multidimensional array, and the multidimensional array representing a topic has the same number of dimensions as the multidimensional array representing the target keyword; and
a second determination unit, configured to determine, according to the calculated correlation index, the topics associated with the target keyword;
wherein the second determination unit comprises:
a judgment module, configured to judge whether the calculated correlation index meets a preset condition;
a determining module, configured to determine, if the calculated correlation index is judged to meet the preset condition, that the target topic whose calculated correlation index meets the preset condition is associated with the target keyword, and to determine, if the calculated correlation index is judged not to meet the preset condition, that the target topic whose calculated correlation index does not meet the preset condition is unrelated to the target keyword;
wherein each target topic corresponds to a unique multidimensional array.
6. The identification device according to claim 5, characterized in that the computing unit comprises:
a first computing module, configured to calculate the Euclidean distance between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic, and to use the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates higher relevance between the target keyword and the topic.
7. The identification device according to claim 5, characterized in that the computing unit comprises:
a first acquisition module, configured to obtain the multidimensional array corresponding to the target topic; and a second computing module, configured to directly calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic,
or, the computing unit comprises:
a second acquisition module, configured to obtain the multidimensional array corresponding to each word in the target topic; a third computing module, configured to calculate the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to each word in the target topic; and a fourth computing module, configured to calculate, from the correlation indices between the multidimensional array corresponding to the target keyword and the multidimensional arrays corresponding to the individual words, the correlation index between the multidimensional array corresponding to the target keyword and the multidimensional array corresponding to the target topic.
8. The identification device according to claim 5, characterized in that the identification device further comprises:
a second acquisition unit, configured to obtain a target text before the target keyword is obtained, wherein the target text contains the target topic;
a word segmentation unit, configured to segment the target text with a word segmentation tool and to label the part of speech of each word in the target text;
a third determination unit, configured to determine the target topic from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and to label the target topic; and
a fourth determination unit, configured to determine the multidimensional array corresponding to each segmented word and the multidimensional array corresponding to the target topic.
CN201410779602.1A 2014-12-15 2014-12-15 Correlated topic recognition method and device Active CN104408036B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410779602.1A CN104408036B (en) 2014-12-15 2014-12-15 Correlated topic recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410779602.1A CN104408036B (en) 2014-12-15 2014-12-15 Correlated topic recognition method and device

Publications (2)

Publication Number Publication Date
CN104408036A CN104408036A (en) 2015-03-11
CN104408036B true CN104408036B (en) 2019-01-08

Family

ID=52645668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410779602.1A Active CN104408036B (en) 2014-12-15 2014-12-15 Correlated topic recognition method and device

Country Status (1)

Country Link
CN (1) CN104408036B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326392A (en) * 2016-08-17 2017-01-11 合网络技术(北京)有限公司 Participating method and participating device for multimedia resource topic
CN107545039B (en) * 2017-07-31 2021-05-18 腾讯科技(深圳)有限公司 Keyword index acquisition method and device, computer equipment and storage medium
CN109345282A (en) * 2018-08-22 2019-02-15 中国平安人寿保险股份有限公司 A kind of response method and equipment of business consultation
CN110457599B (en) * 2019-08-15 2021-09-03 中国电子信息产业集团有限公司第六研究所 Hot topic tracking method and device, server and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103207899A (en) * 2013-03-19 2013-07-17 新浪网技术(中国)有限公司 Method and system for recommending text files

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE19952769B4 (en) * 1999-11-02 2008-07-17 Sap Ag Search engine and method for retrieving information using natural language queries
JP2011108117A (en) * 2009-11-19 2011-06-02 Sony Corp Topic identification system, topic identification device, client terminal, program, topic identification method, and information processing method
CN102063469B (en) * 2010-12-03 2013-04-24 百度在线网络技术(北京)有限公司 Method and device for acquiring relevant keyword message and computer equipment
CN103020164B (en) * 2012-11-26 2015-06-10 华北电力大学 Semantic search method based on multi-semantic analysis and personalized sequencing

Also Published As

Publication number Publication date
CN104408036A (en) 2015-03-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: Correlated topic recognition method and device

Effective date of registration: 20190531

Granted publication date: 20190108

Pledgee: Shenzhen Black Horse World Investment Consulting Co., Ltd.

Pledgor: Beijing Guoshuang Technology Co.,Ltd.

Registration number: 2019990000503

CP02 Change in the address of a patent holder

Address after: 100083 No. 401, 4th Floor, Haitai Building, 229 North Fourth Ring Road, Haidian District, Beijing

Patentee after: Beijing Guoshuang Technology Co.,Ltd.

Address before: 100086 Beijing city Haidian District Shuangyushu Area No. 76 Zhichun Road cuigongfandian 8 layer A

Patentee before: Beijing Guoshuang Technology Co.,Ltd.
