Summary of the invention
The main purpose of the present invention is to provide a method and a device for identifying associated topics, so as to solve the problem of low topic-identification accuracy in the prior art.
To achieve the above purpose, according to one aspect of the embodiments of the present invention, a method for identifying associated topics is provided. The method according to the present invention includes: obtaining a target keyword; determining, by a machine learning method, a multi-dimensional array corresponding to the target keyword, wherein the number in each dimension of the multi-dimensional array represents one attribute of the target keyword; calculating a correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to each target topic, wherein the correlation index indicates the relevance between the target keyword and each target topic, and the target topics are a plurality of pre-marked topics each having a multi-dimensional array; and determining, according to the calculated correlation index, the topic associated with the target keyword.
Further, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: calculating the Euclidean distance between the two arrays and using the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between the target keyword and that topic.
Further, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: obtaining the multi-dimensional array corresponding to the target topic, and directly calculating the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic; or, alternatively, obtaining the multi-dimensional array corresponding to each word in the target topic, calculating the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic, and calculating, from these per-word correlation indexes, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Further, determining the topic associated with the target keyword according to the calculated correlation index includes: judging whether the calculated correlation index satisfies a preset condition; if the calculated correlation index satisfies the preset condition, determining that the target topic whose correlation index satisfies the preset condition is associated with the target keyword; and if the calculated correlation index does not satisfy the preset condition, determining that the target topic whose correlation index does not satisfy the preset condition is not associated with the target keyword.
Further, before the target keyword is obtained, the identification method further includes: obtaining a target text, the target text containing the target topic; segmenting the target text with a word segmentation tool and marking the part of speech of each word in the target text; determining the target topic from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and marking the target topic; and determining, by a machine learning method, the multi-dimensional array corresponding to each segmented word and the multi-dimensional array corresponding to the target topic.
To achieve the above purpose, according to another aspect of the embodiments of the present invention, a device for identifying associated topics is provided. The device according to the present invention includes: a first acquisition unit configured to obtain a target keyword; a first determination unit configured to determine, by a machine learning method, a multi-dimensional array corresponding to the target keyword, wherein the number in each dimension of the multi-dimensional array represents one attribute of the target keyword; a computing unit configured to calculate a correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to each target topic, wherein the correlation index indicates the relevance between the target keyword and each target topic, and the target topics are a plurality of pre-marked topics each having a multi-dimensional array; and a second determination unit configured to determine, according to the calculated correlation index, the topic associated with the target keyword.
Further, the computing unit includes: a first computing module configured to calculate the Euclidean distance between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic, and to use the Euclidean distance as the correlation index, wherein a smaller Euclidean distance between the target keyword and a topic indicates a higher relevance between the target keyword and that topic.
Further, the computing unit includes: a first acquisition module configured to obtain the multi-dimensional array corresponding to the target topic; and a second computing module configured to directly calculate the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic. Alternatively, the computing unit includes: a second acquisition module configured to obtain the multi-dimensional array corresponding to each word in the target topic; a third computing module configured to calculate the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic; and a fourth computing module configured to calculate, from these per-word correlation indexes, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Further, the second determination unit includes: a judgment module configured to judge whether the calculated correlation index satisfies a preset condition; and a determining module configured to determine, if the calculated correlation index satisfies the preset condition, that the target topic whose correlation index satisfies the preset condition is associated with the target keyword, and to determine, if the calculated correlation index does not satisfy the preset condition, that the target topic whose correlation index does not satisfy the preset condition is not associated with the target keyword.
Further, the identification device further includes: a second acquisition unit configured to obtain a target text before the target keyword is obtained, the target text containing the target topic; a word segmentation unit configured to segment the target text with a word segmentation tool and to mark the part of speech of each word in the target text; a third determination unit configured to determine the target topic from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and to mark the target topic; and a fourth determination unit configured to determine the multi-dimensional array corresponding to each segmented word and the multi-dimensional array corresponding to the target topic.
In the embodiments of the present invention, a target keyword is obtained, the multi-dimensional array corresponding to the target keyword is determined, the correlation index between the array corresponding to the target keyword and the array corresponding to each target topic is calculated, and the topic associated with the target keyword is determined according to the calculated correlation index. The judgment of relevance between the target keyword and a target topic is thereby converted into the calculation of a correlation index between an array representing the attributes of the target keyword and an array representing the attributes of the target topic. This avoids the problem that keyword matching cannot accurately identify a topic when the keyword does not appear in the topic, solves the problem of low topic-identification accuracy in the prior art, and achieves the effect of improving the accuracy of topic identification.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The present invention is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the above drawings of the present invention are used to distinguish similar objects and are not used to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present invention described herein can be implemented. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such a process, method, product, or device.
An embodiment of the present invention provides a method for identifying associated topics.
Fig. 1 is a flow chart of the method for identifying associated topics according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: obtain a target keyword.
There may be one or more target keywords, for example "2014", "college entrance examination", and so on.
Step S104: determine the multi-dimensional array corresponding to the target keyword by a machine learning method, wherein the number in each dimension of the multi-dimensional array represents one attribute of the target keyword.
Because the number in each dimension of the multi-dimensional array represents one attribute of the target keyword, each target keyword corresponds to a unique multi-dimensional array; that is, the target keyword is represented by a multi-dimensional array. For example, if target keywords are represented by 500-dimensional arrays, then after the target keyword is obtained, its unique 500-dimensional array is obtained by the machine learning method.
Step S106: calculate the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to each target topic, wherein the correlation index indicates the relevance between the target keyword and each target topic, and the target topics are a plurality of pre-marked topics each having a multi-dimensional array.
Step S108: determine the topic associated with the target keyword according to the calculated correlation index.
Each target topic also corresponds to a unique multi-dimensional array; that is, each topic is represented by its own unique multi-dimensional array, in which the number in each dimension represents one attribute of the target topic. It should be noted that the array representing a topic has the same number of dimensions as the array representing the target keyword (the same applies below), so as to avoid calculation errors.
After the multi-dimensional array corresponding to the target keyword is determined, the correlation index between that array and the array corresponding to the target topic is calculated. Because a text may contain multiple topics, the correlation index between the array corresponding to the target keyword and the array corresponding to each of the topics is calculated separately, which yields the relevance between the target keyword and each topic. Finally, the topic associated with the target keyword is determined according to the calculated correlation indexes. Specifically, a corresponding threshold may be set: for a correlation index in which a larger value indicates higher relevance, when the correlation index exceeds the threshold, the target topic is considered associated with the target keyword; otherwise, it is considered not associated.
In the embodiments of the present invention, a target keyword is obtained, the multi-dimensional array corresponding to the target keyword is determined, the correlation index between the array corresponding to the target keyword and the array corresponding to each target topic is calculated, and the topic associated with the target keyword is determined according to the calculated correlation index. The judgment of relevance between the target keyword and a target topic is thereby converted into the calculation of a correlation index between an array representing the attributes of the target keyword and an array representing the attributes of the target topic. This avoids the problem that keyword matching cannot accurately identify a topic when the keyword does not appear in the topic, solves the problem of low topic-identification accuracy in the prior art, and achieves the effect of improving the accuracy of topic identification.
In the embodiments of the present invention, after the topics associated with the target keyword are determined, the topics may be sorted by relevance. For example, if a larger calculated correlation index indicates higher relevance between the target keyword and a topic, the target topics may be sorted by correlation index in descending order, yielding a topic attention ranking table. If a smaller calculated correlation index indicates higher relevance between the target keyword and a topic, the target topics may be sorted by correlation index in ascending order.
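The sorting described above can be illustrated with a minimal Python sketch. The function name `rank_topics`, the flag `smaller_is_closer`, and the sample topic names and index values are illustrative assumptions and do not come from the original text.

```python
def rank_topics(scores, smaller_is_closer):
    """Return (topic, index) pairs ordered from most to least relevant.

    scores: dict mapping a topic name to its calculated correlation index.
    smaller_is_closer: True when a smaller index means higher relevance
    (e.g. Euclidean distance), False when a larger index means higher relevance.
    """
    # Ascending order for distance-like indexes, descending order otherwise.
    return sorted(scores.items(), key=lambda kv: kv[1],
                  reverse=not smaller_is_closer)


# Hypothetical correlation indexes for three topics.
scores = {"topic A": 0.8, "topic B": 0.3, "topic C": 0.5}
ranking_by_distance = rank_topics(scores, smaller_is_closer=True)
ranking_by_similarity = rank_topics(scores, smaller_is_closer=False)
```

With a distance-like index the closest topic comes first; with a similarity-like index the largest value comes first.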
Optionally, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: calculating the Euclidean distance between the two arrays and using the Euclidean distance as the correlation index.
In the embodiments of the present invention, the relevance between the target keyword and a topic is represented by the Euclidean distance between their arrays: the smaller the Euclidean distance between the target keyword and a topic, the higher the relevance between them; the larger the Euclidean distance, the lower the relevance. Accordingly, when topics are sorted by their relevance to the target keyword, in this embodiment the target topics are sorted by Euclidean distance in ascending order to obtain the attention ranking table.
In the embodiments of the present invention, the relevance between the target keyword and the target topic is judged by calculating the Euclidean distance between their arrays, which improves the speed of topic identification.
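As a minimal sketch of the Euclidean distance used as the correlation index, the following Python function computes the distance between two arrays of equal dimension. The function name and the sample two-dimensional arrays are illustrative assumptions; the text itself mentions arrays of, for example, 500 dimensions.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two arrays with the same number of dimensions."""
    if len(a) != len(b):
        # The text requires keyword and topic arrays to have equal dimensions.
        raise ValueError("arrays must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical 2-dimensional arrays, used only to keep the example readable.
keyword_array = [0.0, 0.0]
topic_array = [3.0, 4.0]
distance = euclidean_distance(keyword_array, topic_array)
```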
Optionally, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: obtaining the multi-dimensional array corresponding to each word in the target topic; calculating the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic; and calculating, from these per-word correlation indexes, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Because a topic is composed of words according to a certain grammar, a topic contains multiple words. When the machine learning method is used to compute the arrays for the target keyword and the target topic, the array of each word in the target topic is computed first. The correlation index with respect to the target topic can then be obtained by separately calculating the correlation index between the array of each word in the target topic and the array of the target keyword, and deriving the relevance between the target keyword and the target topic from those indexes. For example, the Euclidean distance between the array of each word in the target topic and the array of the target keyword is calculated separately, and the correlation index between the array of the target keyword and the array of the target topic is calculated from those Euclidean distances. In this way, the relevance between the target topic and the target keyword is determined from the relevance between each word in the topic and the target keyword, which further improves the accuracy of the calculation for the topic's array and thus helps ensure the accuracy of topic identification.
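The per-word approach can be sketched as follows. The text does not state how the per-word Euclidean distances are combined into a single index, so the arithmetic mean used here is an assumption, as are the function name and the sample arrays.

```python
import math


def topic_index_from_words(keyword_array, topic_word_arrays):
    """Combine per-word Euclidean distances into one topic-level index.

    Assumption: the per-word distances are averaged; the source only says the
    topic-level index is calculated from the per-word indexes.
    """
    if not topic_word_arrays:
        raise ValueError("a topic must contain at least one word")
    distances = [math.dist(keyword_array, w) for w in topic_word_arrays]
    return sum(distances) / len(distances)


# Hypothetical arrays: one keyword and a topic made of two words.
keyword_array = [0.0, 0.0]
topic_word_arrays = [[3.0, 4.0], [0.0, 0.0]]
index = topic_index_from_words(keyword_array, topic_word_arrays)
```

Other aggregations, such as the minimum or a weighted sum, would fit the description equally well; the mean is chosen only for simplicity.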
Optionally, calculating the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic includes: obtaining the multi-dimensional array corresponding to the target topic; and directly calculating the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Because a topic is composed of multiple words, the array corresponding to the topic can first be obtained by machine learning from the arrays corresponding to the words in the topic. Then, when the correlation index is calculated, the unique array that was obtained for the target topic in advance by machine learning can be retrieved, and the correlation index between the array of the target keyword and the array of the target topic can be calculated directly. Compared with calculating the correlation index between the target keyword and each word in the topic, this greatly increases the speed at which the correlation index between the target keyword and the topic is calculated.
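A minimal sketch of the direct approach follows. The text says only that the topic's array is obtained from the word arrays by machine learning; the element-wise mean below is a simple stand-in chosen for illustration, not the method of the source. All names and sample values are assumptions.

```python
import math


def mean_array(word_arrays):
    """Element-wise mean of word arrays (a stand-in for the learned topic array)."""
    n = len(word_arrays)
    return [sum(column) / n for column in zip(*word_arrays)]


# Computed once, ahead of time, for each pre-marked topic.
topic_array = mean_array([[1.0, 2.0], [3.0, 4.0]])

# At query time only one distance per topic is needed.
keyword_array = [2.0, 0.0]
index = math.dist(keyword_array, topic_array)
```

Because the topic array is precomputed, each query costs one distance calculation per topic rather than one per word in every topic.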
Preferably, determining the topic associated with the target keyword according to the calculated correlation index includes: judging whether the calculated correlation index satisfies a preset condition; if the calculated correlation index satisfies the preset condition, determining that the target topic whose correlation index satisfies the preset condition is associated with the target keyword; and if the calculated correlation index does not satisfy the preset condition, determining that the target topic whose correlation index does not satisfy the preset condition is not associated with the target keyword.
In this embodiment, the preset condition may be a preset threshold. For example, when a larger correlation index indicates higher relevance between the target topic and the target keyword, judging whether the calculated correlation index satisfies the preset condition may consist of judging whether the calculated correlation index exceeds the preset threshold; if it does, the topic is determined to be associated with the target keyword; otherwise, it is determined not to be associated.
If the correlation index is the Euclidean distance between arrays, judging whether the calculated correlation index satisfies the preset condition may consist of judging whether the Euclidean distance is less than the preset threshold; if it is, the topic is determined to be associated with the target keyword; otherwise, it is determined not to be associated.
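The two forms of the preset condition can be sketched in one small Python function. The function name, the flag `smaller_is_closer`, and the threshold values are illustrative assumptions.

```python
def is_associated(index, threshold, smaller_is_closer):
    """Apply the preset threshold in the direction that matches the index type."""
    if smaller_is_closer:
        # Distance-like index (e.g. Euclidean distance): associated when below the threshold.
        return index < threshold
    # Similarity-like index: associated when above the threshold.
    return index > threshold


# Hypothetical values: a Euclidean distance of 0.4 against a threshold of 1.0.
associated_by_distance = is_associated(0.4, 1.0, smaller_is_closer=True)
associated_by_similarity = is_associated(0.4, 1.0, smaller_is_closer=False)
```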
By setting a preset condition, the topics related to the target keyword can be quickly determined from the calculated results, thereby improving the accuracy of topic identification.
Preferably, before the target keyword is obtained, the identification method further includes: obtaining a target text, the target text containing the target topic; segmenting the target text with a word segmentation tool and marking the part of speech of each word in the target text; determining the target topic from the parts of speech of the segmented words according to a pre-established part-of-speech rule model, and marking the target topic; and determining the multi-dimensional array corresponding to each segmented word and the multi-dimensional array corresponding to the target topic.
A target text containing topics is obtained, a text training set is established, and text word-segmentation rules are set as needed. A part-of-speech rule model for topics (for example, noun + verb, or noun + verb + object) is constructed by semantic analysis. Text analysis is performed using the word segmentation tool (including the configured segmentation rules), all parts of speech of each word are marked, and the topics are marked at the same time. All words (including topics) are each represented by a multi-dimensional array, for example of 500 dimensions, and the unique array corresponding to each word is obtained by the machine learning method. In this way, after a target keyword is obtained and its array is determined, a correlation index such as the Euclidean distance can be calculated directly against the array corresponding to each topic.
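The part-of-speech rule model can be illustrated with a minimal Python sketch that scans already-tagged words for the patterns named above. The word segmentation tool itself is not shown; the input is assumed to be a list of (word, tag) pairs. The tag names "n" and "v", the approximation of "object" by a noun tag, and the sample sentence are all assumptions made for illustration.

```python
# Hypothetical rule model: longer pattern first so it takes precedence.
PATTERNS = [("n", "v", "n"), ("n", "v")]


def find_topics(tagged_words):
    """Return word sequences whose part-of-speech tags match a rule pattern."""
    topics = []
    i = 0
    while i < len(tagged_words):
        for pattern in PATTERNS:
            window = tagged_words[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                topics.append(" ".join(word for word, _ in window))
                i += len(pattern)  # continue after the matched topic
                break
        else:
            i += 1  # no pattern starts at this word
    return topics


# Hypothetical output of a segmentation and tagging step.
tagged = [("students", "n"), ("take", "v"), ("exam", "n"), ("today", "t")]
topics = find_topics(tagged)
```

Each matched topic would then be marked and given its own multi-dimensional array, as the paragraph above describes.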
In the embodiments of the present invention, topics are defined by the part-of-speech rule model, and the arrays corresponding to each word and each topic are obtained by the machine learning method, so that the judgment of topic relevance is converted into the calculation of a correlation index between arrays, which greatly improves the speed and accuracy of identifying associated topics.
An embodiment of the present invention also provides a device for identifying associated topics. The device may implement its functions through computer equipment. It should be noted that the device for identifying associated topics of the embodiments of the present invention may be used to execute the method for identifying associated topics provided by the embodiments of the present invention, and that the method may likewise be executed by the device provided by the embodiments of the present invention.
Fig. 2 is a schematic diagram of the device for identifying associated topics according to an embodiment of the present invention. As shown in Fig. 2, the device includes: a first acquisition unit 10, a first determination unit 20, a computing unit 30, and a second determination unit 40.
The first acquisition unit 10 is configured to obtain a target keyword.
There may be one or more target keywords, for example "2014", "college entrance examination", and so on.
The first determination unit 20 is configured to determine the multi-dimensional array corresponding to the target keyword by a machine learning method, wherein the number in each dimension of the multi-dimensional array represents one attribute of the target keyword.
Because the number in each dimension of the multi-dimensional array represents one attribute of the target keyword, each target keyword corresponds to a unique multi-dimensional array; that is, the target keyword is represented by a multi-dimensional array. For example, if target keywords are represented by 500-dimensional arrays, then after the target keyword is obtained, its unique 500-dimensional array can be obtained by the machine learning method.
The computing unit 30 is configured to calculate the correlation index between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to each target topic, wherein the correlation index indicates the relevance between the target keyword and the target topic, and the target topics are a plurality of pre-marked topics each having a multi-dimensional array.
The second determination unit 40 is configured to determine the topic associated with the target keyword according to the calculated correlation index.
Each target topic also corresponds to a unique multi-dimensional array; that is, each topic is represented by its own unique multi-dimensional array, in which the number in each dimension represents one attribute of the target topic. It should be noted that the array representing a topic has the same number of dimensions as the array representing the target keyword (the same applies below), so as to avoid calculation errors.
After the multi-dimensional array corresponding to the target keyword is determined, the correlation index between that array and the array corresponding to the target topic is calculated. Because a text may contain multiple topics, the correlation index between the array corresponding to the target keyword and the array corresponding to each of the topics is calculated separately, which yields the relevance between the target keyword and each topic. Finally, the topic associated with the target keyword is determined according to the calculated correlation indexes. Specifically, a corresponding threshold may be set: for a correlation index in which a larger value indicates higher relevance, when the correlation index exceeds the threshold, the target topic is considered associated with the target keyword; otherwise, it is considered not associated.
In the embodiments of the present invention, a target keyword is obtained, the multi-dimensional array corresponding to the target keyword is determined, the correlation index between the array corresponding to the target keyword and the array corresponding to each target topic is calculated, and the topic associated with the target keyword is determined according to the calculated correlation index. The judgment of relevance between the target keyword and a target topic is thereby converted into the calculation of a correlation index between an array representing the attributes of the target keyword and an array representing the attributes of the target topic. This avoids the problem that keyword matching cannot accurately identify a topic when the keyword does not appear in the topic, solves the problem of low topic-identification accuracy in the prior art, and achieves the effect of improving the accuracy of topic identification.
In the embodiments of the present invention, after the topics associated with the target keyword are determined, the topics may be sorted by relevance. For example, if a larger calculated correlation index indicates higher relevance between the target keyword and a topic, the target topics may be sorted by correlation index in descending order, yielding a topic attention ranking table. If a smaller calculated correlation index indicates higher relevance between the target keyword and a topic, the target topics may be sorted by correlation index in ascending order.
Preferably, the computing unit includes: a first computing module configured to calculate the Euclidean distance between the multi-dimensional array corresponding to the target keyword and the multi-dimensional array corresponding to the target topic, and to use the Euclidean distance as the correlation index.
In the embodiments of the present invention, the relevance between the target keyword and a topic is represented by the Euclidean distance between their arrays: the smaller the Euclidean distance between the target keyword and a topic, the higher the relevance between them; the larger the Euclidean distance, the lower the relevance. Accordingly, when topics are sorted by their relevance to the target keyword, in this embodiment the target topics are sorted by Euclidean distance in ascending order to obtain the attention ranking table.
In the embodiments of the present invention, the relevance between the target keyword and the target topic is judged by calculating the Euclidean distance between their arrays, which improves the speed of topic identification.
Preferably, the computing unit includes: a second acquisition module configured to obtain the multi-dimensional array corresponding to each word in the target topic; a third computing module configured to calculate the correlation index between the array corresponding to the target keyword and the array corresponding to each word in the target topic; and a fourth computing module configured to calculate, from these per-word correlation indexes, the correlation index between the array corresponding to the target keyword and the array corresponding to the target topic.
Because a topic is composed of words arranged according to a certain grammar, a topic contains multiple words.
When a machine learning method is used to calculate the multidimensional arrays corresponding to the target
keyword and the target topic, the multidimensional array of each word in the target topic is calculated first.
The correlation index with respect to the target topic can then be obtained by separately calculating the
correlation index between the multidimensional array of each word in the target topic and the array corresponding
to the target keyword, and deriving the relevance between the target keyword and the target topic from these
correlation indices. For example, the Euclidean distance between the multidimensional array of each word in the
target topic and the array corresponding to the target keyword is calculated separately, and the correlation
index between the target keyword's multidimensional array and the target topic's multidimensional array is
calculated from these Euclidean distances. In this way, the relevance between the target topic and the target
keyword is determined from the relevance between each word in the topic and the target keyword, which further
improves the accuracy of the calculation for the topic's array and thereby ensures the accuracy of topic detection.
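The word-by-word calculation can be sketched as below. The text does not state how the per-word Euclidean
distances are combined into a single correlation index, so the arithmetic mean used here is purely an
assumption for illustration, as is the function name `topic_distance_from_words`.

```python
import math


def topic_distance_from_words(keyword_vec, word_vecs):
    """Combine per-word distances into one keyword-to-topic index.

    word_vecs holds one array per word in the topic. The mean is an
    assumed aggregation; the source does not specify the combination rule.
    """
    if not word_vecs:
        raise ValueError("a topic must contain at least one word")
    # One Euclidean distance per word in the topic (math.dist, Python 3.8+).
    distances = [math.dist(keyword_vec, vec) for vec in word_vecs]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    keyword = [0.0, 0.0]
    words_in_topic = [[3.0, 4.0], [0.0, 1.0]]
    print(topic_distance_from_words(keyword, words_in_topic))
```

This variant needs one distance calculation per word in the topic, so its cost grows with topic length.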
Optionally, the computing unit includes: a first acquisition module, configured to obtain the multidimensional
array corresponding to the target topic; and a second computing module, configured to directly calculate the
correlation index between the multidimensional array corresponding to the target keyword and the multidimensional
array corresponding to the target topic.
Because a topic is composed of multiple words, the multidimensional array corresponding to the topic can first
be obtained by machine learning from the multidimensional arrays of the words in the topic. Then, when the
correlation index is calculated, the unique multidimensional array obtained in advance for the target topic by
machine learning can be retrieved, and the correlation index between the target keyword's multidimensional array
and the target topic's multidimensional array is calculated directly. Compared with calculating the correlation
index between the target keyword and each word in the topic, this greatly increases the speed of calculating the
correlation index between the target keyword and the topic.
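The direct calculation can be sketched as follows. The text says the topic's array is obtained by machine
learning from the arrays of its words but does not name the method, so the element-wise average in
`topic_vector` is only a stand-in assumption; `direct_distance` and the sample values are likewise
hypothetical.

```python
import math


def topic_vector(word_vecs):
    """Derive one array for a topic from the arrays of its words.

    Element-wise averaging is a placeholder for the unspecified
    machine learning step described in the text.
    """
    if not word_vecs:
        raise ValueError("a topic must contain at least one word")
    count = len(word_vecs)
    return [sum(column) / count for column in zip(*word_vecs)]


def direct_distance(keyword_vec, topic_vec):
    """Single Euclidean distance between keyword array and topic array."""
    return math.dist(keyword_vec, topic_vec)


if __name__ == "__main__":
    # The topic array is computed once in advance and then reused.
    precomputed = topic_vector([[1.0, 2.0], [3.0, 4.0]])
    print(direct_distance([2.0, 0.0], precomputed))
```

Because the topic array is prepared ahead of time, each keyword-to-topic comparison costs a single
distance calculation regardless of how many words the topic contains.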
Preferably, the second determination unit includes: a judgment module, configured to judge whether the calculated
correlation index satisfies a preset condition; and a determining module, configured to determine, if the
calculated correlation index is judged to satisfy the preset condition, that the target topic whose correlation
index satisfies the preset condition is associated with the target keyword, and to determine, if the calculated
correlation index is judged not to satisfy the preset condition, that the target topic whose correlation index
does not satisfy the preset condition is not associated with the target keyword.
In this embodiment, the preset condition may be a preset threshold. For example, when a larger correlation index
indicates a higher relevance between the target topic and the target keyword, judging whether the calculated
correlation index satisfies the preset condition may consist of judging whether the calculated correlation index
exceeds the preset threshold; if it does, the topic is determined to be associated with the target keyword;
otherwise, the two are not associated.
If the correlation index is the Euclidean distance between arrays, judging whether the calculated correlation
index satisfies the preset condition may consist of judging whether the Euclidean distance is less than the
preset threshold; if so, the topic is determined to be associated with the target keyword; otherwise, the two
are not associated.
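The two threshold checks described above differ only in the direction of the comparison, which can be
sketched as follows. The function name `is_associated`, its parameter names, and the sample threshold values
are assumptions made for illustration.

```python
def is_associated(index, threshold, index_is_distance):
    """Apply the preset-threshold condition to a correlation index.

    When the index is a Euclidean distance, smaller means more related,
    so the topic is associated if the distance is below the threshold.
    Otherwise a larger index means more related, so the topic is
    associated if the index exceeds the threshold.
    """
    if index_is_distance:
        return index < threshold
    return index > threshold


if __name__ == "__main__":
    print(is_associated(0.4, 1.0, index_is_distance=True))    # distance below threshold
    print(is_associated(0.4, 1.0, index_is_distance=False))   # index does not exceed threshold
```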
By setting a preset condition, the topics associated with the target keyword are quickly determined from the
calculated results, thereby improving the accuracy of topic detection.
Preferably, the identification device further includes: a second acquisition unit, configured to obtain a target
text before the target keyword is obtained, where the target text contains the target topic; a word segmentation
unit, configured to segment the target text using a word segmentation tool and to mark the part of speech of each
word in the target text; a third determination unit, configured to determine the target topic from the parts of
speech of the segmented words according to a pre-established part-of-speech rule model, and to mark the target
topic; and a fourth determination unit, configured to determine the multidimensional array corresponding to each
segmented word and the multidimensional array corresponding to the target topic.
Target texts containing topics are collected to build a text training set, and text segmentation rules are set
as needed; a part-of-speech rule model for topics (such as noun + verb, or noun + verb + object) is constructed
by semantic analysis; text analysis is performed using the word segmentation tool (including the configured text
segmentation rules), marking the part of speech of every word and, at the same time, marking the topics; all
words (including topics) are each represented as a multidimensional array, for example with 500 dimensions, and
the unique multidimensional array corresponding to each word is obtained by a machine learning method. In this
way, once the target keyword has been obtained and its multidimensional array determined, a correlation index
such as the Euclidean distance can be calculated directly against the multidimensional array corresponding to a
topic.
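The topic-marking step can be sketched as a pattern match over part-of-speech tags. The sketch assumes the
text has already been segmented and tagged by some word segmentation tool; the tag names "n" and "v", the
treatment of an object as a noun, the rule list `RULES`, and the function `find_topics` are all hypothetical
and are not taken from the embodiment.

```python
# Part-of-speech rules for topics, longest first so that
# noun + verb + object is preferred over noun + verb.
# Assumption: an object is tagged as a noun ("n").
RULES = [("n", "v", "n"), ("n", "v")]


def find_topics(tagged_words):
    """Mark topics in a list of (word, part_of_speech) pairs.

    Scans left to right and, at each position, takes the first rule
    whose tag sequence matches the upcoming words.
    """
    topics = []
    i = 0
    while i < len(tagged_words):
        for rule in RULES:
            window = tagged_words[i:i + len(rule)]
            if len(window) == len(rule) and all(
                pos == wanted for (_, pos), wanted in zip(window, rule)
            ):
                topics.append(" ".join(word for word, _ in window))
                i += len(rule)  # continue after the matched topic
                break
        else:
            i += 1  # no rule matched at this position
    return topics


if __name__ == "__main__":
    tagged = [("team", "n"), ("wins", "v"), ("title", "n"),
              ("after", "p"), ("fans", "n"), ("cheer", "v")]
    print(find_topics(tagged))
```

Each marked topic, like each individual word, would then be given its own multidimensional array by the
machine learning step.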
In this embodiment of the present invention, topics are defined by a part-of-speech rule model, and the arrays
corresponding to each word and topic are obtained by a machine learning method, so that judging topic relevance
is converted into calculating a correlation index between arrays, which greatly improves the speed and accuracy
of identifying related topics.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of
combined actions. However, those skilled in the art should understand that the present invention is not limited
by the described order of actions, because according to the present invention some steps may be performed in
other orders or simultaneously. In addition, those skilled in the art should also understand that the embodiments
described in the specification are all preferred embodiments, and the actions and modules involved are not
necessarily required by the present invention.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not
described in detail in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device may be
implemented in other ways. For example, the device embodiments described above are merely illustrative. The
division into units is only a division by logical function, and other divisions are possible in actual
implementation; for example, multiple units or components may be combined or integrated into another system, or
some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication
connection shown or discussed may be indirect coupling or communication connection through some interfaces,
devices, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as
units may or may not be physical units; that is, they may be located in one place or distributed across multiple
network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the
solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one
processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an
independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the
technical solution of the present invention in essence, or the part that contributes to the prior art, or all or
part of the technical solution, may be embodied in the form of a software product. The computer software product
is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a
personal computer, mobile terminal, server, network device, or the like) to execute all or part of the steps of
the methods described in the embodiments of the present invention. The aforementioned storage medium includes
various media that can store program code, such as a USB flash drive, read-only memory (ROM, Read-Only Memory),
random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the
present invention. For those skilled in the art, the present invention may have various modifications and
changes. Any modification, equivalent replacement, improvement, or the like made within the spirit and
principles of the present invention shall fall within the protection scope of the present invention.