CN107832312A - Text recommendation method based on deep semantic discrimination - Google Patents

Text recommendation method based on deep semantic discrimination

Info

Publication number
CN107832312A
Authority
CN
China
Prior art keywords
topic
semantic
user
grid
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710000406.3A
Other languages
Chinese (zh)
Other versions
CN107832312B (en)
Inventor
郐弘智
陈建辉
盛文瑾
闫健卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN201710000406.3A
Publication of CN107832312A
Application granted
Publication of CN107832312B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a text recommendation method based on deep semantic discrimination. On the document side, text topics are extracted automatically with a deep semantic grid model, and the scene semantics of each topic under different text backgrounds are inferred with a topic scene semantic discrimination method. This yields a text topic tree that fuses scene states, and a user text interest profile is built for every document according to the user's real-time scene state. On the query side, in response to real-time changes in the user's scene state, the text topic trees are screened by scene semantics and the query content is modeled as query interest topics. A secondary latent-semantic inference over the user's direct interest topics is then performed by spreading activation, the global activation value of each topic is computed, and a user query interest profile that fuses the current scene semantics is built. Documents are scored with a similarity measure, and a text recommendation list is generated in descending order of score.

Description

Text recommendation method based on deep semantic discrimination
Technical field
The present invention relates to the field of recommendation technology, and in particular to a text recommendation method based on deep semantic discrimination. More specifically, it relates to a recommendation method that combines a deep semantic grid model, built on the brain-like "hierarchical-divergent" thinking pattern, with scene semantic discrimination of text topics.
Background
Recommender systems were proposed in the 1990s. Early systems focused mainly on the formal similarity of retrieval results and ignored the semantic relevance between the results and the query, so recommendation results were very noisy. In recent years, with the explosive growth of paperless data, the effectiveness of information retrieval has attracted wide attention from researchers, and many semantics-based information retrieval methods have been proposed. Personalized semantic recommendation methods fall into two broad classes: formal semantics and social semantics.
Social semantic methods work in two ways. On the one hand, they analyze information such as user logs, user tags, domain popularity and user activity to build a user profile and thereby personalize recommendations. On the other hand, methods based on user similarity and item similarity, such as collaborative filtering, approximate the target user's rating of an item from the ratings given to it by the most similar users. The former improves the interest relevance of retrieval results, but it requires analyzing a large amount of user behavior data, which most users cannot supply; moreover, it is in essence a formal matching of interest keywords and lacks the ability to analyze semantics and mine latent interests. The latter is more user-friendly and better at discovering documents of latent interest, but because the feedback results are complex and varied, it tends to return a large amount of content unrelated to the query. In addition, as the dimensionality of the recommendation data keeps growing, data sparsity causes the cold-start problem: when a new user or a batch of documents from a new field enters the system, there is not enough information to support recommendation, and recommendation quality declines.
Formal semantic recommender systems mostly use ontology-based semantic query techniques. Document information is abstracted to the concept level, and concepts are linked by different semantic relations into a network structure that resembles the way the brain thinks. Because this approach operates on text directly at the concept level, and is mostly applied to retrieval over structured knowledge bases, it improves the semantic relevance of results markedly. However, when these methods are used to recommend text, they do not consider the scene semantics that a concept carries implicitly in the text, so documents become semantically ambiguous during ontology mapping. The prior art therefore still needs improvement and development.
Summary of the invention
In view of the deficiencies of the prior art, the invention provides a text recommendation method based on deep semantic discrimination, aiming to solve the problem that the semantic relevance of existing recommendation methods needs improvement.
To solve the above technical problem, the technical solution adopted by the invention comprises the following steps:
Step 1: Build a deep semantic grid model based on the brain-like "hierarchical-divergent" thinking pattern;
Step 2: First, infer the grid topic set of a text by combining the "grid topic-synonym bag of words" model with word matching; second, link the scattered topics using the "association-memory" function of the grid model; then, use the scene semantic analysis function to infer the scene label of each activated topic in the current text; finally, build a text topic tree that fuses multiple scene semantics and memory links;
Step 3: Prune the text topic tree according to user interest, that is, filter out the topics and relations that do not match the user's current scene state, thereby building a text topic tree screened by scene semantics;
Step 4: Use the TF-IDF algorithm to count all text topic trees in the database after scene semantic screening, compute the weight of each topic and map it to the corresponding grid topic node, thereby building a user text interest profile for every document;
Step 5: Using pseudo-relevance feedback, extract the documents related to the user's query content together with their text topic trees after scene semantic screening, count the frequency of each topic in the feedback trees, and normalize it to obtain the initial interest topic activation values;
Step 6: Use the spreading activation algorithm to compute the global dynamic activation values of the initial interest grid topics and the latent interest grid topics under feedback learning, assign the results to the corresponding topic nodes in the grid model, and build a user query interest profile that fuses the current scene semantics;
Step 7: Use a grid-based cosine similarity method to score the deep semantic relevance between the user query interest profile and each user text interest profile, then generate a recommendation list and recommend.
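The pruning in Step 3 can be illustrated with a minimal sketch. The `TopicNode` class, the `prune` and `topic_names` helpers, and the rule that a non-matching topic is dropped together with its subtree are all assumptions made for illustration; the source only states that topics and relations not matching the user's current scene state are filtered out.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """Hypothetical node of a text topic tree (not defined in the source)."""
    name: str
    scene: str                      # scene label inferred for this topic
    children: list = field(default_factory=list)


def prune(node, user_scene):
    """Return a pruned copy of the tree.

    Assumption: the root is always kept, and a child whose scene label
    differs from the user's current scene is dropped with its subtree.
    """
    kept = [prune(child, user_scene)
            for child in node.children
            if child.scene == user_scene]
    return TopicNode(node.name, node.scene, kept)


def topic_names(node):
    """Flatten a tree into a pre-order list of topic names."""
    names = [node.name]
    for child in node.children:
        names.extend(topic_names(child))
    return names


tree = TopicNode("root", "any", [
    TopicNode("virus", "medicine", [TopicNode("infection", "medicine")]),
    TopicNode("virus", "computing", [TopicNode("malware", "computing")]),
])
print(topic_names(prune(tree, "medicine")))
```

Other pruning policies, such as re-attaching matching grandchildren to the nearest kept ancestor, would be equally compatible with the wording of Step 3.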
Further, the deep semantic grid model of step 1 is built according to the brain-like "hierarchical-divergent" thinking pattern. The building process of step 1 specifically includes:
Step 1-1: Select a classification ontology that fuses multiple domains, use the natural language processing tools of Stanford University to semantically split the ontology topics and reduce them to their base word forms (lemmatization), obtaining the core topic set, and connect the core topics into a divergent grid model according to the memory characteristics of the ontology;
Step 1-2: Build the "grid topic-synonym bag of words" semantic mapping model, in which "topic" denotes a core topic in the hierarchical grid model and the "bag of words" is the set of synonym terms extracted for that topic from the WordNet dictionary. If a term in the "topic-bag of words" model appears in the text, the topic is activated and the corresponding grid node attribute is set to "1", which implements shallow semantic topic mining of the text;
Step 1-3: Traverse the "topic-label-abstract" triples in the DBpedia knowledge base, match the topic in each triple against the terms in the "topic-bag of words" model, and extract the label and abstract data of the matched topics from the knowledge base; map "grid topic-DBpedia topic-label-abstract" layer by layer and link them with the semantic dependency relation type;
Step 1-4: With the "hierarchical-memory" grid model as the skeleton, realize the "divergent-deep" semantic model that fuses "synonym bag of words-grid topic-DBpedia topic-label-abstract".
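The topic activation rule of Step 1-2 can be sketched as follows. The `TOPIC_BAGS` dictionary is a hand-written, hypothetical stand-in for bags that the method derives from WordNet, and the simple lowercase tokenizer is likewise an assumption.

```python
import re

# Hypothetical "grid topic -> synonym bag of words" mapping.
# The method described above builds these bags from WordNet synonyms.
TOPIC_BAGS = {
    "network": {"network", "web", "net"},
    "virus": {"virus", "viruses"},
    "memory": {"memory", "storage", "recall"},
}


def activate_topics(text, topic_bags):
    """Set a topic's node attribute to 1 if any term of its bag occurs in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {topic: 1 if tokens & bag else 0
            for topic, bag in topic_bags.items()}


print(activate_topics("The virus spread across the web.", TOPIC_BAGS))
```

Matching on exact lowercase tokens keeps the sketch short; lemmatizing the text first, as Step 1-1 does for ontology topics, would catch more inflected forms.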
Further, step 2 of the invention uses a topic scene semantic discrimination method based on the DBpedia knowledge base, which specifically includes:
First step: generate the set of context terms $\mathrm{Key}_s$ obtained by applying a dynamic-span window around the activated topic $s$ in the document;
Second step: generate the set of abstract terms $T_{m,s}$ under each scene label $m$ of the activated topic $s$ in the DBpedia knowledge base, and count the number of abstract terms under each scene label, $N_m$;
Third step: compute the topic scene semantic similarity according to the following formula:
where $\mathrm{counter}(T_{m,s}, \mathrm{Key}_s)$ denotes the co-occurrence frequency of terms in the sets $T_{m,s}$ and $\mathrm{Key}_s$.
Fourth step: select the scene label whose abstract has the highest relevance as the scene semantic state of the activated topic $s$ in the document, forming a "text-activated topic-scene label" triple.
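The four steps above can be sketched in a few lines. The similarity formula itself is not reproduced in this text, so the sketch assumes the score is the co-occurrence count divided by $N_m$; that normalization, the function names, and the sample term lists are illustrative assumptions rather than the patented formula.

```python
def scene_similarity(abstract_terms, context_terms):
    """Assumed score: counter(T_ms, Key_s) / N_m.

    counter(...) is taken here as the number of abstract terms that also
    occur in the context window; dividing by N_m is an assumption.
    """
    n_m = len(abstract_terms)
    if n_m == 0:
        return 0.0
    context = set(context_terms)
    co_occurrence = sum(1 for term in abstract_terms if term in context)
    return co_occurrence / n_m


def discriminate_scene(context_terms, abstracts_by_label):
    """Pick the scene label whose abstract is most similar to the context."""
    scores = {label: scene_similarity(terms, context_terms)
              for label, terms in abstracts_by_label.items()}
    return max(scores, key=scores.get), scores


# Hypothetical context window around the activated topic "virus".
key_s = ["patient", "infection", "fever", "cell"]
# Hypothetical abstract terms per scene label for the same topic.
t_ms = {
    "medicine": ["infection", "cell", "host", "patient"],
    "computing": ["program", "computer", "infection", "file"],
}
label, scores = discriminate_scene(key_s, t_ms)
print(label, scores)
```

The chosen label would then be stored with the document and topic as the "text-activated topic-scene label" triple described in the fourth step.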
Further, the user text interest topic profile of step 4 of the invention is built as follows:
First step: count the topic frequencies in all text topic trees in the database under the current scene mode;
Second step: compute the topic frequency TF and inverse document frequency IDF for every document, where $\mathrm{TF} = C_M / R_N$ is the ratio of the frequency of an activated topic in the document to the total frequency of all activated topics in that document, under the current user interest scene mode; $\mathrm{IDF} = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents that contain the activated topic under the current user interest scene state;
Third step: compute the interest topic semantic weight $C_{w,i}$ that fuses user scene semantic discrimination, as follows:
$C_{w,i} = \mathrm{TF}_i \times \mathrm{IDF}_i \quad (i = 1, 2, \dots, n)$,
The topic semantic weights of every document are mapped into the grid topic attribute tuple to build the user text interest topic profile.
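The weight computation can be sketched directly from the definitions of TF and IDF given above. The input layout (a dictionary from document id to the list of activated topics after scene screening) and the use of the natural logarithm are assumptions; the source does not state the logarithm base.

```python
import math
from collections import Counter


def topic_weights(docs):
    """Compute C_w = TF * IDF for every activated topic of every document.

    docs: {doc_id: [activated topic, ...]} with repeats, after scene screening
          (hypothetical input layout).
    TF  = C_M / R_N : topic frequency / total activated-topic frequency in the doc.
    IDF = log(S / N): S documents in total, N documents containing the topic.
    """
    s = len(docs)
    doc_freq = Counter(t for topics in docs.values() for t in set(topics))
    weights = {}
    for doc_id, topics in docs.items():
        counts = Counter(topics)
        total = sum(counts.values())
        weights[doc_id] = {
            topic: (count / total) * math.log(s / doc_freq[topic])
            for topic, count in counts.items()
        }
    return weights


docs = {"d1": ["virus", "virus", "cell"], "d2": ["cell", "network"]}
for doc_id, profile in topic_weights(docs).items():
    print(doc_id, profile)
```

A topic that occurs in every document, such as "cell" here, receives weight 0, which is the usual TF-IDF behavior for non-discriminative terms.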
Further, the user query interest profile of step 6 of the invention is built as follows:
First step: obtain the feedback documents and their document topic trees according to the pseudo-relevance feedback principle;
Second step: filter the topics of the original document topic trees according to the scene state currently set by the user, screening out topics unrelated to the user's current scene state and keeping the topic trees the user is interested in; count the frequency of each topic in the user interest topic trees, normalize it to obtain the user's initial interest activation value, and map the activation value into the attribute label of the grid topic node;
Third step: according to the relation types between topic nodes in the grid model, perform semantic spreading activation from the initially activated interest topics, mine the latent interest topic nodes under the user's current scene state, and compute their global activation values;
The grid diffusion formula is:
where $\theta_{ij}$ is the set of all topic paths in the grid model that have topic node $j$ as destination node and a topic node $i$ related to $j$ as source node; $I_i(t)$ is the activation attribute value of each latent topic node in the grid model at time $t$; $O_j(t+1)$ is the activation attribute value of the global topic node in the grid model at time $t+1$; $w_{ij}$ is the association weight between the activated topic and the latent topic under the current scene state; and $\alpha$ is the decay factor, set to 0.75. The association path length is set to 3.
Fourth step: map the global topic activation values into the grid topic attribute tuple to build the user query interest topic profile.
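Spreading activation over the topic grid can be sketched as below. The diffusion formula is not reproduced in this text, so the update rule used here (each hop adds $\alpha \cdot w_{ij} \cdot I_i(t)$ to the destination node, for at most three hops) is an assumed, common form of spreading activation consistent with the symbols defined above, not necessarily the patented formula. The edge dictionary and node names are hypothetical.

```python
def spread_activation(initial, edges, alpha=0.75, max_hops=3):
    """Propagate activation from initial interest topics to latent topics.

    initial: {topic: initial activation value}
    edges:   {(source, destination): association weight w_ij}
    alpha:   decay factor (0.75 in the description above)
    max_hops: association path length (3 in the description above)
    Assumed rule: O_j(t+1) accumulates alpha * w_ij * I_i(t) over incoming edges.
    """
    total = dict(initial)       # global activation values
    frontier = dict(initial)    # activation that still has to be propagated
    for _ in range(max_hops):
        incoming = {}
        for (src, dst), weight in edges.items():
            if src in frontier:
                incoming[dst] = incoming.get(dst, 0.0) + alpha * weight * frontier[src]
        if not incoming:
            break
        for node, value in incoming.items():
            total[node] = total.get(node, 0.0) + value
        frontier = incoming
    return total


edges = {("a", "b"): 0.5, ("b", "c"): 0.4, ("c", "d"): 1.0, ("d", "e"): 1.0}
print(spread_activation({"a": 1.0}, edges))
```

With a path length of 3, node "e" (four hops away from "a") is never activated, which shows how the path limit bounds the set of latent interest topics.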
Further, step 7 of the invention uses the cosine similarity formula to compute the semantic "domain" relevance between the user text interest grid profile and the user query interest grid profile. The formula is expressed as follows:
where one vector is the user text interest grid profile and $q = \{o_1, o_2, \dots, o_n\}$ is the user query interest grid profile.
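The scoring and ranking step can be sketched with the standard cosine similarity between two sparse profiles. Representing each profile as a dictionary from grid topic to weight, and the `recommend` helper with its `top_k` parameter, are assumptions made for illustration.

```python
import math


def cosine(doc_profile, query_profile):
    """Standard cosine similarity between two sparse topic-weight profiles."""
    topics = set(doc_profile) | set(query_profile)
    dot = sum(doc_profile.get(t, 0.0) * query_profile.get(t, 0.0) for t in topics)
    norm_d = math.sqrt(sum(v * v for v in doc_profile.values()))
    norm_q = math.sqrt(sum(v * v for v in query_profile.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_d * norm_q)


def recommend(doc_profiles, query_profile, top_k=10):
    """Score every document and return the top_k in descending order of score."""
    scored = [(doc_id, cosine(profile, query_profile))
              for doc_id, profile in doc_profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


doc_profiles = {"d1": {"virus": 1.0}, "d2": {"network": 1.0}}
query_profile = {"virus": 2.0, "cell": 0.0}
print(recommend(doc_profiles, query_profile))
```

Because both profiles live on the same grid topics, a document sharing no activated topic with the query scores 0 and falls to the bottom of the list.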
The invention can be applied to any recommender system based on text retrieval. Its advantages are as follows:
1. On the user side, for the query content a user submits, the invention uses a first-pass feedback topic learning method based on scene semantic discrimination and a topic expansion method based on secondary semantic spreading activation, which addresses the semantic relevance of user queries and the mining of latent interests;
2. On the document side, the invention automatically infers document topics and the scene semantic characteristics of those topics from the deep semantic grid model, realizing automatic text topic extraction and deep interest semantic mining.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the text recommendation method based on deep semantic discrimination of the invention.
Fig. 2 is a detailed flow chart of step S100 of the method shown in Fig. 1.
Fig. 3 is a detailed flow chart of step S102 of the method shown in Fig. 1.
Fig. 4 is a detailed flow chart of step S103 of the method shown in Fig. 1.
Fig. 5 is a detailed flow chart of step S104 of the method shown in Fig. 1.
Fig. 6 compares the ranking score (RS) of the deep semantic recommendation method with that of the traditional semantic recommendation method under different user interest inputs.
Embodiment
The invention provides a text recommendation method based on deep semantic discrimination, which is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of a preferred embodiment of the text recommendation method based on deep semantic discrimination of the invention. As shown in the figure, the implementation steps are:
S100: Build a deep semantic grid model based on the brain-like "hierarchical-divergent" thinking pattern;
S101: The user inputs the content of interest and sets the current scene state;
S102: Perform topic inference and topic scene semantic discrimination on the text to build text topic trees, and screen the topics of the document topic trees according to the current user's scene state, thereby building text topic trees that fuse scene semantic screening;
S103: Use the TF-IDF algorithm to count all text topic trees in the database after scene semantic screening, compute the weight of each topic and map it to the corresponding grid topic node, thereby building a user text interest profile for every document;
S104: Using pseudo-relevance feedback, extract the documents related to the user's query content together with their text topic trees after scene semantic screening, count the frequency of each topic in the feedback trees, and normalize it to obtain the initial interest topic activation values; use the spreading activation algorithm to compute the global dynamic activation values of the initial interest grid topics and the latent interest grid topics under feedback learning, assign the results to the corresponding topic nodes in the grid model, and build a user query interest profile that fuses the current scene semantics;
S105: Use the grid-based cosine similarity algorithm to compute and score the semantic similarity between the user text interest profile and the user query interest profile under the current scene mode;
S106: Sort in descending order of the relevance score given by the model, generate a recommendation list, and recommend documents of interest to the user.
Further, as shown in Fig. 2, step S100 specifically includes:
S001: Select a classification ontology that fuses multiple domains, use the natural language processing tools of Stanford University to semantically split the ontology topics and reduce them to their base word forms (lemmatization), obtaining the core topic set, and connect the core topics into a divergent grid model according to the memory characteristics of the ontology;
S002: Build the "grid topic-synonym bag of words" semantic mapping model, in which "topic" denotes a core topic in the hierarchical grid model and the "bag of words" is the set of synonym terms extracted for that topic from the WordNet dictionary;
S003: Traverse the "topic-label-abstract" triples in the DBpedia knowledge base, match the topic in each triple against the terms in the "topic-bag of words" model, map the grid topics to the DBpedia topics, and link them with the semantic dependency relation type;
S004: Extract the label and abstract data of the matched topics from the DBpedia knowledge base and, with the "hierarchical-memory" grid model as the skeleton, realize the "divergent-deep" semantic grid model that fuses "synonym bag of words-grid topic-DBpedia topic-label-abstract".
Further, as shown in Fig. 3, step S102 specifically includes:
S201: Using the "topic-bag of words" semantic association in the grid model, perform keyword matching on the text terms; if a term in a bag of words appears in the text, the topic is activated and the corresponding grid node attribute is set to "1", which implements shallow semantic topic mining of the text;
S202: According to the "association-memory" characteristic of the deep semantic grid model, build the scattered topics into a text topic tree;
S203: Perform scene state discrimination on the document topics, as follows:
First step: generate the set of context terms $\mathrm{Key}_s$ obtained by applying a dynamic-span window around the activated topic $s$ in the document;
Second step: generate the set of abstract terms $T_{m,s}$ under each scene label $m$ of the activated topic $s$ in the DBpedia knowledge base, and count the number of abstract terms under each scene label, $N_m$;
Third step: compute the topic scene semantic similarity according to the following formula:
where $\mathrm{counter}(T_{m,s}, \mathrm{Key}_s)$ denotes the co-occurrence frequency of terms in the sets $T_{m,s}$ and $\mathrm{Key}_s$;
Fourth step: select the scene label whose abstract has the highest relevance as the scene semantic state of the activated topic $s$ in the document, forming a "text-activated topic-scene label" triple.
Further, as shown in Fig. 4, step S103 specifically includes:
S301: Count the topic frequencies in all text topic trees in the database under the current scene mode;
S302: Compute the topic frequency TF and inverse document frequency IDF for every document, where $\mathrm{TF} = C_M / R_N$ is the ratio of the frequency of an activated topic in the document to the total frequency of all activated topics in that document, under the current user interest scene mode; $\mathrm{IDF} = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents that contain the activated topic under the current user interest scene state;
S303: Compute the interest topic semantic weight $C_{w,i}$ that fuses user scene semantic discrimination, as follows:
$C_{w,i} = \mathrm{TF}_i \times \mathrm{IDF}_i \quad (i = 1, 2, \dots, n)$,
The topic semantic weights of every document are mapped into the grid topic attribute tuple to build the user text interest topic profile.
Further, as shown in Fig. 5, step S104 specifically includes:
S401: Obtain the feedback documents and their document topic trees according to the pseudo-relevance feedback principle;
S402: Filter the topics of the original document topic trees according to the scene state currently set by the user, screening out topics unrelated to the user's current scene state and keeping the topic trees the user is interested in; count the frequency of each topic in the user interest topic trees, normalize it to obtain the user's initial interest activation value, and map the activation value into the attribute label of the grid topic node;
S403: According to the relation types between topic nodes in the grid model, perform semantic spreading activation from the initially activated interest topics, mine the latent interest topic nodes under the user's current scene state, and compute their global activation values;
The grid diffusion formula is:
where $\theta_{ij}$ is the set of all topic paths in the grid model that have topic node $j$ as destination node and a topic node $i$ related to $j$ as source node; $I_i(t)$ is the activation attribute value of each latent topic node in the grid model at time $t$; $O_j(t+1)$ is the activation attribute value of the global topic node in the grid model at time $t+1$; $w_{ij}$ is the association weight between the activated topic and the latent topic under the current scene state; and $\alpha$ is the decay factor, set to 0.75. The association path length is set to 3.
S404: Map the global topic activation values into the grid topic attribute tuple to build the user query interest topic profile.
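The initial activation values of S402 can be sketched as normalized topic frequencies over the feedback topic trees. Flattening each tree into a list of topic names and normalizing by the total count (so that the values sum to 1) are assumptions; the source only says the frequencies are normalized.

```python
from collections import Counter


def initial_activation(feedback_trees):
    """Normalized topic frequency over the pruned feedback topic trees.

    feedback_trees: one list of topic names per feedback document
                    (hypothetical flattened form of the topic trees).
    Assumption: normalization divides by the total topic count.
    """
    counts = Counter(topic for tree in feedback_trees for topic in tree)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in counts.items()}


print(initial_activation([["virus", "cell"], ["virus"]]))
```

These values would serve as the starting activations that the spreading step of S403 then propagates to latent interest topics.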
Further, in step S105 the cosine similarity formula is used to compute the semantic "domain" relevance between the user text interest grid profile and the user query interest grid profile. The formula is expressed as follows:
where one vector is the user text interest grid profile and $q = \{o_1, o_2, \dots, o_n\}$ is the user query interest grid profile.
The invention applies scene semantic discrimination in both the user query and the document topic learning process to improve the relevance of recommended documents and make recommendation more intelligent. It can effectively reduce the influence of similar but irrelevant documents on the recommendation results, improve the semantic relevance of the recommender system, and identify the user's real personal interests, thereby improving the system's accuracy and its capacity for personalized discrimination.
The text recommendation method based on deep semantic discrimination of the invention is compared below with the traditional semantic recommendation method. The experimental parameters were chosen as follows. The simulation data set is the 2005 document data from the PubMed database, which contains more than 26,000 abstracts of biomedical papers. The deep semantic grid model is built from the ontology of the ACM Digital Library full-text database and the DBpedia knowledge base. Text processing uses the suite of open-source Java text analysis tools provided by the Stanford University Natural Language Processing Group.
The effect of the invention on the ranking accuracy of the recommender system was verified; the experimental results are as follows:
Fig. 6 compares the ranking score (RS) of the traditional semantic recommendation method with that of the deep semantic recommendation method under different user interest inputs. Here the traditional semantic recommendation method denotes a shallow semantic recommendation method without scene semantic discrimination, and the deep semantic recommendation method denotes the method proposed by the invention. Fig. 6 shows that, under all 5 experimental conditions, the ranking score of the invention is always lower than that of the traditional semantic recommendation method. A smaller ranking score means that the system tends to rank the items a user likes nearer the top, so the experimental results show that the proposed method gives better recommendations.
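The source does not give the formula for the ranking score. One common definition in the recommender-system literature, assumed here purely for illustration, is the relative position of each relevant item in the ranked list, averaged over the relevant items; under that definition a lower value means relevant items sit nearer the top. The function name and sample document ids are hypothetical.

```python
def ranking_score(ranked_ids, relevant_ids):
    """Average relative rank of the relevant items (assumed RS definition).

    ranked_ids:   recommendation list, best first
    relevant_ids: set of items the user actually likes
    Returns None when no relevant item appears in the list.
    """
    n = len(ranked_ids)
    positions = [index + 1
                 for index, doc_id in enumerate(ranked_ids)
                 if doc_id in relevant_ids]
    if not positions:
        return None
    return sum(position / n for position in positions) / len(positions)


ranked = ["d3", "d1", "d2", "d4"]
print(ranking_score(ranked, {"d3"}))   # relevant item ranked first
print(ranking_score(ranked, {"d4"}))   # relevant item ranked last
```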
It should be noted that the scope of protection of the invention includes but is not limited to the above examples. Any improvement or variation made by a person of ordinary skill in the art on the basis of the above description shall fall within the scope of protection of the invention.

Claims (6)

1. A text recommendation method based on deep semantic discrimination, characterized in that the text recommendation method comprises the following steps:
Step 1: Build a deep semantic grid model based on the brain-like "hierarchical-divergent" thinking pattern;
Step 2: Infer the grid topic set of a text by combining the "grid topic-synonym bag of words" model with word matching, link the scattered topics using the "association-memory" function of the grid model, then use the scene semantic analysis function to infer the scene label of each activated topic in the current text, and finally build a text topic tree that fuses multiple scene semantics and memory links;
Step 3: Prune the text topic tree according to user interest, that is, filter out the topics and relations that do not match the user's current scene state, thereby building a text topic tree screened by scene semantics;
Step 4: Use the TF-IDF algorithm to count all text topic trees in the database after scene semantic screening, compute the weight of each topic and map it to the corresponding grid topic node, building a user text interest profile for every document;
Step 5: Using pseudo-relevance feedback, extract the documents related to the user's query content together with their text topic trees after scene semantic screening, count the frequency of each topic in the feedback trees, and normalize it to obtain the initial interest topic activation values;
Step 6: Use the spreading activation algorithm to compute the global dynamic activation values of the initial interest grid topics and the latent interest grid topics under feedback learning, assign the results to the corresponding topic nodes in the grid model, and build a user query interest profile that fuses the current scene semantics;
Step 7: Use a grid-based cosine similarity method to score the deep semantic relevance between the user query interest profile and each user text interest profile, then generate a recommendation list and recommend.
2. The text recommendation method based on deep semantic discrimination as claimed in claim 1, characterized in that the deep semantic grid model of step 1 is built according to the brain-like "layering-divergence" thinking pattern, the building process specifically comprising:
First step: select a classification ontology that fuses multiple domains; use the natural language processing tools of Stanford University to semantically split and lemmatize the themes in the ontology, yielding a core theme set; connect the core themes into a divergent grid model according to the memory characteristics of the ontology;
Second step: build a "grid theme-synonym bag of words" semantic mapping model, in which "theme" denotes a core theme in the hierarchical grid model and the "bag of words" is formed by extracting the set of synonymous terms of that theme from the WordNet dictionary. If a term of the "theme-bag of words" model occurs in a text, the theme is activated and the corresponding grid node attribute is set to "1", which accomplishes the shallow semantic theme mining task for the text;
Third step: traverse the "theme-label-summary" triples in the DBpedia knowledge base; match the themes in the triples against the terms of the "theme-bag of words" model and extract the label and summary data of each matched theme from the knowledge base; map "grid theme-DBpedia theme-label-summary" layer by layer and associate the layers with semantic dependency relation types;
Fourth step: taking the "layering-memory" grid model as the skeleton, realize a "divergent-deep" semantic model that fuses "synonym bag of words-grid theme-DBpedia theme-label-summary".
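The shallow activation rule of the second step can be illustrated with a minimal sketch. The synonym bags below are hand-written placeholders (in the claimed method they are extracted from WordNet), and the names `THEME_BAGS` and `activate_themes` are hypothetical.

```python
import re

# Hypothetical "theme -> synonym bag" mapping; the claim derives these bags from WordNet.
THEME_BAGS = {
    "car": {"car", "automobile", "auto"},
    "physician": {"physician", "doctor", "medic"},
}

def activate_themes(text, theme_bags):
    """Return {theme: 1 or 0}; 1 if any term of the theme's bag occurs in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {theme: int(bool(bag & tokens)) for theme, bag in theme_bags.items()}

activation = activate_themes("The doctor parked her automobile.", THEME_BAGS)
print(activation)
```

A theme node is marked "1" as soon as one term of its bag appears, which mirrors the binary node attribute described in the claim; no lemmatization is applied in this sketch.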
3. The text recommendation method based on deep semantic discrimination as claimed in claim 1, characterized in that step 2 employs a theme scene semantic discrimination method based on the DBpedia knowledge base, the specific steps of which are as follows:
First step: generate the set of context terms, $Key_s$, obtained by applying a dynamic-span window around the activated theme s in the document;
Second step: generate the set of summary terms, $T_{m,s}$, under each scene label m of the DBpedia theme corresponding to the activated theme s; count the number of summary terms under each scene label, $N_m$;
Third step: calculate the theme scene semantic similarity according to the following formula:
where $counter(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of terms in the sets $T_{m,s}$ and $Key_s$.
Fourth step: select the scene label whose summary has the highest relevance as the scene semantic state of the activated theme s in the document, forming a "text-activated theme-scene label" triple.
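The four steps can be sketched as follows. Because the similarity formula itself is not reproduced in this text, the sketch assumes the score is the co-occurrence count divided by $N_m$; that normalization, the fixed window `span`, and the names `context_window` and `pick_scene` are assumptions made for illustration only.

```python
def context_window(tokens, theme, span):
    """Collect the terms within `span` positions of each occurrence of `theme` (Key_s)."""
    keys = set()
    for i, tok in enumerate(tokens):
        if tok == theme:
            lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
            keys.update(tokens[lo:i] + tokens[i + 1:hi])
    return keys

def pick_scene(keys, scene_terms):
    """scene_terms maps a scene label m to its list of summary terms T_{m,s}.
    Returns (best label, {label: score})."""
    scores = {}
    for label, terms in scene_terms.items():
        n_m = len(terms)                              # N_m
        co = sum(1 for t in terms if t in keys)       # counter(T_{m,s}, Key_s)
        scores[label] = co / n_m if n_m else 0.0      # ASSUMED normalization by N_m
    return max(scores, key=scores.get), scores

tokens = "the apple fell from the tree in the orchard".split()
keys = context_window(tokens, "apple", span=5)
best, scores = pick_scene(keys, {
    "fruit": ["fruit", "tree", "orchard", "sweet"],
    "company": ["company", "iphone", "technology", "california"],
})
print(best, scores)
```

The claim speaks of a dynamic window span; a fixed span is used here only to keep the sketch short.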
4. The text recommendation method based on deep semantic discrimination as claimed in claim 2, characterized in that building the user text interest theme portrait in step 4 comprises the following specific steps:
First step: count the theme frequencies in all text theme trees in the database under the current scene mode;
Second step: calculate the theme frequency TF and the inverse document frequency IDF for each document, where $TF = C_M / R_N$ is the ratio of the frequency of an activated theme in a document under the current user interest scene mode to the total frequency of activated themes in that document, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database to the number of documents that contain the activated theme under the current user interest scene state;
Third step: calculate the interest theme semantic weight $C_{W,i}$, which fuses the user-feeling semantic discrimination, as follows:
$$C_{W,i} = TF_i \times IDF_i \quad (i = 1, 2, \ldots, n),$$
The theme semantic weight of each document is mapped into the grid theme attribute tuple, building the user text interest theme portrait.
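The weighting above is a TF-IDF computation over activated themes rather than raw words. A minimal sketch, assuming the natural logarithm (the claim does not state the base) and a hypothetical input format of one `{theme: count}` dictionary per document:

```python
import math

def theme_weights(doc_theme_counts):
    """doc_theme_counts: list of {theme: activation count}, one dict per document.
    Returns a list of {theme: TF * IDF}, one dict per document."""
    s = len(doc_theme_counts)                      # S: total number of documents
    doc_freq = {}
    for counts in doc_theme_counts:
        for theme in counts:
            doc_freq[theme] = doc_freq.get(theme, 0) + 1   # N: documents containing the theme
    weights = []
    for counts in doc_theme_counts:
        total = sum(counts.values())               # R_N: total activated-theme frequency
        weights.append({
            theme: (c / total) * math.log(s / doc_freq[theme])   # C_M / R_N * log(S / N)
            for theme, c in counts.items()
        })
    return weights

docs = [{"car": 3, "engine": 1}, {"car": 1, "road": 1}]
weights = theme_weights(docs)
print(weights)
```

A theme that appears in every document ("car" here) receives weight 0, so only themes that discriminate between documents contribute to the portrait.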
5. The text recommendation method based on deep semantic discrimination as claimed in claim 2, characterized in that building the user query interest portrait in step 6 comprises the following specific steps:
First step: obtain the feedback documents and the corresponding document theme trees according to the pseudo-relevance feedback principle;
Second step: distill the themes of the original document theme trees according to the scene state currently set by the user, screening out themes irrelevant to the user's current scene state and leaving the theme tree the user is interested in; count the frequency of each theme in the user interest theme tree, normalize it to serve as the user's initial interest activation value, and map the activation value into the attribute label of the grid theme node;
Third step: according to the relation types between the theme nodes in the grid model, perform semantic spreading activation from the initially activated interest themes, mine the potential interest theme nodes under the user's current scene state, and calculate their global activation values;
The grid spreading formula is:
$$O_j(t+1) = \sum_{i \in \theta_{ij}} I_i(t) \cdot w_{ij} \cdot (1 - \alpha),$$
where $\theta_{ij}$ is the set of theme paths in the grid model that take theme node j as the destination node and a theme node i related to node j as the source node; $I_i(t)$ is the activation attribute value of each potential theme node in the grid model at time t; $O_j(t+1)$ is the activation attribute value of the global theme node in the grid model at time t+1; $w_{ij}$ is the association value between the activated theme and the potential theme under the current scene state; $\alpha$ is the decay factor, set to 0.75; and the association path length is set to 3.
Fourth step: map the global theme activation values into the grid theme attribute tuple, building the user query interest theme portrait.
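The spreading step can be sketched as below, using the stated decay factor 0.75 and path length 3. The claim does not say how activation received over several hops is combined into the global value; this sketch assumes it is summed with the initial value, and the edge format and the name `spread_activation` are hypothetical.

```python
def spread_activation(initial, edges, alpha=0.75, max_path_len=3):
    """initial: {node: initial activation}; edges: {(src, dst): w_ij}.
    Returns the accumulated activation per node after at most `max_path_len` hops."""
    total = dict(initial)        # ASSUMPTION: global value = initial + all received activation
    frontier = dict(initial)     # I_i(t): activation being propagated at step t
    for _ in range(max_path_len):
        nxt = {}
        for (src, dst), w in edges.items():
            if src in frontier:
                # O_j(t+1) += I_i(t) * w_ij * (1 - alpha)
                nxt[dst] = nxt.get(dst, 0.0) + frontier[src] * w * (1 - alpha)
        if not nxt:
            break
        for node, value in nxt.items():
            total[node] = total.get(node, 0.0) + value
        frontier = nxt
    return total

result = spread_activation(
    {"car": 1.0},
    {("car", "engine"): 0.8, ("engine", "piston"): 0.5},
)
print(result)
```

With a decay of 0.75 each hop keeps only a quarter of the weighted activation, so themes more than a few hops away contribute little, which is consistent with capping the path length at 3.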
6. The text recommendation method based on deep semantic discrimination as claimed in claim 1, characterized in that step 7 employs a grid-based cosine similarity computation, which uses the cosine similarity formula to calculate the semantic "domain" relevance between the user text interest grid portrait and the user query interest grid portrait, expressed as follows:
$$Sim(d_{j,D_m}, q) = \frac{d_{j,D_m} \cdot q}{|d_{j,D_m}| \times |q|}.$$
where $d_{j,D_m}$ is the user text interest grid portrait and $q = \{o_1, o_2, \ldots, o_n\}$ is the user query interest grid portrait.
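A minimal sketch of the scoring and ranking step, assuming both portraits have already been flattened into equal-length weight vectors over the same ordered set of grid theme nodes; the names `cosine` and `recommend` and the `top_k` parameter are hypothetical.

```python
import math

def cosine(d, q):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(x * y for x, y in zip(d, q))
    norm = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0   # define similarity as 0 for an all-zero vector

def recommend(doc_vectors, q, top_k=3):
    """doc_vectors: {doc id: weight vector}. Returns doc ids by descending similarity to q."""
    ranked = sorted(doc_vectors, key=lambda doc: cosine(doc_vectors[doc], q), reverse=True)
    return ranked[:top_k]

docs = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}
print(recommend(docs, [1, 0], top_k=2))
```

Document vectors would hold the TF-IDF theme weights and the query vector the global activation values, so the score compares the two portraits node by node.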
CN201710000406.3A 2017-01-03 2017-01-03 Text recommendation method based on deep semantic analysis Active CN107832312B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710000406.3A CN107832312B (en) 2017-01-03 2017-01-03 Text recommendation method based on deep semantic analysis

Publications (2)

Publication Number Publication Date
CN107832312A true CN107832312A (en) 2018-03-23
CN107832312B CN107832312B (en) 2023-10-10

Family

ID=61643740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710000406.3A Active CN107832312B (en) 2017-01-03 2017-01-03 Text recommendation method based on deep semantic analysis

Country Status (1)

Country Link
CN (1) CN107832312B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080270384A1 (en) * 2007-04-28 2008-10-30 Raymond Lee Shu Tak System and method for intelligent ontology based knowledge search engine
CN103678277A (en) * 2013-12-04 2014-03-26 东软集团股份有限公司 Theme-vocabulary distribution establishing method and system based on document segmenting
CN103942285A (en) * 2014-04-09 2014-07-23 北京搜狗科技发展有限公司 Recommendation method and system for dynamic page element
CN104090958A (en) * 2014-07-04 2014-10-08 许昌学院 Semantic information retrieval system and method based on domain ontology
CN104298732A (en) * 2014-09-29 2015-01-21 中国科学院计算技术研究所 Personalized text sequencing and recommending method for network users
CN104484431A (en) * 2014-12-19 2015-04-01 合肥工业大学 Multi-source individualized news webpage recommending method based on field body
US20150310096A1 (en) * 2014-04-29 2015-10-29 International Business Machines Corporation Comparing document contents using a constructed topic model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANA O. ALVES 等: "ASAP-II: From the Alignment of Phrases to Text Similarity" *
GANGGAO ZHU 等: "Computing Semantic Similarity of Concepts in Knowledge Graphs" *
张静娴 等: "基于属性结构的本体映射方法" *
李兰彬: "面向专题情报服务的领域知识库构建平台研究" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108595602A (en) * 2018-04-20 2018-09-28 昆明理工大学 The question sentence file classification method combined with depth model based on shallow Model
CN111858901A (en) * 2019-04-30 2020-10-30 北京智慧星光信息技术有限公司 Text recommendation method and system based on semantic similarity
CN110188189A (en) * 2019-05-21 2019-08-30 浙江工商大学 A kind of method that Knowledge based engineering adaptive event index cognitive model extracts documentation summary
CN110188189B (en) * 2019-05-21 2021-10-08 浙江工商大学 Knowledge-based method for extracting document abstract by adaptive event index cognitive model
CN112287218A (en) * 2020-10-26 2021-01-29 安徽工业大学 Knowledge graph-based non-coal mine literature association recommendation method
CN112256834A (en) * 2020-10-28 2021-01-22 中国科学院声学研究所 Marine science data recommendation system based on content and literature
CN112256834B (en) * 2020-10-28 2021-06-08 中国科学院声学研究所 Marine science data recommendation system based on content and literature
CN113658714A (en) * 2021-05-11 2021-11-16 武汉大学 Port health quarantine case scene matching method and system for overseas infectious disease input
CN113658714B (en) * 2021-05-11 2023-08-18 武汉大学 Port health quarantine case scenario matching method and system for inputting foreign infectious diseases

Also Published As

Publication number Publication date
CN107832312B (en) 2023-10-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant