CN107832312B

CN107832312B - Text recommendation method based on deep semantic analysis

Info

Publication number: CN107832312B
Application number: CN201710000406.3A
Authority: CN
Inventors: 郐弘智; 陈建辉; 盛文瑾; 闫健卓
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2017-01-03
Filing date: 2017-01-03
Publication date: 2023-10-10
Anticipated expiration: 2037-01-03
Also published as: CN107832312A

Abstract

The invention discloses a text recommendation method based on deep semantic analysis. On the document side, the method automatically extracts text topics using a deep semantic grid model, infers the scene semantics of topics under different text backgrounds using a topic scene semantic analysis method, builds a text topic tree that fuses the scene states, and constructs a user text interest portrait for each document according to the user's real-time scene state. On the query side, to account for real-time fluctuation of the user's scene state, the method performs scene semantic screening on the text topic tree, models the interest topics of the query content, performs secondary latent semantic reasoning on the user's direct interest topics using the activation diffusion method, calculates the global activation values of the topics, and constructs a user query interest portrait that fuses the semantics of the current scene. Finally, documents are scored by a similarity calculation method, and a text recommendation list is generated according to the scores.

Description

Text recommendation method based on deep semantic analysis

Technical Field

The invention relates to the technical field of recommendation, and in particular to a text recommendation method based on deep semantic analysis; more specifically, it relates to a recommendation method based on text topic scene semantic analysis and on a deep semantic grid model constructed according to a brain-like 'layering-diverging' thinking mode.

Background

Recommendation systems were first proposed in the 1990s. Early recommendation systems focused mainly on the formal similarity between search results and the query and neglected their semantic relevance, so the recommendation results were quite noisy. In recent years, with the explosive growth of paperless data, the effectiveness of information retrieval has attracted extensive attention from researchers, and various semantics-based information retrieval methods have been proposed. Personalized semantic recommendation methods fall mainly into two types: formal semantic methods and social semantic methods.

Social semantic methods take two forms. The first builds a portrait of the user by analyzing information such as user logs, user labels, field popularity, and user activity, so as to achieve personalized recommendation. The second, based on user similarity and item similarity, approximates a target user's score for an item from the scores given to that item by the most similar users, as in collaborative filtering. The former improves the interest relevance of search results but requires a large amount of user behavior data, which most users cannot supply; moreover, it is in essence a formal matching of interest keywords and lacks semantic analysis and latent interest mining capability. The latter is more user-oriented and better able to mine documents of potential interest, but its feedback results are complex and varied, which causes a large amount of content irrelevant to the query to appear. In addition, as the dimensionality of recommendation data keeps growing, data sparsity causes the cold-start problem: when a new user or a batch of literature from a new field enters the system, the recommendation quality drops because of insufficient supporting information.

Formal semantic recommendation systems mostly employ ontology-based semantic query techniques. These methods abstract document information to a concept layer and connect the concepts through different semantic relations to form a net-like structure similar to the way the brain thinks. They operate on text directly at the concept layer and are mainly applied to retrieval over structured knowledge bases, where they noticeably improve the semantic relevance of the results. However, when these methods are used to recommend text, they do not consider that a concept carries implicit scene semantics in the text, so the semantics become blurred when a document is mapped to the ontology. Accordingly, the prior art is still in need of improvement and development.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a text recommendation method based on deep semantic analysis, and aims to solve the problem that the semantic relevance of the existing recommendation method needs to be improved.

In order to solve the technical problems, the technical scheme adopted by the invention specifically comprises the following steps:

Step 1: constructing a deep semantic grid model based on a brain-like 'layering-diverging' thinking mode;

Step 2: inferring the grid topic set of a text by combining the 'grid topic-synonym bag' model with a word matching technique, then connecting the scattered topics by using the 'association-memory' function of the grid model, inferring the scene labels of the different activated topics in the current text by using the scene semantic analysis function, and finally constructing a text topic tree that integrates multiple scene semantics and memory connections;

Step 3: pruning the text topic tree according to the user's interests, that is, filtering out the topics and relations that do not match the user's current scene state, so as to construct a text topic tree based on scene semantic screening;

Step 4: applying the TF-IDF algorithm to all scene-semantically screened text topic trees in the database to calculate the weight of each topic, and mapping the weights into the corresponding grid topic nodes, so as to construct a user text interest portrait for each document;

Step 5: extracting, according to a pseudo-relevance feedback method, the documents related to the user's query content and their scene-semantically screened text topic trees, counting the frequency of the topics in the feedback trees, and normalizing the frequencies to obtain the initial interest topic activation values;

Step 6: calculating, by using an activation diffusion mechanism, the global dynamic activation values of the initial interest grid topics and of the potential interest grid topics under feedback learning, assigning the results to the corresponding topic nodes in the grid model, and constructing a user query interest portrait fused with the semantics of the current scene;

Step 7: scoring the deep semantic relevance between the user query interest portrait and the user text interest portraits by using a grid-based cosine similarity calculation method, and generating a recommendation list for recommendation.

Furthermore, the deep semantic grid model in step 1 is constructed according to a brain-like 'layering-diverging' thinking mode, and the construction process specifically comprises the following steps:

Step 1-1: selecting a classification ontology that fuses multiple domains, performing semantic splitting and lemmatization (part-of-speech reduction) on the topics in the ontology by using a natural language processing tool from Stanford University to obtain a core topic set, and connecting the core topics into a divergent grid model according to the memory characteristics of the ontology;

Step 1-2: constructing a 'grid topic-synonym bag' semantic mapping model, in which 'topic' denotes a core topic in the hierarchical grid model and 'word bag' is the set of synonym terms of that topic extracted from the WordNet dictionary. If a term in the 'topic-word bag' model appears in the text, the topic is activated and the corresponding grid node attribute is set to 1, which realizes shallow semantic topic mining of the text;

Step 1-3: traversing the 'topic-label-abstract' triples in the DBpedia knowledge base, matching the topic in each triple against the terms in the 'topic-word bag' model, extracting the labels and abstract data corresponding to the matched topics from the knowledge base, mapping 'grid topic-DBpedia topic-label-abstract' layer by layer, and associating them by semantic relation type;

Step 1-4: using the layered-memory grid model as a framework to realize a 'divergent-deep' semantic grid model that fuses 'synonym bag-grid topic-DBpedia topic-label-abstract'.
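The activation rule of step 1-2 can be illustrated with a minimal sketch. The topic names and synonym bags below are hypothetical toy data; in the described method the bags would be extracted from WordNet, and the simple regular-expression tokenizer stands in for the Stanford text processing tools.

```python
import re

# Hypothetical "grid topic -> synonym bag" mapping (toy data, not from the source).
TOPIC_BAGS = {
    "network": {"network", "web", "net"},
    "memory": {"memory", "storage", "store"},
    "security": {"security", "protection"},
}

def activate_topics(text, topic_bags):
    """Return {topic: 0 or 1}; a topic is set to 1 when any term of its bag occurs in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {topic: int(bool(bag & tokens)) for topic, bag in topic_bags.items()}

activation = activate_topics("The web server ran out of storage.", TOPIC_BAGS)
print(activation)
```

Here "web" activates the topic "network" and "storage" activates "memory", while "security" stays at 0 because none of its bag terms appear.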

Furthermore, step 2 adopts a topic scene semantic analysis method based on the DBpedia knowledge base, which specifically comprises:

First, generating the set of context terms obtained by applying a dynamic-span window around an activated topic s in the document, denoted $Key_s$;

Second, generating, for each scene label m corresponding to the activated topic s in the DBpedia knowledge base, the term set of the abstract under that label, denoted $T_{m,s}$, and counting the number of abstract terms under the scene label, denoted $N_m$;

Third, calculating the topic scene semantic similarity according to the following formula:

where $\mathrm{counter}(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of the terms in the sets $T_{m,s}$ and $Key_s$;

Fourth, selecting the scene label of the abstract with the greatest relevance as the scene semantic state of the activated topic s in the document, forming a 'text, activated topic, scene label' triple.
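The four steps above can be sketched as follows. The similarity formula itself is not reproduced in this text, so the sketch assumes a score equal to the co-occurrence count divided by $N_m$; the fixed window span, the function names, and the sample abstracts are likewise illustrative assumptions rather than the patented implementation.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def context_window(tokens, topic_term, span=3):
    """Key_s: terms within `span` tokens of each occurrence of the topic term (fixed span, assumed)."""
    window = []
    for i, tok in enumerate(tokens):
        if tok == topic_term:
            window.extend(tokens[max(0, i - span): i + span + 1])
    return window

def scene_scores(key_s, abstracts_by_label):
    """Score each scene label m as counter(T_ms, Key_s) / N_m (assumed normalization)."""
    key_counts = Counter(key_s)
    scores = {}
    for label, abstract in abstracts_by_label.items():
        t_ms = tokenize(abstract)
        n_m = len(t_ms)
        co_occurrence = sum(key_counts[t] for t in set(t_ms))
        scores[label] = co_occurrence / n_m if n_m else 0.0
    return scores

doc = "the python interpreter compiles source code before the python program runs"
abstracts = {  # hypothetical DBpedia-style abstracts per scene label
    "programming language": "python is a programming language with an interpreter for source code",
    "animal": "python is a snake that lives in tropical forest",
}
scores = scene_scores(context_window(tokenize(doc), "python"), abstracts)
best_label = max(scores, key=scores.get)
print(best_label)
```

The label with the highest score becomes the scene semantic state of the activated topic, which yields the triple (document, "python", best_label).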

Further, the specific steps for constructing the user text interest topic portrait in step 4 are as follows:

First, counting the topic frequencies in all text topic trees in the database under the current scene mode;

Second, calculating the topic frequency TF and the inverse document frequency IDF for each document, where $TF = C_M / R_N$ is the ratio of the frequency of an activated topic in the document ($C_M$) to the total frequency of the activated topics in that document ($R_N$) under the current user interest scene mode, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database ($S$) to the number of documents containing the activated topic ($N$) under the current user interest scene state;

Third, calculating the interest topic semantic weight $C_{w,i}$ fused with the emotion semantic analysis of the user, as follows:

$C_{w,i} = TF_i \cdot IDF_i \quad (i = 1, 2, \ldots, n),$

and mapping the topic semantic weights of each document into the grid topic attribute tuple to construct the user text interest topic portrait.
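As a worked illustration of the weight $C_{w,i} = TF_i \cdot IDF_i$, the following sketch computes topic weights for a toy corpus. The document identifiers and topics are hypothetical, each document is represented simply as the list of its activated-topic occurrences after screening, and the natural logarithm is assumed because the text does not state the base.

```python
import math
from collections import Counter

def topic_weights(doc_topics):
    """doc_topics: {doc_id: [topic, ...]} listing every activated-topic occurrence.
    Returns {doc_id: {topic: TF * IDF}} with TF = C_M / R_N and IDF = log(S / N)."""
    s = len(doc_topics)                      # S: total number of documents
    doc_freq = Counter()                     # N: documents containing each topic
    for topics in doc_topics.values():
        doc_freq.update(set(topics))
    weights = {}
    for doc_id, topics in doc_topics.items():
        counts = Counter(topics)             # C_M per topic
        total = sum(counts.values())         # R_N
        weights[doc_id] = {
            t: (c / total) * math.log(s / doc_freq[t]) for t, c in counts.items()
        }
    return weights

docs = {
    "d1": ["gene", "gene", "protein"],
    "d2": ["protein", "cell"],
    "d3": ["cell", "cell", "gene", "protein"],
}
weights = topic_weights(docs)
print(weights["d1"])
```

Because "protein" occurs in every document, its IDF is log(3/3) = 0 and it contributes no weight, whereas "gene" in d1 receives (2/3) · log(3/2).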

Further, the specific steps for constructing the user query interest portrait in step 6 are as follows:

First, obtaining the feedback documents and the corresponding document topic trees according to the pseudo-relevance feedback principle;

Second, filtering the topics of the original document topic trees according to the scene state currently set by the user, screening out the topics irrelevant to the user's current scene state and keeping the topic trees of interest to the user; counting the occurrence frequency of each topic in the user interest topic trees, normalizing the frequencies to obtain the user's initial interest activation values, and mapping the activation values into the attribute labels of the grid topic nodes;

Third, performing semantic activation diffusion on the initially activated interest topics according to the relation types between the topic nodes in the grid model, mining the potential interest topic nodes in the user's current scene state, and calculating the global activation values of the potential interest topic nodes;

The grid activation diffusion formula is:

where $\theta_{ij}$ is the set of all topic paths in the topic grid model that take topic node j as the destination node and a related topic node i as the source node; $I_i(t)$ is the activation attribute value of each potential topic node in the grid model at time t; $O_j(t+1)$ is the activation attribute value of the global topic node in the grid model at time t+1; $w_{ij}$ is the association value between an activated topic and a potential topic in the current scene state; and $\alpha$ is an attenuation factor, set to 0.75. The association path length is set to 3.

Fourth, mapping the global activation values of the topics into the grid topic attribute tuple to construct the user query interest topic portrait.
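The diffusion step can be sketched as follows. Since the formula is not reproduced in this text, the sketch assumes a common spreading-activation update, $O_j(t+1) = \sum_i I_i(t)\, w_{ij}\, \alpha$, applied for at most three hops; the graph, its weights, and the function name are hypothetical.

```python
ALPHA = 0.75   # attenuation factor stated in the text
MAX_HOPS = 3   # association path length stated in the text

def spread_activation(initial, edges, alpha=ALPHA, max_hops=MAX_HOPS):
    """initial: {node: activation}; edges: {source: {target: association weight}}.
    Returns accumulated activation after propagating along paths of at most max_hops edges."""
    total = dict(initial)
    frontier = dict(initial)
    for _ in range(max_hops):
        reached = {}
        for src, act in frontier.items():
            for dst, w in edges.get(src, {}).items():
                # Assumed update: activation passed on is attenuated by alpha and scaled by w_ij.
                reached[dst] = reached.get(dst, 0.0) + act * w * alpha
        for node, act in reached.items():
            total[node] = total.get(node, 0.0) + act
        frontier = reached
    return total

edges = {"gene": {"protein": 0.8}, "protein": {"enzyme": 0.5}}  # toy grid relations
result = spread_activation({"gene": 1.0}, edges)
print(result)
```

Starting from "gene" with activation 1.0, "protein" receives 1.0 · 0.8 · 0.75 = 0.6 and "enzyme" receives 0.6 · 0.5 · 0.75 = 0.225, so topics that never appeared in the feedback documents still obtain a nonzero interest value.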

Further, in step 7, the cosine similarity formula is used to calculate the language 'domain' relevance between the user text interest grid portrait and the user query interest grid portrait. The formula is as follows:

where one vector is the user text interest grid portrait and $q = \{o_1, o_2, \ldots, o_n\}$ is the user query interest grid portrait.
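A minimal sketch of the cosine similarity between two grid portraits follows, using the standard definition (dot product divided by the product of the vector norms). Representing each portrait as a sparse dictionary from topic to value is an assumption made for illustration; the values shown are toy numbers.

```python
import math

def cosine_similarity(doc_portrait, query_portrait):
    """Cosine similarity between two sparse topic vectors stored as {topic: value}."""
    shared = set(doc_portrait) & set(query_portrait)
    dot = sum(doc_portrait[t] * query_portrait[t] for t in shared)
    norm_d = math.sqrt(sum(v * v for v in doc_portrait.values()))
    norm_q = math.sqrt(sum(v * v for v in query_portrait.values()))
    if norm_d == 0.0 or norm_q == 0.0:
        return 0.0  # an empty portrait is treated as unrelated
    return dot / (norm_d * norm_q)

doc = {"gene": 0.27, "cell": 0.10}       # user text interest portrait (TF-IDF weights)
query = {"gene": 1.0, "protein": 0.6}    # user query interest portrait (activation values)
score = cosine_similarity(doc, query)
print(round(score, 3))
```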

The invention can be applied to any recommendation system based on text retrieval, and has the following beneficial effects:

1. On the user side, the invention addresses semantic relevance and latent interest mining for user queries by adopting a primary feedback topic learning method based on scene semantic analysis and a topic expansion method based on secondary semantic activation diffusion;

2. On the document side, the invention automatically infers the document topics and the scene semantic features of those topics from the deep semantic grid model, thereby automatically extracting text topics and mining deep interest semantics.

Drawings

FIG. 1 is a flowchart of a text recommendation method based on deep semantic analysis according to a preferred embodiment of the present invention.

Fig. 2 is a specific flowchart of step S100 in the method shown in fig. 1.

Fig. 3 is a specific flowchart of step S102 in the method shown in fig. 1.

Fig. 4 is a specific flowchart of step S103 in the method shown in fig. 1.

Fig. 5 is a specific flowchart of step S104 in the method shown in fig. 1.

FIG. 6 compares the deep semantic recommendation method with the conventional semantic recommendation method in terms of the system Ranking Score (RS) under different user interest content inputs.

Detailed Description

The invention provides a text recommendation method based on deep semantic analysis, and the invention is further described in detail below with reference to the accompanying drawings and specific embodiments.

FIG. 1 is a flow chart of a text recommendation method based on deep semantic analysis according to a preferred embodiment of the present invention, as shown, the implementation steps are:

S100, constructing a deep semantic grid model based on a brain-like 'layering-diverging' thinking mode;

S101, the user inputs content of interest and sets the current scene state;

S102, performing topic reasoning and topic scene semantic analysis on the text to construct a text topic tree, and performing topic semantic screening on the document topic tree according to the current user's scene state, so as to construct a text topic tree fused with scene semantic screening;

S103, applying the TF-IDF algorithm to all scene-semantically screened text topic trees in the database, calculating the weight of each topic, and mapping the weights into the corresponding grid topic nodes, so as to construct a user text interest portrait for each document;

S104, extracting, according to a pseudo-relevance feedback method, the documents related to the user's query content and their scene-semantically screened text topic trees, counting the frequency of the topics in the feedback trees, and normalizing the frequencies to obtain the initial interest topic activation values; calculating, by using an activation diffusion mechanism, the global dynamic activation values of the initial interest grid topics and of the potential interest grid topics under feedback learning, assigning the results to the corresponding topic nodes in the grid model, and constructing a user query interest portrait fused with the semantics of the current scene;

S105, calculating and scoring the semantic similarity between the user text interest portraits and the user query interest portrait in the current scene mode by a grid-based cosine similarity algorithm;

S106, sorting the documents in descending order of relevance score, generating a recommendation list, and recommending the documents of interest to the user.
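Steps S105 and S106 reduce to sorting documents by their relevance scores. A minimal sketch, assuming the scores have already been computed and stored in a dictionary keyed by hypothetical document identifiers:

```python
def recommend(scores, top_k=10):
    """Return up to top_k document ids, ordered from highest to lowest relevance score."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

relevance = {"d1": 0.20, "d2": 0.91, "d3": 0.55}  # toy similarity scores
recommendation_list = recommend(relevance, top_k=2)
print(recommendation_list)
```

The cutoff `top_k` is an illustrative parameter; the text does not specify how long the recommendation list should be.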

Further, as shown in FIG. 2, step S100 specifically includes:

S001, selecting a classification ontology that fuses multiple domains, performing semantic splitting and lemmatization (part-of-speech reduction) on the topics in the ontology by using a natural language processing tool from Stanford University to obtain a core topic set, and connecting the core topics into a divergent grid model according to the memory characteristics of the ontology;

S002, constructing a 'grid topic-synonym bag' semantic mapping model, in which 'topic' denotes a core topic in the hierarchical grid model and 'word bag' is the set of synonym terms of that topic extracted from the WordNet dictionary;

S003, traversing the 'topic-label-abstract' triples in the DBpedia knowledge base, matching the topic in each triple against the terms in the 'topic-word bag' model, mapping the grid topics to the DBpedia topics, and associating them by semantic relation type;

S004, extracting the labels and abstract data corresponding to the matched topics from the DBpedia knowledge base, and using the layered-memory grid model as a framework to realize a 'divergent-deep' semantic grid model that fuses 'synonym bag-grid topic-DBpedia topic-label-abstract'.

Further, as shown in FIG. 3, step S102 specifically includes:

S201, performing keyword matching on the text terms by using the 'topic-word bag' semantic association in the grid model; if a term in a word bag appears in the text, the topic is activated and the corresponding grid node attribute is set to 1, which realizes shallow semantic topic mining of the text;

S202, connecting the scattered topics into a text topic tree according to the 'association-memory' characteristic of the deep semantic grid model;

S203, performing scene state analysis on the document topics, with the following specific steps:

where $\mathrm{counter}(T_{m,s}, Key_s)$ denotes the co-occurrence frequency of the terms in the sets $T_{m,s}$ and $Key_s$;
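The scene semantic screening of the text topic tree in step S102 can be illustrated with a small pruning sketch. The nested-dictionary tree representation, the field names, and the rule that a filtered node takes its whole subtree with it are assumptions made for illustration; the text only states that topics and relations not matching the user's current scene state are filtered out.

```python
def prune_topic_tree(node, user_scene):
    """Return a copy of the subtree keeping only nodes whose scene label matches user_scene.
    Returns None when the node itself is filtered out (its subtree is dropped, by assumption)."""
    if node["scene"] != user_scene:
        return None
    kept = []
    for child in node.get("children", []):
        pruned_child = prune_topic_tree(child, user_scene)
        if pruned_child is not None:
            kept.append(pruned_child)
    return {"topic": node["topic"], "scene": node["scene"], "children": kept}

tree = {  # hypothetical text topic tree with scene labels
    "topic": "python", "scene": "programming", "children": [
        {"topic": "interpreter", "scene": "programming", "children": []},
        {"topic": "snake", "scene": "zoology", "children": []},
    ],
}
pruned = prune_topic_tree(tree, "programming")
print([child["topic"] for child in pruned["children"]])
```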

Further, as shown in FIG. 4, step S103 specifically includes:

S301, counting the topic frequencies in all text topic trees in the database under the current scene mode;

S302, calculating the topic frequency TF and the inverse document frequency IDF for each document, where $TF = C_M / R_N$ is the ratio of the frequency of an activated topic in the document ($C_M$) to the total frequency of the activated topics in that document ($R_N$) under the current user interest scene mode, and $IDF = \log(S/N)$ is the logarithm of the ratio of the total number of documents in the database ($S$) to the number of documents containing the activated topic ($N$) under the current user interest scene state;

S303, calculating the interest topic semantic weight $C_{w,i}$ fused with the emotion semantic analysis of the user, as follows:

$C_{w,i} = TF_i \cdot IDF_i \quad (i = 1, 2, \ldots, n),$

Further, as shown in FIG. 5, step S104 specifically includes:

S401, obtaining the feedback documents and the corresponding document topic trees according to the pseudo-relevance feedback principle;

S402, filtering the topics of the original document topic trees according to the scene state currently set by the user, screening out the topics irrelevant to the user's current scene state and keeping the topic trees of interest to the user; counting the occurrence frequency of each topic in the user interest topic trees, normalizing the frequencies to obtain the user's initial interest activation values, and mapping the activation values into the attribute labels of the grid topic nodes;

S403, performing semantic activation diffusion on the initially activated interest topics according to the relation types between the topic nodes in the grid model, mining the potential interest topic nodes in the user's current scene state, and calculating the global activation values of the potential interest topic nodes;

The grid activation diffusion formula is:

S404, mapping the global activation values of the topics into the grid topic attribute tuple to construct the user query interest topic portrait.
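The initial interest activation values of step S402 can be sketched as a normalized topic count over the screened feedback trees. The text does not say which normalization is used, so dividing by the total count (so that the values sum to 1) is an assumption; each feedback tree is flattened to a list of hypothetical topic names.

```python
from collections import Counter

def initial_activation(feedback_topic_lists):
    """Count topic occurrences across the feedback trees and normalize so the values sum to 1."""
    counts = Counter(t for topics in feedback_topic_lists for t in topics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {topic: count / total for topic, count in counts.items()}

feedback = [["gene", "protein"], ["gene"], ["gene", "cell"]]  # toy screened topic trees
activation_values = initial_activation(feedback)
print(activation_values)
```

These values would then seed the activation diffusion of step S403.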

Further, in step S105, the cosine similarity formula is used to calculate the language 'domain' relevance between the user text interest grid portrait and the user query interest grid portrait. The formula is as follows:

where one vector is the user text interest grid portrait and $q = \{o_1, o_2, \ldots, o_n\}$ is the user query interest grid portrait.

The invention applies scene semantic analysis both to user queries and to document topic learning in order to improve the relevance of the recommended documents. Documents are thereby recommended more intelligently: the influence of similar but irrelevant documents on the recommendation results is effectively reduced, the semantic relevance of the recommendation system is improved, the true direction of the user's personal interests is found, and the accuracy and personalized analysis capability of the recommendation system are improved.

The text recommendation method based on deep semantic analysis was compared with a conventional semantic recommendation method. The experimental parameters were selected as follows: the simulation data set is the 2005 literature data in the PubMed database, comprising 26,000 abstracts of biomedical papers. The deep semantic grid model is built from the ontology in the ACM Digital Library full-text database and from the DBpedia knowledge base. Text processing uses a series of open-source Java text analysis tools provided by the Stanford University Natural Language Processing Group.

The influence of the invention on the ranking accuracy of the recommendation system was verified, with the following experimental results:

FIG. 6 compares the conventional semantic recommendation method and the deep semantic recommendation method in terms of the system Ranking Score (RS) under different user interest content inputs. The conventional semantic recommendation method is a shallow semantic recommendation method without scene semantic analysis, and the deep semantic recommendation method is the method provided by the invention. As can be seen from FIG. 6, the ranking score of the invention is lower than that of the conventional semantic recommendation method in all 5 experiments. Because a smaller ranking score means that the system tends to rank the items the user prefers nearer the front, the experimental results show that the method provided by the invention achieves a better recommendation effect.
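The text does not define the Ranking Score. The sketch below assumes the definition commonly used in recommender-system evaluation: the position of each relevant item in the ranked list divided by the list length, averaged over the relevant items, so that lower values mean relevant items appear nearer the top. The function name and data are hypothetical.

```python
def ranking_score(ranked_docs, relevant_docs):
    """Average relative position (rank / list length) of the relevant documents; lower is better.
    Returns None when no relevant document appears in the ranking."""
    n = len(ranked_docs)
    positions = [ranked_docs.index(d) + 1 for d in relevant_docs if d in ranked_docs]
    if not positions:
        return None
    return sum(p / n for p in positions) / len(positions)

ranking = ["d2", "d3", "d1", "d4"]
rs_good = ranking_score(ranking, ["d2"])   # relevant document ranked first
rs_bad = ranking_score(ranking, ["d4"])    # relevant document ranked last
print(rs_good, rs_bad)
```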

It should be noted that the scope of protection of the invention includes, but is not limited to, the above examples; any modifications or variations made by those of ordinary skill in the art in light of the above description shall fall within the scope of protection of the invention.

Claims

1. A text recommendation method based on deep semantic analysis, characterized by comprising the following steps:

Step 1: constructing a deep semantic grid model based on a brain-like 'layering-diverging' thinking mode;

Step 2: inferring the grid topic set of a text by combining the 'grid topic-synonym bag' model with a word matching technique, connecting the scattered topics by using the 'association-memory' function of the grid model, then inferring the scene labels of the different activated topics in the current text by using the scene semantic analysis function, and finally constructing a text topic tree that integrates multiple scene semantics and memory connections;

Step 4: applying the TF-IDF algorithm to all scene-semantically screened text topic trees in the database to calculate the weight of each topic, and mapping the weights into the corresponding grid topic nodes to construct a user text interest portrait for each document;

2. The text recommendation method based on deep semantic analysis according to claim 1, wherein the deep semantic grid model in step 1 is constructed according to a brain-like 'layering-diverging' thinking mode, and the construction process specifically includes:

first, selecting a classification ontology that fuses multiple domains, performing semantic splitting and lemmatization (part-of-speech reduction) on the topics in the ontology by using a natural language processing tool from Stanford University to obtain a core topic set, and connecting the core topics into a divergent grid model according to the memory characteristics of the ontology;

second, constructing a 'grid topic-synonym bag' semantic mapping model, in which 'topic' denotes a core topic in the hierarchical grid model and 'word bag' is the set of synonym terms of that topic extracted from the WordNet dictionary; if a term in the 'topic-word bag' model appears in the text, the topic is activated and the corresponding grid node attribute is set to 1, which realizes shallow semantic topic mining of the text;

third, traversing the 'topic-label-abstract' triples in the DBpedia knowledge base, matching the topic in each triple against the terms in the 'topic-word bag' model, extracting the labels and abstract data corresponding to the matched topics from the knowledge base, mapping 'grid topic-DBpedia topic-label-abstract' layer by layer, and associating them by semantic relation type;

fourth, using the layered-memory grid model as a framework to realize a 'divergent-deep' semantic grid model that fuses 'synonym bag-grid topic-DBpedia topic-label-abstract'.