CN110362680B - Soft-wide detection and advertisement extraction method based on graph network structure analysis - Google Patents
- Publication number
- CN110362680B (application CN201910515297.8A)
- Authority
- CN
- China
- Prior art keywords
- network structure
- word
- graph network
- article
- community
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06F: ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
          - G06F16/35: Clustering; Classification
            - G06F16/353: Clustering; Classification into predefined classes
            - G06F16/355: Class or cluster creation or modification
      - G06F40/00: Handling natural language data
        - G06F40/20: Natural language analysis
          - G06F40/205: Parsing
          - G06F40/279: Recognition of textual entities
            - G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention discloses a soft-advertisement (advertorial) detection and advertisement extraction method based on graph network structure analysis. Each article is converted into two graph network structures: a sentence graph, whose nodes are sentences, and a word graph, whose nodes are words. Network structure attribute features are extracted from both graphs and combined with the article's TF-IDF feature vector, and an SVM classifier is trained on the labels annotated in a corpus to identify soft-advertisement articles. When extracting soft-advertisement content, the method considers both the semantic importance of each word and the influence of its node in the word graph, so that the advertisement content is extracted accurately, effectively helping social network platforms supervise and manage soft advertisements.
Description
Technical Field
The invention belongs to the field of text classification and advertisement content extraction, and particularly relates to a soft-advertisement detection and advertisement extraction method based on graph network structure analysis.
Background
With the rapid development of internet technology, social network platforms have greatly changed and enriched our lives. In August 2012, WeChat launched the subscription feature of its public account (Official Account) platform, which lets an account send articles, pictures and other content to its subscribers. As the platform's influence grew, some public accounts gained very large followings and built close relationships with their audiences. In recent years many enterprises and merchants have noticed this and developed a new marketing method, soft text advertising (advertorials), using WeChat public accounts for brand promotion. Soft-advertisement content is usually embedded in an article in a hidden, indirect way, reaches users directly, and is highly readable. However, some soft advertisements are sources of false information that mislead consumers and can even harm their financial interests. Soft advertisements also degrade the reading experience and, in turn, damage the platform's ecosystem. Accurately detecting soft advertisements and extracting their advertisement content is therefore necessary for platform supervision and management, and effectively protects readers' rights and interests as consumers.
The method of Guo (Y. Guo and M. Iwaihara, "Detection of text-based advertising and promotion in Wikipedia by deep learning method") automatically detects advertisements and promotional articles in Wikipedia from plain text alone. Each article is first converted into a document vector by an improved deep learning method, and a supervised SVM classifier is then trained on these vectors for prediction. However, the method considers only plain-text information and ignores the characteristics specific to advertising and promotional articles, leaving considerable room to improve detection accuracy.
The method of Bhosale (S. Bhosale, H. Vinicombe, and R. Mooney, "Detecting promotional content in Wikipedia," in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1851-1857) detects promotional articles in Wikipedia and improves identification accuracy by combining n-gram features with features from a PCFG language model. Although it uses two kinds of features, it still does not consider features related to topic relevance.
The method of Zhang (X. Zhang, S. Zhu, and W. Liang, "Detecting spam and promoting campaigns in the Twitter social network," in 2012 IEEE 12th International Conference on Data Mining, IEEE, 2012, pp. 1194-1199) proposes an extensible framework for detecting marketing campaigns and spam. It first links accounts that publish similar marketing and spam URLs, then extracts candidate campaigns that may serve spam or promotional purposes, and finally distinguishes their intent. The method measures the similarity between the URLs posted by accounts and uses machine learning to distinguish the characteristics of different marketing campaigns. However, it relies mainly on URL characteristics and is not suited to identifying soft advertisements published on the WeChat public account platform.
Disclosure of Invention
To overcome these problems in the prior art, the invention provides a soft-advertisement detection and advertisement extraction method based on graph network structure analysis.
To achieve this purpose, the invention adopts the following technical solution:
A soft-advertisement detection and advertisement extraction method based on graph network structure analysis comprises the following steps:
Step 1: compute TF-IDF feature vectors.
First segment each article in the corpus into words, then compute the TF-IDF value of each word, and obtain each article's TF-IDF feature vector from the TF-IDF values of its words.
Step 2: construct the sentence graph network structure.
Step 3: construct the word graph network structure.
Step 4: extract graph network structure attribute features for training and detection.
Perform community detection and partitioning on the sentence graph and the word graph, extract the graph network structure attribute features of each article, and concatenate them with the article's TF-IDF feature vector to obtain a combined feature vector. Using the soft-advertisement articles and normal articles labeled in the corpus as the data set, train a supervised SVM classifier on the combined feature vectors, and use the classifier to predict articles of unknown class.
In a further improvement, in step 1 the TF-IDF value of each word is calculated with the following formula:
where tfidf(w) is the TF-IDF value of word w, #(w, A) is the frequency of word w in article A, N_w is the number of words in article A, |D| is the number of articles in the corpus, and I(w, A) is an indicator function.
In a further improvement, the indicator function I(w, A) equals 1 when word w occurs in article A and 0 otherwise.
In a further improvement, step 2 proceeds as follows: each article is first split into sentences, and the article's sentence graph network structure is constructed with sentences as nodes and the semantic similarity between each pair of sentences as the edge weight;
where #(w_t, s_i) denotes the frequency of word w_t in sentence s_i, and i and j denote the order in which the sentences appear in the article. An edge is created between nodes s_i and s_j when the weight between them is greater than or equal to a threshold α, which yields the article's sentence graph network structure.
In a further improvement, the threshold α is 0.1.
In a further improvement, step 3 proceeds as follows: the article's word graph network structure is constructed with words as nodes and the co-occurrence frequency between words as the edge weight;
where the weight uses the frequency of word w_i in article A and an indicator function I((w_i, w_j), s_t), whose value is 1 when words w_i and w_j co-occur in sentence s_t and 0 otherwise. An edge is created whenever the weight is nonzero, which yields the word graph network structure.
In a further improvement, the community attributes among each article's graph network structure attribute features include a community association degree feature, calculated with the following formula:
where the indices i and j denote different communities, n_ij is the number of edges connecting community i and community j, n_i is the number of nodes in community i, and n_j is the number of nodes in community j.
In a further improvement, the method also comprises step 5: extracting the advertisement content of soft advertisements.
In a further improvement, step 5 proceeds as follows: first compute the importance score of the words in each community of the word graph network structure, then sort the word set of each community in descending order of importance, and extract the advertisement content of the article.
In a further improvement, the importance score of the words in each community of the word graph network structure is calculated with the following formula:
where Score(w_i) is the importance score of word w_i within its community in the word graph network structure, tfidf(w_i) is the TF-IDF value of word w_i, the degree term is the degree of the node representing word w_i, and N_c is the number of nodes in the community containing word w_i.
Compared with the prior art, the invention has the following beneficial effects: it constructs two text network structures for each article, a sentence graph and a word graph, extracts graph network structure attribute features from them, and trains and classifies using these features together with the conventional TF-IDF feature vector, which yields more accurate detection. The invention uses a feature that measures the topic association between communities (Community-Connectivity), which clearly reveals the difference in inter-community topic association between soft-advertisement articles and normal articles and thereby improves accuracy.
Furthermore, the method for measuring the importance of word nodes within a community can accurately extract the advertisement content of soft-advertisement articles and helps social platforms better manage and monitor them.
Furthermore, the importance of a word node within its community is measured by considering both the semantic importance of the word and the influence of the node in the community; these are combined into an importance score for each word in each community of the word graph, the words are ranked by score, and the advertisement content of the article is then extracted.
Drawings
FIG. 1 compares the SentenceGraph network structure attribute feature Community-Connectivity of soft-advertisement articles and normal articles.
FIG. 2 compares the WordGraph network structure attribute feature Community-Connectivity of soft-advertisement articles and normal articles.
FIG. 3 compares the distributions of the SentenceGraph Community-Connectivity feature for 200 soft-advertisement articles and 200 normal articles.
FIG. 4 compares the distributions of the WordGraph Community-Connectivity feature for 200 soft-advertisement articles and 200 normal articles.
FIG. 5 is a schematic diagram of the advertisement content extracted from a soft-advertisement article; the green nodes in the box are the extracted advertisement content.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings.
The invention provides a soft-advertisement detection and advertisement extraction method based on graph network structure analysis, comprising the following steps.
In the invention, TF-IDF stands for term frequency-inverse document frequency.
Step 1: compute TF-IDF feature vectors. First segment each article in the corpus into words, then compute the TF-IDF value of each word, and obtain each article's TF-IDF feature vector from the TF-IDF values of its words.
The TF-IDF value is calculated as follows:
where tfidf(w) is the TF-IDF value of word w, #(w, A) is the frequency of word w in article A, N_w is the number of words in article A, |D| is the number of articles in the corpus, and I(w, A) is an indicator function that equals 1 when word w occurs in article A and 0 otherwise. Computing the TF-IDF value of every word yields the article's TF-IDF feature vector.
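The formula image for step 1 is not reproduced on this page. The Python sketch below assumes the common form consistent with the listed symbols, tfidf(w) = #(w, A) / N_w * log(|D| / sum over A of I(w, A)); the log base, the absence of smoothing, and the helper names `tfidf_vectors` and `to_dense` are assumptions. Articles are assumed to be already word-segmented, since the text does not name the segmenter.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(corpus: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {word: TF-IDF} map per word-segmented article.

    Assumed form (the original formula image is missing):
        tfidf(w) = #(w, A) / N_w * log(|D| / sum over A of I(w, A))
    """
    num_docs = len(corpus)  # |D|
    # Number of articles containing w, i.e. the sum of I(w, A) over the corpus.
    doc_freq = Counter()
    for article in corpus:
        doc_freq.update(set(article))

    vectors = []
    for article in corpus:
        counts = Counter(article)  # #(w, A)
        n_words = len(article)     # N_w
        vectors.append({
            w: (c / n_words) * math.log(num_docs / doc_freq[w])
            for w, c in counts.items()
        })
    return vectors


def to_dense(vector: Dict[str, float], vocab: List[str]) -> List[float]:
    """Lay a sparse TF-IDF map out over a fixed vocabulary order."""
    return [vector.get(w, 0.0) for w in vocab]
```

For example, with `corpus = [["a", "b", "a"], ["b", "c"]]`, the word `b` appears in every article, so its IDF factor is log(1) = 0 and it contributes nothing to either vector.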
Step 2: construct the SentenceGraph (sentence graph network structure). First split each article into sentences; then, using the word segmentation from step 1, construct the article's sentence graph with sentences as nodes and the semantic similarity between each pair of sentences as the edge weight. The procedure is as follows:
The weight of an edge in the SentenceGraph is calculated as follows:
where #(w_t, s_i) denotes the frequency of word w_t in sentence s_i, and i and j denote the order in which the sentences appear in the article.
An edge is created between nodes s_i and s_j when the weight between them is greater than or equal to the threshold α, which yields the article's sentence graph network structure. Here α is set to 0.1.
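The SentenceGraph weight formula is likewise not reproduced. As a stand-in, this sketch uses the cosine similarity of the sentences' word-frequency vectors #(w_t, s_i), one common way to build a similarity from those counts; it is an assumption, not necessarily the patent's formula. The threshold α = 0.1 comes from the text, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

ALPHA = 0.1  # edge threshold given in the description


def cosine(c1: Counter, c2: Counter) -> float:
    """Cosine similarity of two word-frequency vectors (assumed edge weight)."""
    dot = sum(c1[w] * c2[w] for w in c1.keys() & c2.keys())
    sq1 = sum(v * v for v in c1.values())
    sq2 = sum(v * v for v in c2.values())
    return dot / math.sqrt(sq1 * sq2) if sq1 and sq2 else 0.0


def sentence_graph(sentences: List[List[str]],
                   alpha: float = ALPHA) -> Dict[Tuple[int, int], float]:
    """Nodes are sentence indices in appearance order; keep edges with weight >= alpha."""
    counts = [Counter(s) for s in sentences]  # #(w_t, s_i) for each sentence
    edges = {}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            weight = cosine(counts[i], counts[j])
            if weight >= alpha:
                edges[(i, j)] = weight
    return edges
```

Only pairs whose similarity reaches α become edges, so sentences sharing no words stay disconnected.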
Step 3: construct the WordGraph (word graph network structure). Using the sentence segmentation from step 2 and the word segmentation from step 1, construct the article's word graph with words as nodes and the co-occurrence frequency between words as the edge weight.
The weight of an edge in the WordGraph is calculated as follows:
where the weight uses the frequency of word w_i in article A and an indicator function I((w_i, w_j), s_t), whose value is 1 when words w_i and w_j co-occur in sentence s_t and 0 otherwise. If the weight is 0, no edge is created; if it is nonzero, an edge is created, which yields the WordGraph (word graph network structure).
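A sketch of the WordGraph construction, assuming the edge weight is simply the number of sentences s_t in which w_i and w_j co-occur, i.e. the sum of I((w_i, w_j), s_t). The patent's formula also involves the frequency of w_i in article A, but how that factor enters is not recoverable from this page, so it is omitted; `word_graph` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def word_graph(sentences: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Edges between words that co-occur in at least one sentence.

    Weight = sum over sentences s_t of I((w_i, w_j), s_t)  (assumed form).
    """
    weights = Counter()
    for sentence in sentences:
        # Each unordered pair of distinct words in s_t contributes I = 1.
        for w_i, w_j in combinations(sorted(set(sentence)), 2):
            weights[(w_i, w_j)] += 1
    # Pairs that never co-occur have weight 0 and therefore get no edge.
    return dict(weights)
```

Word pairs are stored in sorted order, so `("a", "b")` and `("b", "a")` accumulate into the same undirected edge.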
Step 4: extract graph network structure attribute features for training and detection. Based on steps 2 and 3, community detection and partitioning are first performed on the sentence graph and the word graph; the graph network structure attribute features of each article (see Table 1) are then extracted and concatenated with the TF-IDF feature vector from step 1 to obtain a combined feature vector. Using the soft-advertisement articles and normal articles labeled in the corpus as the data set, a supervised SVM classifier is trained on these combined feature vectors and then used to predict articles of unknown class.
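The text does not name the community detection algorithm. This sketch uses weighted label propagation as one plausible choice (Louvain and greedy modularity are common alternatives), and it computes a few generic graph statistics as placeholders, because the contents of Table 1 are not reproduced here. All names are hypothetical, and node ids are assumed to be mutually comparable (all ints or all strings) so the update order and tie-breaking are deterministic.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edges = Dict[Tuple[Hashable, Hashable], float]


def label_propagation(nodes: List[Hashable], edges: Edges,
                      max_iter: int = 100) -> List[Set[Hashable]]:
    """Weighted label propagation with deterministic tie-breaking."""
    adj: Dict[Hashable, Dict[Hashable, float]] = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w
    labels = {n: n for n in nodes}  # every node starts in its own community
    for _ in range(max_iter):
        changed = False
        for n in sorted(nodes):
            if not adj[n]:
                continue  # an isolated node keeps its own label
            score: Dict[Hashable, float] = {}
            for nb, w in adj[n].items():
                score[labels[nb]] = score.get(labels[nb], 0.0) + w
            best = max(score.values())
            # Adopt the heaviest neighbouring label; smallest label wins ties.
            new = min(lab for lab, s in score.items() if s == best)
            if new != labels[n]:
                labels[n], changed = new, True
        if not changed:
            break
    groups: Dict[Hashable, Set[Hashable]] = {}
    for n, lab in labels.items():
        groups.setdefault(lab, set()).add(n)
    return list(groups.values())


def structural_features(nodes: List[Hashable], edges: Edges,
                        communities: List[Set[Hashable]]) -> List[float]:
    """Placeholder features: node count, edge count, density, community count."""
    n, m = len(nodes), len(edges)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return [float(n), float(m), density, float(len(communities))]
```

The same two functions apply to both graphs: sentence-graph nodes are sentence indices, word-graph nodes are word strings.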
Table 1 lists the graph network structure attribute features extracted by the invention.
Table 1. Brief description of the graph network structure attribute features
The community association feature Community-Connectivity in Table 1 is calculated as follows:
where the indices i and j denote different communities, n_ij is the number of edges connecting community i and community j, n_i is the number of nodes in community i, and n_j is the number of nodes in community j.
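The Community-Connectivity formula itself is also missing. Given the symbols n_ij, n_i and n_j, one natural reading, assumed in this sketch, normalizes the inter-community edge count by the number of possible node pairs, n_ij / (n_i * n_j), and averages it over all community pairs; the aggregation actually used in the patent may differ. `community_connectivity` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Set, Tuple


def community_connectivity(communities: List[Set[Hashable]],
                           edges: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """Mean of n_ij / (n_i * n_j) over all pairs of communities (assumed form)."""
    member = {node: idx for idx, comm in enumerate(communities) for node in comm}
    n_between = Counter()  # n_ij: number of edges linking community i and community j
    for u, v in edges:
        ci, cj = member[u], member[v]
        if ci != cj:
            n_between[(min(ci, cj), max(ci, cj))] += 1
    pairs = list(combinations(range(len(communities)), 2))
    if not pairs:
        return 0.0  # a single community has no inter-community links
    total = sum(n_between[(i, j)] / (len(communities[i]) * len(communities[j]))
                for i, j in pairs)
    return total / len(pairs)
```

Under this reading, a higher value means the communities are more densely interlinked, which matches the direction reported for normal articles.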
FIG. 1 compares the SentenceGraph network structure attribute feature Community-Connectivity of soft-advertisement articles and normal articles, and FIG. 2 makes the same comparison for the WordGraph. FIG. 3 compares the distributions of the SentenceGraph Community-Connectivity feature for 200 soft-advertisement articles and 200 normal articles, and FIG. 4 does the same for the WordGraph. FIGS. 1 and 2 show that the median Community-Connectivity of normal articles is clearly greater than that of soft-advertisement articles, and FIGS. 3 and 4 show that most normal articles have larger Community-Connectivity values than soft-advertisement articles. This indicates that the communities of a normal article are far more closely related in topic than those of a soft-advertisement article.
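The final part of step 4 concatenates each article's TF-IDF vector with its graph features and trains a supervised SVM. The patent does not name an SVM implementation; in practice a library such as scikit-learn's `LinearSVC` would typically be used, ideally after standardizing the features, since TF-IDF values and graph counts differ in scale. To stay dependency-free, this sketch instead trains a linear soft-margin SVM by batch sub-gradient descent on the regularized hinge loss, with labels +1 (soft advertisement) and -1 (normal article). The hyperparameters and function names are assumptions.

```python
from typing import List, Sequence, Tuple


def combine_features(tfidf_vec: Sequence[float],
                     graph_feats: Sequence[float]) -> List[float]:
    """Concatenate an article's TF-IDF vector with its graph-structure features."""
    return list(tfidf_vec) + list(graph_feats)


def train_linear_svm(X: List[List[float]], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, iters: int = 1000) -> Tuple[List[float], float]:
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch sub-gradient descent.

    Labels are +1 (soft advertisement) or -1 (normal article); the bias is
    left unregularised.
    """
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(iters):
        grad_w = [lam * wk for wk in w]  # gradient of the L2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for k in range(dim):
                    grad_w[k] -= yi * xi[k] / n
                grad_b -= yi / n
        w = [wk - lr * gk for wk, gk in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    """Return +1 for a predicted soft advertisement, -1 otherwise."""
    return 1 if sum(wk * xk for wk, xk in zip(w, x)) + b >= 0 else -1
```

A typical call builds `X = [combine_features(tfidf_row, graph_row) for ...]` from the outputs of steps 1 to 4 and then uses `predict` on articles of unknown class.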
Step 5: extract the advertisement content of soft advertisements. Compute the importance score of the words in each community of the word graph, sort the word set of each community in descending order of importance, and extract the advertisement content of the article.
When extracting advertisement content, both the semantic importance of a word and the influence of its node within the community are considered. Specifically, the importance score of a word in each community of the word graph is calculated as follows:
where Score(w_i) is the importance score of word w_i within its community in the word graph, tfidf(w_i) is the TF-IDF value of word w_i, the degree term is the degree of the node representing word w_i, and N_c is the number of nodes in the community containing word w_i. FIG. 5 is a schematic diagram of the advertisement content extracted from a soft-advertisement article; the nodes in the box are the extracted advertisement content.
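The Score formula is not reproduced either. Based on the listed ingredients, this sketch assumes Score(w_i) = tfidf(w_i) * deg(w_i) / N_c, i.e. the word's TF-IDF value weighted by its node degree normalized by community size, and ranks each community's words in descending order of score. The text does not say which community, or how many top-ranked words, form the advertisement content, so `top_k`, the use of unweighted degree, and the per-community output are assumptions; the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def node_degrees(edges: Dict[Tuple[str, str], float]) -> Dict[str, int]:
    """Unweighted degree of each word node (assumed; weighted degree is also plausible)."""
    degree: Dict[str, int] = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    return degree


def rank_community_words(tfidf: Dict[str, float],
                         edges: Dict[Tuple[str, str], float],
                         communities: List[Set[str]],
                         top_k: int = 3) -> List[List[str]]:
    """Per community, return the top_k words by the assumed Score(w_i)."""
    degree = node_degrees(edges)
    ranked = []
    for comm in communities:
        n_c = len(comm)  # N_c: number of nodes in the community
        score = {w: tfidf.get(w, 0.0) * degree.get(w, 0) / n_c for w in comm}
        # Descending score; ties broken alphabetically for a stable result.
        ranked.append(sorted(comm, key=lambda w: (-score[w], w))[:top_k])
    return ranked
```

For example, with TF-IDF values {"a": 0.5, "b": 0.2, "c": 0.4} and edges a-b and a-c in one community, `a` scores highest because it has both the largest TF-IDF value and the highest degree.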
The invention converts an article into two graph network structures, a sentence graph and a word graph, with sentences and words as their respective nodes. Network structure attribute features are then extracted from both graphs and combined with each article's TF-IDF feature vector, and an SVM classifier is trained on the labels annotated in a corpus to identify soft-advertisement articles. When extracting soft-advertisement content, the method considers both the semantic importance of each word and the influence of its node in the word graph, so that the advertisement content is extracted accurately, effectively helping social network platforms supervise and manage soft advertisements.
The invention constructs two text network structures for each article, a sentence graph and a word graph, extracts graph network structure attribute features from them, and trains and classifies using these features together with the conventional TF-IDF feature vector, which yields more accurate detection. It defines a feature that measures the topic association between communities (Community-Connectivity), which clearly reveals the difference in inter-community topic association between soft-advertisement articles and normal articles. It also provides a method for measuring the importance of word nodes within a community that combines the semantic importance of a word with the influence of its node in the graph network structure; this method can accurately extract the advertisement content of soft-advertisement articles and helps social platforms better manage and supervise them.
Claims (10)
1. A soft-advertisement detection and advertisement extraction method based on graph network structure analysis, characterized by comprising the following steps:
Step 1: compute TF-IDF feature vectors;
first segment each article in the corpus into words, then compute the TF-IDF value of each word, and obtain each article's TF-IDF feature vector from the TF-IDF values of its words;
Step 2: construct the sentence graph network structure;
Step 3: construct the word graph network structure;
Step 4: extract graph network structure attribute features for training and detection;
perform community detection and partitioning on the sentence graph network structure and the word graph network structure, extract the graph network structure attribute features of each article, and combine them with the article's TF-IDF feature vector to obtain a combined feature vector; using the soft-advertisement articles and normal articles labeled in the corpus as the data set, train a supervised SVM classifier on the combined feature vectors, and use the classifier to predict articles of unknown class.
2. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein in step 1 the TF-IDF value of each word is calculated with the following formula:
where tfidf(w) is the TF-IDF value of word w, #(w, A) is the frequency of word w in article A, N_w is the number of words in article A, |D| is the number of articles in the corpus, and I(w, A) is an indicator function.
3. The method according to claim 2, wherein the indicator function I(w, A) equals 1 when word w occurs in article A and 0 otherwise.
4. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein step 2 proceeds as follows: each article is first split into sentences, and the article's sentence graph network structure is constructed with sentences as nodes and the semantic similarity between each pair of sentences as the edge weight;
where #(w_t, s_i) denotes the frequency of word w_t in sentence s_i, and i and j denote the order in which the sentences appear in the article; an edge is created between nodes s_i and s_j when the weight between them is greater than or equal to a threshold α, which yields the article's sentence graph network structure.
5. The method of claim 4, wherein the threshold α is 0.1.
6. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein step 3 proceeds as follows: the article's word graph network structure is constructed with words as nodes and the co-occurrence frequency between words as the edge weight;
where the weight uses the frequency of word w_i in article A and an indicator function I((w_i, w_j), s_t), whose value is 1 when words w_i and w_j co-occur in sentence s_t and 0 otherwise; an edge is created whenever the weight is nonzero, which yields the word graph network structure.
7. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein the community attributes among each article's graph network structure attribute features include a community association degree feature, calculated with the following formula:
where the indices i and j denote different communities, n_ij is the number of edges connecting community i and community j, n_i is the number of nodes in community i, and n_j is the number of nodes in community j.
8. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, further comprising step 5: extracting the advertisement content of soft advertisements.
9. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 8, wherein step 5 proceeds as follows: first compute the importance score of the words in each community of the word graph network structure, then sort the word set of each community in descending order of importance, and extract the advertisement content of the article.
10. The method according to claim 9, wherein the importance score of the words in each community of the word graph network structure is calculated with the following formula:
where Score(w_i) is the importance score of word w_i within its community in the word graph network structure, tfidf(w_i) is the TF-IDF value of word w_i, the degree term is the degree of the node representing word w_i, and N_c is the number of nodes in the community containing word w_i.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910515297.8A CN110362680B (en) | 2019-06-14 | 2019-06-14 | Soft-wide detection and advertisement extraction method based on graph network structure analysis |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110362680A (en) | 2019-10-22 |
CN110362680B (en) | 2021-07-13 |
Family
ID=68217515
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910515297.8A Active CN110362680B (en) | 2019-06-14 | 2019-06-14 | Soft-wide detection and advertisement extraction method based on graph network structure analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110362680B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110941962B (en) * | 2019-11-26 | 2021-09-28 | 中国科学院自动化研究所 | Answer sentence selection method and device based on graph network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2013049529A1 (en) * | 2011-09-30 | 2013-04-04 | Technicolor Usa Inc | Method and apparatus for unsupervised learning of multi-resolution user profile from text analysis |
CN103092975A (en) * | 2013-01-25 | 2013-05-08 | 武汉大学 | Detection and filter method of network community garbage information based on topic consensus coverage rate |
CN103970801A (en) * | 2013-02-05 | 2014-08-06 | 腾讯科技(深圳)有限公司 | Method and device for recognizing microblog advertisement blog articles |
CN106909669A (en) * | 2017-02-28 | 2017-06-30 | 北京时间股份有限公司 | The detection method and device of a kind of promotion message |
CN107729489A (en) * | 2017-10-17 | 2018-02-23 | 北京京东尚科信息技术有限公司 | Advertisement text recognition methods and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9426538B2 (en) * | 2013-11-20 | 2016-08-23 | At&T Intellectual Property I, Lp | Method and apparatus for presenting advertising in content having an emotional context |
- 2019-06-14: application CN201910515297.8A filed in CN (patent CN110362680B, status Active)
Also Published As
Publication number | Publication date |
---|---|
CN110362680A (en) | 2019-10-22 |
Similar Documents
Publication | Title |
---|---|
CN110516067B (en) | Public opinion monitoring method, system and storage medium based on topic detection |
CN108628833B (en) | Method and device for determining summary of original content and method and device for recommending original content |
CN110046260B (en) | Knowledge graph-based hidden network topic discovery method and system |
CN104881458B (en) | A kind of mask method and device of Web page subject |
CN106547875B (en) | Microblog online emergency detection method based on emotion analysis and label |
CN106708966A (en) | Similarity calculation-based junk comment detection method |
US20120029908A1 | Information processing device, related sentence providing method, and program |
WO2015165408A1 (en) | Method and system for filtering goods evaluation information |
CN110134792B (en) | Text recognition method and device, electronic equipment and storage medium |
CN105183717B (en) | A kind of OSN user feeling analysis methods based on random forest and customer relationship |
CN106354845A (en) | Microblog rumor recognizing method and system based on propagation structures |
CN107944911B (en) | Recommendation method of recommendation system based on text analysis |
CN107577665B (en) | Text emotional tendency judging method |
KR20140111307A (en) | User question processing method and system |
CN109086375A (en) | A kind of short text subject extraction method based on term vector enhancing |
CN107133282B (en) | Improved evaluation object identification method based on bidirectional propagation |
CN110287314B (en) | Long text reliability assessment method and system based on unsupervised clustering |
CN110489745B (en) | Paper text similarity detection method based on citation network |
CN112069312B (en) | Text classification method based on entity recognition and electronic device |
Chauhan et al. | Research on product review analysis and spam review detection |
WO2016040772A1 (en) | Method and apparatus of matching an object to be displayed |
CN110019820A (en) | Main suit and present illness history symptom Timing Coincidence Detection method in a kind of case history |
CN105468780B (en) | The normalization method and device of ProductName entity in a kind of microblogging text |
CN110362680B (en) | Soft-wide detection and advertisement extraction method based on graph network structure analysis |
CN108427769B (en) | Character interest tag extraction method based on social network |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |