CN110362680B - Soft-wide detection and advertisement extraction method based on graph network structure analysis - Google Patents

Info

Publication number
CN110362680B
Authority
CN
China
Prior art keywords
network structure
word
graph network
article
community
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910515297.8A
Other languages
Chinese (zh)
Other versions
CN110362680A (en)
Inventor
王晨旭
梁潇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201910515297.8A
Publication of CN110362680A
Application granted
Publication of CN110362680B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a method, based on graph network structure analysis, for detecting soft advertisements ("soft-wide" articles, i.e. advertorials) and extracting their advertising content. An article is converted into two graph network structures, a sentence graph and a word graph, built with sentences and words as nodes respectively. Network structure attribute features are then extracted from the two graphs and combined with the TF-IDF feature vector of each article, and an SVM classifier is trained on the labels annotated in a corpus to detect soft-advertisement articles. When the advertising content is extracted, both the semantic importance of each word and the influence of its node in the word graph are considered, so that the advertising content is extracted accurately and social network platforms are helped to supervise and manage soft advertisements.

Description

Soft-wide detection and advertisement extraction method based on graph network structure analysis
Technical Field
The invention belongs to the field of text classification and advertisement content extraction, and in particular relates to a soft-advertisement ("soft-wide") detection and advertisement extraction method based on graph network structure analysis.
Background
With the rapid development of Internet technology, social network platforms have greatly changed and enriched our lives. In August 2012, WeChat launched the subscription function of its official account platform, which lets an account send articles, pictures, and other information to its subscribers. As the influence of the platform grew, some official accounts gained thousands of followers and built close relationships with them. In recent years many enterprises and merchants have taken note of this and developed a new marketing method, soft-text advertising (soft advertisements, or advertorials), which promotes brands through WeChat official accounts. The advertising content of a soft advertisement is usually embedded in the article in a concealed, indirect way and reaches users directly while remaining highly readable. However, some soft advertisements spread false information that misleads consumers and can even harm their financial interests. Soft advertisements also degrade the reading experience and in turn damage the ecosystem of the platform. Accurately detecting soft advertisements and extracting their advertising content is therefore necessary for the supervision and management of the platform, and it helps protect the consumer rights of readers.
Guo's method (Y. Guo and M. Iwaihara, "Detection of text-based advertising and promotion in Wikipedia by deep learning method") proposes a plain-text-based approach to automatically detect advertisements and promotional articles in Wikipedia. Each article is first converted into a document vector using an improved deep learning method, and a supervised SVM classifier is then trained on the document vectors for prediction. However, the method considers only plain-text information and ignores the characteristics specific to advertisements and promotional articles, which leaves considerable room for improving detection accuracy.
Bhosale's method (S. Bhosale, H. Vinicombe, and R. Mooney, "Detecting promotional content in Wikipedia," in Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, 2013, pp. 1851-1857) proposes a way of detecting promotional articles in Wikipedia that improves identification accuracy by combining n-gram features with features from a PCFG language model. Although this method uses two kinds of features, it still does not consider features related to topic relevance.
Zhang's method (X. Zhang, S. Zhu, and W. Liang, "Detecting spam and promoting campaigns in the Twitter social network," in 2012 IEEE 12th International Conference on Data Mining, IEEE, 2012, pp. 1194-1199) proposes a scalable framework to detect marketing campaigns and spam. It first links accounts that publish similar marketing and spam URLs, then extracts candidate campaigns that may exist for spam or promotional purposes, and finally distinguishes their intent. The method measures the similarity between accounts according to the features of the URLs they post, and distinguishes the characteristics of different marketing campaigns with a machine learning method. However, the method focuses on URL features and is not suitable for identifying and detecting soft advertisements published on the WeChat official account platform.
Disclosure of Invention
To overcome the problems in the prior art, the invention aims to provide a soft-advertisement detection and advertisement extraction method based on graph network structure analysis.
To achieve this purpose, the technical scheme of the invention is as follows:
A soft-advertisement detection and advertisement extraction method based on graph network structure analysis comprises the following steps:
Step 1: calculating TF-IDF feature vectors;
first, each article in a corpus is segmented into words, then the TF-IDF value of each word is calculated, and the TF-IDF feature vector of each article is obtained from the TF-IDF values of its words;
Step 2: constructing a sentence graph network structure;
Step 3: constructing a word graph network structure;
Step 4: extracting graph network structure attribute features for training and detection;
community detection and division are performed on the sentence graph network structure and the word graph network structure, the graph network structure attribute features of each article are then extracted and combined with its TF-IDF feature vector to obtain a combined feature vector; the soft-advertisement articles and normal articles labeled in the corpus are used as the data set, a supervised SVM classifier is trained with the combined feature vectors, and the classifier is used to predict articles of unknown class.
In a further improvement of the invention, in step 1 the TF-IDF value of each word is calculated using the following formula:
Figure BDA0002094833210000031
where tfidf(w) is the TF-IDF value of the word w, #(w, A) is the frequency of the word w in article A, N_w is the number of words in article A, |D| is the number of articles in the corpus, and I(w, A) is an indicator function.
In a further improvement of the invention, the indicator function I(w, A) takes the value 1 when the word w appears in article A, and 0 otherwise.
In a further improvement of the invention, the specific process of step 2 is as follows: each article is first split into sentences, and the sentence graph network structure of the article is constructed with sentences as nodes and the semantic similarity between each pair of sentences as the edge weight;
where the edge weight
Figure BDA0002094833210000032
is calculated using the following formula:
Figure BDA0002094833210000033
where #(w_t, s_i) is the frequency of the word w_t in sentence s_i, and i and j denote the order in which the sentences appear in the article; when the weight
Figure BDA0002094833210000034
between node s_i and node s_j is greater than or equal to the threshold α, an edge is constructed, thereby constructing the sentence graph network structure of the article.
In a further improvement of the invention, the threshold α is 0.1.
In a further improvement of the invention, the specific process of step 3 is as follows: words are used as nodes and the co-occurrence frequency between words is used as the edge weight
Figure BDA0002094833210000035
to construct the word graph network structure of an article;
where the edge weight
Figure BDA0002094833210000036
is calculated as follows:
Figure BDA0002094833210000037
where
Figure BDA0002094833210000038
is the frequency of the word w_i in article A, and I((w_i, w_j), s_t) is an indicator function whose value is 1 when the words w_i and w_j co-occur in sentence s_t and 0 otherwise; if the weight is not 0, an edge is constructed, thereby constructing the word graph network structure.
In a further improvement of the invention, the community attributes among the graph network structure attribute features of each article include a community association degree feature (Community-Connectivity), which is calculated using the following formula:
Figure BDA0002094833210000041
where the indices i and j denote different communities, n_ij is the number of edges connecting community i and community j, n_i is the number of nodes in community i, and n_j is the number of nodes in community j.
In a further improvement of the invention, the method also comprises step 5: extracting the advertising content of soft advertisements.
In a further improvement of the invention, the specific process of step 5 is as follows: the importance score of the words in each community of the word graph network structure is first calculated, the words in each community are then sorted in descending order of importance, and the advertising content of the article is extracted.
In a further improvement of the invention, the importance score of the words in each community of the word graph network structure is calculated using the following formula:
Figure BDA0002094833210000042
where Score(w_i) is the importance score of the word w_i within its community in the word graph network structure, tfidf(w_i) is the TF-IDF value of the word w_i,
Figure BDA0002094833210000043
is the degree of the node representing the word w_i, and N_c is the number of nodes in the community containing the word w_i.
Compared with the prior art, the invention has the following beneficial effects: two text network structures, a sentence graph and a word graph, are constructed for each article, graph network structure attribute features are extracted from them, and training and classification are carried out in combination with the traditional TF-IDF feature vector, which yields more accurate detection; the invention adopts a feature (Community-Connectivity) that measures the topic association degree between communities, which clearly exposes the difference between soft-advertisement articles and normal articles in the topic association of their communities and improves accuracy.
Furthermore, the method for measuring the importance of word nodes within a community can accurately extract the advertising content of soft-advertisement articles and helps social platforms better manage and monitor such articles.
Furthermore, when the importance of word nodes in a community is measured, both the semantic importance of the words and the influence of the nodes in the community are considered; the importance score of the words in each community of the word graph network structure is calculated, and the advertising content of the article is extracted after ranking by score.
Drawings
FIG. 1 is a comparison of the Community-Connectivity attribute feature of the SentenceGraph network structure for soft-advertisement and normal articles.
FIG. 2 is a comparison of the Community-Connectivity attribute feature of the WordGraph network structure for soft-advertisement and normal articles.
FIG. 3 is a comparison of the distribution of the SentenceGraph Community-Connectivity attribute feature for 200 soft-advertisement articles and 200 normal articles.
FIG. 4 is a comparison of the distribution of the WordGraph Community-Connectivity attribute feature for 200 soft-advertisement articles and 200 normal articles.
FIG. 5 is a schematic diagram of the advertising content extraction result for a soft-advertisement article; the green nodes inside the box are the extracted advertising content.
Detailed Description
The present invention is described in detail below with reference to the accompanying drawings.
The invention provides a soft-advertisement detection and advertisement extraction method based on graph network structure analysis, which comprises the following steps.
In the invention, TF-IDF denotes term frequency-inverse document frequency.
Step 1: calculating TF-IDF feature vectors: first, each article in a corpus is segmented into words, then the TF-IDF value of each word is calculated, and the TF-IDF feature vector of each article is obtained from the TF-IDF values of its words;
the TF-IDF value is calculated as follows:
Figure BDA0002094833210000061
where tfidf(w) is the TF-IDF value of the word w, #(w, A) is the frequency of the word w in article A, N_w is the number of words in article A, |D| is the number of articles in the corpus, and I(w, A) is an indicator function that is 1 when the word w appears in article A and 0 otherwise. After the TF-IDF value of each word is calculated, the TF-IDF feature vector of the article is obtained.
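Step 1 can be illustrated with a minimal Python sketch. The formula image is not reproduced in this text, so the sketch assumes the conventional form suggested by the symbol definitions: term frequency #(w, A) / N_w multiplied by log(|D| / Σ I(w, A)). The natural logarithm, the absence of smoothing, and the function name `tfidf_vectors` are assumptions, and the input is assumed to be already word-segmented.

```python
import math
from collections import Counter


def tfidf_vectors(articles):
    """Compute one TF-IDF vector per article.

    articles: list of token lists (each article already word-segmented).
    Assumed form: tf = #(w, A) / N_w, idf = ln(|D| / sum_A I(w, A)).
    """
    num_articles = len(articles)

    # Document frequency: the sum over articles of the indicator I(w, A).
    doc_freq = Counter()
    for tokens in articles:
        doc_freq.update(set(tokens))

    vocabulary = sorted(doc_freq)
    vectors = []
    for tokens in articles:
        counts = Counter(tokens)
        n_words = len(tokens)
        vector = []
        for word in vocabulary:
            tf = counts[word] / n_words if n_words else 0.0
            idf = math.log(num_articles / doc_freq[word])
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


# Toy corpus of two segmented "articles" (illustrative data only).
corpus = [["buy", "phone", "phone", "now"], ["weather", "now", "sunny"]]
vocab, vecs = tfidf_vectors(corpus)
```

In this toy corpus the word "now" appears in both articles, so its weight is zero under the assumed formula, while "phone" receives the largest weight in the first article.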
Step 2: constructing the SentenceGraph (sentence graph network structure): each article is first split into sentences; then, using the word segmentation of each article completed in step 1, the sentence graph network structure of an article is constructed with sentences as nodes and the semantic similarity between each pair of sentences as the edge weight. The specific process is as follows:
The weight of an edge in the SentenceGraph (sentence graph network structure),
Figure BDA0002094833210000062
is calculated as follows:
Figure BDA0002094833210000063
where #(w_t, s_i) is the frequency of the word w_t in sentence s_i, and i and j denote the order in which the sentences appear in the article.
When the weight
Figure BDA0002094833210000064
between node s_i and node s_j is greater than or equal to the threshold α, an edge is constructed, thereby constructing the sentence graph network structure of the article. Here, α is set to 0.1.
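The thresholded construction of step 2 can be sketched as follows. The edge-weight formula itself is an image that is not reproduced in this text, so the sketch substitutes cosine similarity over word-frequency vectors as a stand-in for the semantic similarity; that choice and the function names are assumptions. The threshold α = 0.1 is taken from the description.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity of two sentences' word-frequency vectors.

    ASSUMPTION: a stand-in for the similarity formula, which is not
    reproduced in the text.
    """
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(freq_a[w] * freq_b[w] for w in freq_a.keys() & freq_b.keys())
    norm_a = math.sqrt(sum(c * c for c in freq_a.values()))
    norm_b = math.sqrt(sum(c * c for c in freq_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_sentence_graph(sentences, alpha=0.1):
    """Return {(i, j): weight} for sentence pairs with weight >= alpha.

    sentences: list of token lists in order of appearance; i < j are
    the positions of the sentences in the article.
    """
    edges = {}
    for i, j in combinations(range(len(sentences)), 2):
        weight = cosine_similarity(sentences[i], sentences[j])
        if weight >= alpha:
            edges[(i, j)] = weight
    return edges


# Toy article of three segmented sentences (illustrative data only).
sentences = [
    ["the", "phone", "is", "new"],
    ["buy", "the", "phone"],
    ["weather", "sunny"],
]
sentence_edges = build_sentence_graph(sentences)
```

Only the first two sentences share words, so they form the single edge of this toy graph; the third sentence remains an isolated node.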
Step 3: constructing the WordGraph (word graph network structure): using the sentence segmentation of each article completed in step 2 and the word segmentation completed in step 1, the word graph network structure of an article is constructed with words as nodes and the co-occurrence frequency between words as the edge weight;
The weight of an edge in the WordGraph (word graph network structure),
Figure BDA0002094833210000065
is calculated as follows:
Figure BDA0002094833210000066
where
Figure BDA0002094833210000067
is the frequency of the word w_i in article A, and I((w_i, w_j), s_t) is an indicator function whose value is 1 when the words w_i and w_j co-occur in sentence s_t and 0 otherwise. If the weight is 0, no edge is constructed; if the weight is not 0, an edge is constructed, thereby constructing the WordGraph (word graph network structure).
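A rough Python sketch of step 3 follows. The weight formula is an image that is not reproduced in this text; the description indicates that it also involves the word frequencies in the article, but the exact normalization is unknown. The sketch therefore uses the raw number of sentences in which two words co-occur, which is an assumption, as is the function name `build_word_graph`.

```python
from collections import Counter
from itertools import combinations


def build_word_graph(sentences):
    """Return {(w_i, w_j): weight} for word pairs that co-occur.

    sentences: list of token lists for one article.
    ASSUMPTION: weight = sum over sentences s_t of I((w_i, w_j), s_t),
    without the frequency-based normalization of the original formula.
    Pairs that never co-occur have weight 0 and get no edge.
    """
    weights = Counter()
    for tokens in sentences:
        # Each unordered pair counts at most once per sentence.
        for w_i, w_j in combinations(sorted(set(tokens)), 2):
            weights[(w_i, w_j)] += 1
    return dict(weights)


# Toy article of two segmented sentences (illustrative data only).
word_sentences = [["buy", "phone"], ["phone", "new", "buy"]]
word_edges = build_word_graph(word_sentences)
```

Edge keys are stored with the two words in alphabetical order so that each undirected pair appears exactly once.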
Step 4: extracting graph network structure attribute features for training and detection: based on steps 2 and 3, community detection and division are first performed on the sentence graph network structure and the word graph network structure, then the graph network structure attribute features of each article are extracted (see Table 1) and combined with the TF-IDF feature vector from step 1 to obtain a combined feature vector. The soft-advertisement articles and normal articles labeled in the corpus are used as the data set, a supervised SVM classifier is trained with the combined feature vectors, and the classifier is used to predict articles of unknown class;
Table 1 lists the graph network structure attribute features extracted by the invention.
TABLE 1 Brief description of graph network structure attribute features
Figure BDA0002094833210000071
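The feature combination in step 4 can be read as a plain concatenation of the graph attribute features with the TF-IDF vector. The following sketch assumes that reading; the feature names are hypothetical, since the contents of Table 1 are an image that is not reproduced in this text. The resulting vectors could then be passed, together with the corpus labels, to any SVM implementation.

```python
def combine_features(graph_features, tfidf_vector, feature_order):
    """Concatenate graph attribute features with a TF-IDF vector.

    graph_features: dict mapping feature name to value for one article.
    feature_order: fixed list of names so every article uses the same layout.
    ASSUMPTION: "combining" means simple concatenation.
    """
    graph_part = [float(graph_features[name]) for name in feature_order]
    return graph_part + list(tfidf_vector)


# Hypothetical feature names; Table 1 is not reproduced in the text.
FEATURE_ORDER = ["sentence_community_connectivity", "word_community_connectivity"]

article_graph_features = {
    "sentence_community_connectivity": 0.25,
    "word_community_connectivity": 0.1,
}
article_tfidf = [0.0, 0.35, 0.17]
combined = combine_features(article_graph_features, article_tfidf, FEATURE_ORDER)
```

Fixing the feature order once keeps the layout of the combined vector identical across articles, which a classifier requires.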
The Community-Connectivity (community association degree) feature in Table 1 is calculated as follows:
Figure BDA0002094833210000072
where the indices i and j denote different communities, n_ij is the number of edges connecting community i and community j, n_i is the number of nodes in community i, and n_j is the number of nodes in community j.
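One possible reading of the Community-Connectivity feature is sketched below. The formula is an image that is not reproduced in this text, so the sketch assumes a per-pair density n_ij / (n_i × n_j) averaged over all pairs of distinct communities; both the density form and the averaging are assumptions. The community detection algorithm is not named in the text, so the communities are taken as given.

```python
from itertools import combinations


def community_connectivity(edges, communities):
    """Average inter-community edge density of one graph.

    edges: iterable of (u, v) node pairs.
    communities: list of non-empty node sets covering every node in edges.
    ASSUMPTION: per-pair value n_ij / (n_i * n_j), averaged over all
    pairs of distinct communities.
    """
    membership = {
        node: idx for idx, nodes in enumerate(communities) for node in nodes
    }

    # n_ij: number of edges between community i and community j (i < j).
    between = {}
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        if cu != cv:
            key = (min(cu, cv), max(cu, cv))
            between[key] = between.get(key, 0) + 1

    pairs = list(combinations(range(len(communities)), 2))
    if not pairs:
        return 0.0
    total = sum(
        between.get((i, j), 0) / (len(communities[i]) * len(communities[j]))
        for i, j in pairs
    )
    return total / len(pairs)


# Two toy communities joined by one edge (illustrative data only).
toy_communities = [{"a", "b"}, {"c", "d"}]
toy_edges = [("a", "b"), ("c", "d"), ("b", "c")]
connectivity = community_connectivity(toy_edges, toy_communities)
```

Under this reading, a higher value means the communities of an article are more densely linked to each other.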
FIG. 1 compares the Community-Connectivity attribute feature of the SentenceGraph network structure for soft-advertisement and normal articles, and FIG. 2 does the same for the WordGraph network structure. FIG. 3 compares the distribution of the SentenceGraph Community-Connectivity attribute feature for 200 soft-advertisement articles and 200 normal articles, and FIG. 4 does the same for the WordGraph network structure. FIGS. 1 and 2 show that the median Community-Connectivity of normal articles is significantly greater than that of soft-advertisement articles, and FIGS. 3 and 4 show that most normal articles have larger Community-Connectivity values than soft-advertisement articles. This indicates that the topics of the communities in normal articles are much more closely related than those in soft-advertisement articles.
Step 5: extracting the advertising content of soft advertisements: the importance score of the words in each community of the word graph network structure is calculated, the words in each community are then sorted in descending order of importance, and the advertising content of the article is extracted;
When advertising content is extracted, both the semantic importance of a word and the influence of its node in the community are considered. Specifically, the importance score of the words in each community of the word graph network structure is calculated as follows:
Figure BDA0002094833210000081
where Score(w_i) is the importance score of the word w_i within its community in the word graph network structure, tfidf(w_i) is the TF-IDF value of the word w_i,
Figure BDA0002094833210000082
is the degree of the node representing the word w_i, and N_c is the number of nodes in the community containing the word w_i. FIG. 5 is a schematic diagram of the advertising content extraction result for a soft-advertisement article; the nodes inside the box are the extracted advertising content.
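The ranking in step 5 might look like the following sketch. The score formula is an image that is not reproduced in this text, so the sketch assumes Score(w_i) = tfidf(w_i) × degree(w_i) / N_c, with the degree counted over the whole word graph; the product form, the normalization by N_c, and the choice of whole-graph degree are all assumptions.

```python
def rank_community_words(community, edges, tfidf):
    """Rank the words of one community by an assumed importance score.

    community: set of words in the community.
    edges: iterable of (u, v) word pairs from the whole word graph.
    tfidf: dict mapping word to its TF-IDF value.
    ASSUMPTION: Score(w) = tfidf(w) * degree(w) / N_c.
    """
    if not community:
        return []

    # Degree of each community word, counting all incident edges.
    degree = {word: 0 for word in community}
    for u, v in edges:
        if u in degree:
            degree[u] += 1
        if v in degree:
            degree[v] += 1

    n_c = len(community)
    scores = {
        word: tfidf.get(word, 0.0) * degree[word] / n_c for word in community
    }
    # Descending order: the most important words come first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy community, edges, and TF-IDF values (illustrative data only).
ad_community = {"phone", "buy", "discount"}
ad_edges = [("phone", "buy"), ("phone", "discount"), ("buy", "weather")]
ad_tfidf = {"phone": 0.3, "buy": 0.2, "discount": 0.5}
ranked = rank_community_words(ad_community, ad_edges, ad_tfidf)
```

The top-ranked words of a community are the candidates for the extracted advertising content; how many words to keep is not specified in the text.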
The invention converts an article into a sentence graph network structure and a word graph network structure, built with sentences and words as nodes respectively. Network structure attribute features are then extracted from the two graphs and combined with the TF-IDF feature vector of each article, and an SVM classifier is trained on the labels annotated in a corpus to detect soft-advertisement articles. When the advertising content is extracted, both the semantic importance of each word and the influence of its node in the word graph network structure are considered, so that the advertising content is extracted accurately and social network platforms are helped to supervise and manage soft advertisements.
In summary, two text network structures, a sentence graph and a word graph, are constructed for each article, graph network structure attribute features are extracted from them, and training and classification are carried out in combination with the traditional TF-IDF feature vector, which yields more accurate detection. The invention defines a feature (Community-Connectivity) that measures the topic association degree between communities and clearly exposes the difference between soft-advertisement articles and normal articles in this respect. The invention also provides a method for measuring the importance of word nodes in a community, which combines the semantic importance of words with the influence of their nodes in the graph network structure to accurately extract the advertising content of soft-advertisement articles and helps social platforms better manage and supervise such articles.

Claims (10)

1. A soft-advertisement detection and advertisement extraction method based on graph network structure analysis, characterized by comprising the following steps:
Step 1: calculating TF-IDF feature vectors;
first, each article in a corpus is segmented into words, then the TF-IDF value of each word is calculated, and the TF-IDF feature vector of each article is obtained from the TF-IDF values of its words;
Step 2: constructing a sentence graph network structure;
Step 3: constructing a word graph network structure;
Step 4: extracting graph network structure attribute features for training and detection;
community detection and division are performed on the sentence graph network structure and the word graph network structure, the graph network structure attribute features of each article are then extracted and combined with its TF-IDF feature vector to obtain a combined feature vector; the soft-advertisement articles and normal articles labeled in the corpus are used as the data set, a supervised SVM classifier is trained with the combined feature vectors, and the classifier is used to predict articles of unknown class.
2. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein in step 1 the TF-IDF value of each word is calculated using the following formula:
Figure FDA0002094833200000011
where tfidf(w) is the TF-IDF value of the word w, #(w, A) is the frequency of the word w in article A, N_w is the number of words in article A, |D| is the number of articles in the corpus, and I(w, A) is an indicator function.
3. The method of claim 2, wherein the indicator function I(w, A) takes the value 1 when the word w appears in article A, and 0 otherwise.
4. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein the specific process of step 2 is as follows: each article is first split into sentences, and the sentence graph network structure of the article is constructed with sentences as nodes and the semantic similarity between each pair of sentences as the edge weight;
where the edge weight
Figure FDA0002094833200000012
is calculated using the following formula:
Figure FDA0002094833200000021
where #(w_t, s_i) is the frequency of the word w_t in sentence s_i, and i and j denote the order in which the sentences appear in the article; when the weight
Figure FDA0002094833200000022
between node s_i and node s_j is greater than or equal to the threshold α, an edge is constructed, thereby constructing the sentence graph network structure of the article.
5. The method of claim 4, wherein the threshold α is 0.1.
6. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein the specific process of step 3 is as follows: words are used as nodes and the co-occurrence frequency between words is used as the edge weight
Figure FDA0002094833200000023
to construct the word graph network structure of an article;
where the edge weight
Figure FDA0002094833200000024
is calculated as follows:
Figure FDA0002094833200000025
where
Figure FDA0002094833200000026
is the frequency of the word w_i in article A, and I((w_i, w_j), s_t) is an indicator function whose value is 1 when the words w_i and w_j co-occur in sentence s_t and 0 otherwise; if the weight is not 0, an edge is constructed, thereby constructing the word graph network structure.
7. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, wherein the community attributes among the graph network structure attribute features of each article include a community association degree feature, which is calculated using the following formula:
Figure FDA0002094833200000027
where the indices i and j denote different communities, n_ij is the number of edges connecting community i and community j, n_i is the number of nodes in community i, and n_j is the number of nodes in community j.
8. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 1, further comprising step 5: extracting the advertising content of soft advertisements.
9. The soft-advertisement detection and advertisement extraction method based on graph network structure analysis according to claim 8, wherein the specific process of step 5 is as follows: the importance score of the words in each community of the word graph network structure is first calculated, the words in each community are then sorted in descending order of importance, and the advertising content of the article is extracted.
10. The method of claim 9, wherein the importance score of the words in each community of the word graph network structure is calculated using the following formula:
Figure FDA0002094833200000031
where Score(w_i) is the importance score of the word w_i within its community in the word graph network structure, tfidf(w_i) is the TF-IDF value of the word w_i,
Figure FDA0002094833200000032
is the degree of the node representing the word w_i, and N_c is the number of nodes in the community containing the word w_i.
CN201910515297.8A 2019-06-14 2019-06-14 Soft-wide detection and advertisement extraction method based on graph network structure analysis Active CN110362680B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910515297.8A CN110362680B (en) 2019-06-14 2019-06-14 Soft-wide detection and advertisement extraction method based on graph network structure analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910515297.8A CN110362680B (en) 2019-06-14 2019-06-14 Soft-wide detection and advertisement extraction method based on graph network structure analysis

Publications (2)

Publication Number Publication Date
CN110362680A CN110362680A (en) 2019-10-22
CN110362680B true CN110362680B (en) 2021-07-13

Family

ID=68217515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910515297.8A Active CN110362680B (en) 2019-06-14 2019-06-14 Soft-wide detection and advertisement extraction method based on graph network structure analysis

Country Status (1)

Country Link
CN (1) CN110362680B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110941962B (en) * 2019-11-26 2021-09-28 中国科学院自动化研究所 Answer sentence selection method and device based on graph network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013049529A1 (en) * 2011-09-30 2013-04-04 Technicolor Usa Inc Method and apparatus for unsupervised learning of multi-resolution user profile from text analysis
CN103092975A (en) * 2013-01-25 2013-05-08 武汉大学 Detection and filter method of network community garbage information based on topic consensus coverage rate
CN103970801A (en) * 2013-02-05 2014-08-06 腾讯科技(深圳)有限公司 Method and device for recognizing microblog advertisement blog articles
CN106909669A (en) * 2017-02-28 2017-06-30 北京时间股份有限公司 The detection method and device of a kind of promotion message
CN107729489A (en) * 2017-10-17 2018-02-23 北京京东尚科信息技术有限公司 Advertisement text recognition methods and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9426538B2 (en) * 2013-11-20 2016-08-23 At&T Intellectual Property I, Lp Method and apparatus for presenting advertising in content having an emotional context

Also Published As

Publication number Publication date
CN110362680A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
CN110516067B (en) Public opinion monitoring method, system and storage medium based on topic detection
CN108628833B (en) Method and device for determining summary of original content and method and device for recommending original content
CN110046260B (en) Knowledge graph-based hidden network topic discovery method and system
CN104881458B (en) A kind of mask method and device of Web page subject
CN106547875B (en) Microblog online emergency detection method based on emotion analysis and label
CN106708966A (en) Similarity calculation-based junk comment detection method
US20120029908A1 (en) Information processing device, related sentence providing method, and program
WO2015165408A1 (en) Method and system for filtering goods evaluation information
CN110134792B (en) Text recognition method and device, electronic equipment and storage medium
CN105183717B (en) A kind of OSN user feeling analysis methods based on random forest and customer relationship
CN106354845A (en) Microblog rumor recognizing method and system based on propagation structures
CN107944911B (en) Recommendation method of recommendation system based on text analysis
CN107577665B (en) Text emotional tendency judging method
KR20140111307A (en) User question processing method and system
CN109086375A (en) A kind of short text subject extraction method based on term vector enhancing
CN107133282B (en) Improved evaluation object identification method based on bidirectional propagation
CN110287314B (en) Long text reliability assessment method and system based on unsupervised clustering
CN110489745B (en) Paper text similarity detection method based on citation network
CN112069312B (en) Text classification method based on entity recognition and electronic device
Chauhan et al. Research on product review analysis and spam review detection
WO2016040772A1 (en) Method and apparatus of matching an object to be displayed
CN110019820A (en) Main suit and present illness history symptom Timing Coincidence Detection method in a kind of case history
CN105468780B (en) The normalization method and device of ProductName entity in a kind of microblogging text
CN110362680B (en) Soft-wide detection and advertisement extraction method based on graph network structure analysis
CN108427769B (en) Character interest tag extraction method based on social network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant