CN107193796B - Public opinion event detection method and device - Google Patents


Info

Publication number
CN107193796B
Authority
CN
China
Prior art keywords
text
sensitive
feature
detected
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610197073.3A
Other languages
Chinese (zh)
Other versions
CN107193796A (en)
Inventor
蔡慧慧
刘克松
张丹
于晓明
杨建武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
New Founder Holdings Development Co ltd
Peking University
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University, Peking University Founder Group Co Ltd, Beijing Founder Electronics Co Ltd
Publication of CN107193796A
Application granted
Publication of CN107193796B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a public opinion event detection method and device. The method comprises: acquiring a feature word vector of a text to be detected; obtaining the vectors corresponding to all feature words and obtaining sensitive semantic item vectors; calculating the similarity between the feature word vector of the text to be detected and the feature word vectors corresponding to all feature words; obtaining the first sensitive semantic item that corresponds to the maximum similarity; obtaining the number of first sensitive semantic items in the text to be detected and the number of feature words in the text to be detected; calculating the weighted sum of these two numbers according to a first preset weight and a second preset weight; and determining that the event described in the text to be detected is a public opinion event when the weighted sum is greater than a threshold. By vectorizing the text to be detected, the invention imposes effective semantic constraints; by calculating the similarity between the feature word vector of the text to be detected and the feature word vectors corresponding to all feature words, it can accurately detect public opinion events that require attention.

Description

Public opinion event detection method and device
Technical Field
The invention relates to the technical field of computers, in particular to a public sentiment event detection method and device.
Background
With the rapid development of the internet, online public opinion has become a channel through which ordinary people express their interests and call for a fair and just society, and it continuously conveys the thoughts and positions of the general public to governments at all levels in China. More and more people are willing to post their views and the phenomena they observe online, and online dissemination draws in still more people, which strongly influences the emotions of netizens and social stability. Accurately detecting public opinion events with modern technology is therefore of great significance.
At present, the detection and discovery of public opinion events still relies on semantic matching against a set of public opinion sensitive words, and named entity words associated with a public opinion event, such as personal names, translated foreign personal names, and organization names, reflect public opinion only when they appear in the context of the associated event. For named entities that share a name, the meaning must be analyzed in combination with the context of the current public opinion event, and for such ambiguous feature words a traditional static corpus may not contain their latest semantic items. The traditional filtering method based on public opinion feature words (sensitive words, named entities, and the like) remains an important preprocessing step because its mechanism is simple and it executes efficiently. However, when faced with massive amounts of internet text, especially fragmented and irregular social media content, this preprocessing and filtering mechanism lacks effective semantic constraints, so it produces false positives, is prone to misjudgments and missed judgments, and cannot accurately identify the public opinion events that require attention. In big-data applications of online public opinion early warning, this feeds considerable noise into subsequent processing, so a data preprocessing mechanism with semantic understanding capability is urgently needed.
Disclosure of Invention
The invention provides a public opinion event detection method and device to solve the problem that, when faced with massive amounts of internet text, the traditional feature word filtering method lacks effective semantic constraints, is therefore prone to misjudgments and missed judgments, and cannot accurately detect the public opinion events that require attention.
In a first aspect, the present invention provides a method for detecting a public sentiment event, including:
acquiring a feature word vector of a text to be detected, wherein elements of the feature word vector represent whether corresponding feature words appear in the text to be detected;
obtaining vectors corresponding to all feature words from a semantic knowledge base, and obtaining sensitive semantic item vectors from a sensitive word base, wherein elements of the vectors corresponding to the feature words comprise current feature words, whether the current feature words contain sensitive semantic items, current semantic items of the current feature words and feature word vectors corresponding to the current feature words, and the sensitive semantic item vectors represent that semantic items in the vectors corresponding to the current feature words are current sensitive semantic items;
calculating similarity between a feature word vector of a text to be detected and feature word vectors corresponding to all feature words, wherein the feature word vectors corresponding to all the feature words comprise all sensitive meaning item vectors;
the method comprises the steps of obtaining a corresponding first sensitive meaning item when the similarity is maximum, obtaining the number of the first sensitive meaning items in a text to be detected and the number of feature words in the text to be detected, calculating the weighted sum of the number of the first sensitive meaning items and the number of the feature words according to a first preset weight and a second preset weight, and determining that an event described in the text to be detected is a public sentiment event when the weighted sum is larger than a threshold value.
Preferably, before acquiring the feature word vector of the text to be detected, the method further includes:
constructing the semantic knowledge base according to web page content.
Preferably, the web page content is stored in an xml format file.
Preferably, the web page content is wikipedia.
Preferably, after building the semantic knowledge base according to the web page content, the method further includes:
establishing a sensitive word base according to the semantic knowledge base and the sensitive semantic items of preset feature words.
In a second aspect, the present invention further provides a public sentiment event detecting apparatus, including:
the characteristic word vector acquisition module is used for acquiring a characteristic word vector of the text to be detected, wherein elements of the characteristic word vector indicate whether corresponding characteristic words appear in the text to be detected;
the corresponding vector acquisition module is used for acquiring vectors corresponding to all the feature words from the semantic knowledge base and acquiring sensitive semantic item vectors from the sensitive word base, wherein elements of the vectors corresponding to the feature words comprise current feature words, whether the current feature words contain sensitive semantic items, current semantic items of the current feature words and feature word vectors corresponding to the current feature words, and the sensitive semantic item vectors represent that semantic items in the vectors corresponding to the current feature words are current sensitive semantic items;
the similarity calculation module is used for calculating the similarity between the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words, wherein the feature word vectors corresponding to all the feature words comprise all sensitive semantic item vectors;
the event detection module is used for acquiring a corresponding first sensitive semantic item when the similarity is maximum, and acquiring the number of the first sensitive semantic items in the text to be detected and the number of the feature words in the text to be detected; and calculating the weighted sum of the number of the first sensitive semantic items and the number of the characteristic words according to a first preset weight and a second preset weight, and determining that the event described in the text to be detected is a public sentiment event when the weighted sum is greater than a threshold value.
Preferably, the device further includes:
a semantic knowledge base construction module, used for constructing the semantic knowledge base according to web page content.
Preferably, the web page content is stored in an xml format file.
Preferably, the web page content is wikipedia.
Preferably, the device further includes:
a sensitive word base establishing module, used for establishing a sensitive word base according to the semantic knowledge base and the sensitive semantic items of preset feature words.
According to the above technical solution, vectorizing the text to be detected imposes effective semantic constraints; meanwhile, calculating the similarity between the feature word vector of the text to be detected and the feature word vectors corresponding to all feature words allows public opinion events that require attention to be detected accurately and greatly reduces the probability of misjudgments and missed judgments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart illustrating a public sentiment event detection method according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a public sentiment event detection method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a public sentiment event detection device according to an embodiment of the present invention.
Detailed Description
The following further describes embodiments of the invention with reference to the drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Fig. 1 is a flowchart illustrating a public sentiment event detection method according to an embodiment of the present invention, including:
S101, obtaining a feature word vector of a text to be detected, wherein elements of the feature word vector indicate whether corresponding feature words appear in the text to be detected;
S102, obtaining vectors corresponding to all feature words from a semantic knowledge base, and obtaining sensitive semantic item vectors from a sensitive word base, wherein elements of the vectors corresponding to the feature words comprise current feature words, whether the current feature words contain sensitive semantic items, current semantic items of the current feature words and feature word vectors corresponding to the current feature words, and the sensitive semantic item vectors indicate that semantic items in the vectors corresponding to the current feature words are the current sensitive semantic items;
S103, calculating similarity between the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words, wherein the feature word vectors corresponding to all the feature words comprise all sensitive semantic item vectors;
S104, acquiring the corresponding first sensitive semantic item when the similarity is maximum, acquiring the number of first sensitive semantic items in the text to be detected and the number of feature words in the text to be detected, calculating the weighted sum of the number of first sensitive semantic items and the number of feature words according to a first preset weight and a second preset weight, and determining that the event described in the text to be detected is a public opinion event when the weighted sum is greater than a threshold value.
When the feature word corresponding to the element of the feature word vector is a sensitive word, the corresponding element may be set to 0.
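The binary feature word vector of step S101 can be sketched as follows. This is only an illustration under stated assumptions: the vocabulary, the token list, and the set of sensitive words are made up, and the function name `feature_word_vector` is hypothetical rather than taken from the patent.

```python
from typing import Iterable, List, Set


def feature_word_vector(tokens: Iterable[str], vocabulary: List[str],
                        sensitive_words: Set[str]) -> List[int]:
    """Return a 0/1 presence vector over `vocabulary` for one text.

    Positions whose feature word is a sensitive word are forced to 0,
    so the vector describes only the non-sensitive context.
    """
    present = set(tokens)
    return [
        1 if word in present and word not in sensitive_words else 0
        for word in vocabulary
    ]


# Hypothetical vocabulary and text; "apple" is treated as the sensitive word.
vocabulary = ["apple", "phone", "fruit", "price"]
tokens = ["apple", "price", "rises"]
vector = feature_word_vector(tokens, vocabulary, sensitive_words={"apple"})
print(vector)  # [0, 0, 0, 1]
```

Passing an empty set of sensitive words yields the plain presence vector, which makes the effect of zeroing the sensitive positions easy to compare.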
This embodiment imposes effective semantic constraints by vectorizing the text to be detected; meanwhile, calculating the similarity between the feature word vector of the text to be detected and the feature word vectors corresponding to all feature words allows public opinion events that require attention to be detected accurately and greatly reduces the probability of misjudgments and missed judgments.
As an optional implementation of this embodiment, before step S101 the method further includes:
S100, building the semantic knowledge base according to web page content.
Building the semantic knowledge base and marking the ambiguity of public opinion sensitive words provides semantic support for analyzing and detecting public opinion events, and provides a basis for finding the correct meaning of sensitive words in the text to be detected. Public opinion feature words are often a direct expression of public opinion, but they can carry different meanings in different contexts, so ambiguous feature words often cause false positives in text filtering preprocessing. With the exact descriptions given in the semantic knowledge base, the meaning that a feature word expresses in a specific context can be recognized.
The vectors corresponding to the feature words stored in the semantic knowledge base are obtained by training the deep learning tool word2vec on text that has been preprocessed by word segmentation. Each segmented word (that is, each feature word in the text to be detected) can be effectively represented by a vector of a certain dimension, as shown in the following table:
Figure BDA0000955259960000051
Figure BDA0000955259960000061
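Because the table itself is only available as an image, the following is a hedged sketch of what one knowledge base record might look like, based on the four elements listed in step S102 (the feature word, whether it has a sensitive semantic item, the current semantic item, and the vector). The class name `SenseEntry`, the example words, and all vector values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SenseEntry:
    feature_word: str          # the current feature word
    has_sensitive_sense: bool  # whether the word has any sensitive semantic item
    sense: str                 # the current semantic item of the word
    vector: Tuple[float, ...]  # vector for this word and sense (made-up values)


def group_by_word(entries: List[SenseEntry]) -> Dict[str, List[SenseEntry]]:
    """Index entries by feature word; ambiguous words get several entries."""
    grouped: Dict[str, List[SenseEntry]] = {}
    for entry in entries:
        grouped.setdefault(entry.feature_word, []).append(entry)
    return grouped


# Made-up entries: "apple" is ambiguous, "price" is not.
entries = [
    SenseEntry("apple", True, "technology company", (0.9, 0.1, 0.0)),
    SenseEntry("apple", True, "fruit", (0.1, 0.8, 0.2)),
    SenseEntry("price", False, "cost of goods", (0.3, 0.3, 0.7)),
]
knowledge_base = group_by_word(entries)
ambiguous = sorted(w for w, senses in knowledge_base.items() if len(senses) > 1)
print(ambiguous)  # ['apple']
```

Storing one entry per semantic item keeps the "one vector for a non-ambiguous word, several for an ambiguous word" rule explicit in the data.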
Specifically, the web page content is stored in an xml format file.
For example, the web page content is wikipedia.
Wikipedia is one of the largest online encyclopedias. It uses a wiki mechanism of collaborative online editing, is high in quality, wide in coverage, continuously updated, and semi-structured, and is therefore a high-quality corpus source for constructing a semantic knowledge base. In particular, for ambiguous words in Wikipedia, the semantic items that reflect public opinion characteristics are manually marked to support subsequent early warning analysis. The Wikipedia corpus in xml format is taken as input; the description of each word is extracted from it; each word is analyzed to determine whether it is an ambiguous word or a redirect and whether conversion between traditional and simplified Chinese is needed; the summary introduction is kept; and sensitive feature words are labeled.
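The extraction step described above might be sketched with Python's standard `xml.etree.ElementTree` module as follows. The sample XML is a simplified, made-up stand-in for a Wikipedia dump: real dumps declare an XML namespace on the root element, which this sample omits, and treating a `{{disambig` template as the marker of an ambiguous word is an assumption. Conversion between traditional and simplified Chinese and the manual labeling of sensitive words are not shown.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up sample; not a real Wikipedia dump.
SAMPLE = """<mediawiki>
  <page>
    <title>Apple</title>
    <revision><text>Apple is a fruit.

== History ==
Long text.</text></revision>
  </page>
  <page>
    <title>Apples</title>
    <redirect title="Apple" />
    <revision><text>#REDIRECT [[Apple]]</text></revision>
  </page>
  <page>
    <title>Apple (disambiguation)</title>
    <revision><text>{{disambig}}
* [[Apple]]
* [[Apple Inc.]]</text></revision>
  </page>
</mediawiki>"""


def parse_pages(xml_text):
    """Extract title, redirect target, ambiguity flag, and summary per page."""
    root = ET.fromstring(xml_text)
    records = []
    for page in root.iter("page"):
        text = page.findtext("revision/text", default="") or ""
        redirect = page.find("redirect")
        records.append({
            "title": page.findtext("title", default=""),
            "redirect_to": redirect.get("title") if redirect is not None else None,
            # Assumed marker for ambiguous (disambiguation) pages.
            "is_ambiguous": "{{disambig" in text.lower(),
            # Keep only the introduction before the first section heading.
            "summary": text.split("\n==", 1)[0].strip(),
        })
    return records


records = parse_pages(SAMPLE)
for record in records:
    print(record["title"], record["redirect_to"], record["is_ambiguous"])
```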
By drawing on the rich semantic knowledge in Wikipedia, public opinion sensitive words can be added automatically and the range of public opinion events that can be represented is expanded, which helps users better grasp public opinion trends and formulate countermeasures for public opinion events.
Further, step S100 is followed by:
S1001, establishing a sensitive word base according to the semantic knowledge base and the sensitive semantic items of preset feature words.
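One possible reading of step S1001 is a lookup that keeps only the preset (word, semantic item) pairs that actually exist in the semantic knowledge base. The sketch below assumes a nested dictionary layout (word, then semantic item, then vector); that layout, the function name `build_sensitive_lexicon`, and all example data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed layout: feature word -> semantic item -> vector.
KnowledgeBase = Dict[str, Dict[str, List[float]]]


def build_sensitive_lexicon(knowledge_base: KnowledgeBase,
                            preset_sensitive: Set[Tuple[str, str]]) -> KnowledgeBase:
    """Copy the vectors of preset sensitive (word, sense) pairs found in the base."""
    lexicon: KnowledgeBase = {}
    for word, sense in preset_sensitive:
        vector = knowledge_base.get(word, {}).get(sense)
        if vector is not None:  # skip presets that the knowledge base lacks
            lexicon.setdefault(word, {})[sense] = vector
    return lexicon


# Made-up knowledge base and presets.
knowledge_base = {
    "apple": {"technology company": [0.9, 0.1], "fruit": [0.1, 0.8]},
    "price": {"cost of goods": [0.3, 0.7]},
}
preset_sensitive = {("apple", "technology company"), ("storm", "weather event")}
lexicon = build_sensitive_lexicon(knowledge_base, preset_sensitive)
print(lexicon)  # {'apple': {'technology company': [0.9, 0.1]}}
```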
When the text to be detected is processed, sensitive words can be handled with the clause as the processing unit. Specifically, the feature words in the feature word vector of each clause of the text to be detected are matched with the vectors corresponding to the feature words in the semantic knowledge base. The similarity between the semantic items of different feature words, and the similarity between each semantic item and the text to be detected, are calculated; the higher the similarity, the closer the semantic item is to the real meaning of the sensitive word in the text, so that semantic item is selected for the sensitive word. An optimization method is used to find the meaning of each ambiguous word in the text at which the objective function reaches its maximum. The calculation formula is as follows:
$$\max f(w_i)$$
$$f(w_i) = f(w_{i+1}) + \mathrm{Sim}(w_i, w_{i+1}) + \mathrm{Sim}(w_i, doc_i)$$
Figure BDA0000955259960000071
s.t.
$$w_i \in \{v_1, v_2, \ldots, v_m\}$$
$$doc_i = (w_1, w_2, \ldots, w_n), \quad w_i = 0$$
wherein: $w_i$ represents a feature word in the text to be detected; $f(w_i)$ represents the semantic similarity value from $w_i$ to the end of the sentence; $doc_i$ is the vector representation of the text after the sensitive word is removed, that is, with the element at the corresponding position set to 0; $v_1, v_2, \ldots$ are the vectors corresponding to the feature word: a non-ambiguous word has one vector representation, and an ambiguous word has multiple vector representations; $\mathrm{Sim}(w_i, w_{i+1})$ is a function that calculates the similarity of adjacent sensitive words, and $\mathrm{Sim}(w_i, doc_i)$ is a function that calculates the similarity of a sensitive word to the text. Because words and texts are both represented by word vectors, the similarity function may use cosine similarity.
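The source does not name the optimization method, so the following is only one way the recursion could be maximized: a backward dynamic program over the candidate vectors of each word, with cosine similarity as Sim. Several assumptions are made and should be read as such: the recursion is taken to end with a value of 0 after the last word (the boundary condition is in the formula image, which is not reproduced), a single context vector `doc` in the same space as the word vectors stands in for every $doc_i$, and every word is assumed to have at least one candidate vector. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_senses(candidates: List[List[Vector]], doc: Vector) -> Tuple[float, List[int]]:
    """Pick one candidate index per word so that the summed similarity is maximal."""
    n = len(candidates)
    if n == 0:
        return 0.0, []
    # score[i][k]: best value of f from word i to the sentence end when word i
    # takes candidate k; choice[i][k]: the candidate of word i+1 achieving it.
    score = [[0.0] * len(c) for c in candidates]
    choice = [[-1] * len(c) for c in candidates]
    for i in range(n - 1, -1, -1):
        for k, v in enumerate(candidates[i]):
            score[i][k] = cosine(v, doc)
            if i + 1 < n:
                options = [
                    cosine(v, u) + score[i + 1][j]
                    for j, u in enumerate(candidates[i + 1])
                ]
                j_best = max(range(len(options)), key=options.__getitem__)
                score[i][k] += options[j_best]
                choice[i][k] = j_best
    # Recover the best path starting from the best candidate of the first word.
    k = max(range(len(score[0])), key=score[0].__getitem__)
    total, path = score[0][k], [k]
    for i in range(n - 1):
        k = choice[i][k]
        path.append(k)
    return total, path


# Made-up 2-D vectors: the first word is ambiguous, the second is not.
candidates = [[(1.0, 0.0), (0.0, 1.0)], [(1.0, 0.0)]]
total, path = best_senses(candidates, doc=(1.0, 0.0))
print(total, path)  # 3.0 [0, 0]
```

In the example, the first candidate of the ambiguous word agrees with both its neighbor and the context vector, so it is selected; exhaustive enumeration would give the same answer but grows exponentially with the number of ambiguous words.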
For example, when a public opinion event is detected from a text to be detected, as shown in Fig. 2, word segmentation and stop word removal may first be performed on the text. Word segmentation splits each sentence in the text to be detected into a number of feature words, and stop word removal deletes stop words such as "simultaneously" and "additionally" from the text.
Then, the vectors of the sensitive semantic items in the text to be detected are obtained from the semantic knowledge base and the sensitive word base by using word2vec, so that similarity can subsequently be calculated between adjacent words in the sentences of the text to be detected;
then, similarity is calculated between the sensitive semantic item vector of each feature word, the vectors corresponding to the other feature words, and the feature word vector of the text to be detected; the sensitive semantic item at which the similarity is maximum is taken, which yields the sensitive semantic items that fit best with the other words and with the text to be detected and determines the specific meaning of each feature word in the text;
finally, a weighted sum of the named entities and the sensitive semantic items in the text is calculated, and if the sum is greater than a certain threshold, the event is judged to be a public opinion event requiring early warning. Here, the named entity count refers to the number of feature words in the text to be detected.
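The first and last steps of this flow (preprocessing and the weighted threshold decision) can be sketched as below. This is a simplified illustration, not the patented implementation: English text is split on non-letters instead of running Chinese word segmentation, the disambiguation step is replaced by a given set of words already resolved to a sensitive semantic item, and the weights, threshold, and word lists are made-up values.

```python
import re
from typing import List, Set

STOP_WORDS: Set[str] = {"simultaneously", "additionally"}


def preprocess(text: str, stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Lower-case the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in stop_words]


def is_public_opinion_event(n_sensitive_senses: int, n_feature_words: int,
                            w_sense: float, w_feature: float,
                            threshold: float) -> bool:
    """Flag the text when the weighted sum of the two counts exceeds the threshold."""
    return w_sense * n_sensitive_senses + w_feature * n_feature_words > threshold


# Hypothetical feature words, and words already resolved to a sensitive sense.
feature_words = {"protest", "factory", "police"}
resolved_sensitive = {"protest"}

tokens = preprocess("Additionally, a protest began at the factory.")
n_feature = sum(1 for t in tokens if t in feature_words)
n_sensitive = sum(1 for t in tokens if t in resolved_sensitive)
flagged = is_public_opinion_event(n_sensitive, n_feature,
                                  w_sense=2.0, w_feature=1.0, threshold=3.0)
print(tokens, n_feature, n_sensitive, flagged)
```

With these made-up settings the text contains two feature words and one sensitive semantic item, so the weighted sum is 4.0 and the text is flagged; the comparison is strict, so a sum exactly equal to the threshold is not flagged.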
In this embodiment, semantic recognition with supervised learning is performed using the different semantic items of the feature words and the labels of all the feature words in the text to be detected. This avoids the false detections that arise when public opinion events are detected by keyword matching alone, so that public opinion events are identified accurately and an early warning is issued for those that require it.
Fig. 3 is a schematic structural diagram of a public sentiment event detecting device according to an embodiment of the present invention, including:
the feature word vector obtaining module 31 is configured to obtain a feature word vector of the text to be detected, where an element of the feature word vector indicates whether a corresponding feature word in the text to be detected appears;
a corresponding vector obtaining module 32, configured to obtain vectors corresponding to all feature words from a semantic knowledge base, and obtain a sensitive semantic item vector from a sensitive word base, where elements of the vector corresponding to the feature words include a current feature word, whether the current feature word includes a sensitive semantic item, a current semantic item of the current feature word, and a feature word vector corresponding to the current feature word, and the sensitive semantic item vector indicates that a semantic item in the vector corresponding to the current feature word is a current sensitive semantic item;
the similarity calculation module 33 is configured to calculate similarities between feature word vectors of the text to be detected and feature word vectors corresponding to all feature words, where the feature word vectors corresponding to all feature words include all sensitive semantic item vectors;
the event detection module 34 is configured to obtain a corresponding first sensitive semantic item when the similarity is maximum, and obtain the number of the first sensitive semantic items in the text to be detected and the number of the feature words in the text to be detected; and calculating the weighted sum of the number of the first sensitive semantic items and the number of the characteristic words according to a first preset weight and a second preset weight, and determining that the event described in the text to be detected is a public sentiment event when the weighted sum is greater than a threshold value.
This embodiment imposes effective semantic constraints by vectorizing the text to be detected; meanwhile, calculating the similarity between the feature word vector of the text to be detected and the feature word vectors corresponding to all feature words allows public opinion events that require attention to be detected accurately and greatly reduces the probability of misjudgments and missed judgments.
As an optional implementation of this embodiment, the device further includes:
a semantic knowledge base construction module, configured to construct the semantic knowledge base according to web page content.
Specifically, the web page content is stored in an xml format file.
For example, the web page content is wikipedia.
Further, the device also includes:
a sensitive word base establishing module, configured to establish a sensitive word base according to the semantic knowledge base and the sensitive semantic items of preset feature words.
In the description of the present invention, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.

Claims (10)

1. A public opinion event detection method is characterized by comprising the following steps:
acquiring a feature word vector of a text to be detected, wherein elements of the feature word vector represent whether corresponding feature words appear in the text to be detected;
obtaining vectors corresponding to all the feature words from a semantic knowledge base, wherein elements of the vectors corresponding to the feature words comprise the current feature words, whether the current feature words contain sensitive semantic items, the current semantic items of the current feature words and feature word vectors corresponding to the current feature words; acquiring a sensitive sense item vector from a sensitive word bank, wherein the sensitive sense item vector indicates that a sense item in a vector corresponding to the current feature word is a preset feature word sensitive sense item; the sensitive word bank is established based on the semantic knowledge bank and sensitive meaning items of preset characteristic words; the vectors corresponding to the sensitive characteristic words in all the characteristic words of the semantic knowledge base have ambiguous labels;
calculating the overall similarity of the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words, wherein the feature word vectors corresponding to all the feature words comprise all sensitive meaning item vectors obtained from a sensitive word bank; and the calculating the overall similarity of the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words specifically comprises: calculating the similarity among a plurality of feature word vectors and calculating the sum of the similarity of each feature word and a background text vector to serve as the overall similarity;
wherein the calculation formula of the overall similarity function $f(w_i)$ is as follows:
$$\max f(w_i)$$
$$f(w_i) = f(w_{i+1}) + \mathrm{Sim}(w_i, w_{i+1}) + \mathrm{Sim}(w_i, doc_i)$$
Figure FDA0003326838770000011
s.t.
$$w_i \in \{v_1, v_2, \ldots, v_m\}$$
$$doc_i = (w_1, w_2, \ldots, w_n), \quad w_i = 0$$
wherein $w_i$ represents a feature word vector in the text to be detected; $v_1, v_2, \ldots$ are the feature word vectors corresponding to each word in the text to be detected: a non-ambiguous word has one vector representation, and an ambiguous word has multiple vector representations; $f(w_i)$ represents the overall similarity of the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words; $\mathrm{Sim}(w_i, w_{i+1})$ represents the similarity between every two adjacent feature word vectors; $\mathrm{Sim}(w_i, doc_i)$ represents the similarity between each feature word and the background text vector; $doc_i$ is the background text vector, that is, the vector representation of the background text of the text to be detected after the sensitive feature words are removed, namely the vector representation of the background text when the elements at the positions corresponding to sensitive semantic items are set to 0;
the method for acquiring the first sensitive meaning item corresponding to the feature word vector of the text to be detected when the overall similarity function of the text to be detected takes the maximum value comprises the following steps: acquiring a first sensitive meaning item corresponding to a feature word vector of a text to be detected when a total similarity function takes a maximum value through an optimization method based on ambiguity marks in a semantic knowledge base so as to determine the accurate meaning of an ambiguous word in the text;
repeatedly executing the steps to obtain corresponding first sensitive meaning items respectively for a plurality of sensitive characteristic words in the text to be detected; according to the obtained first sensitive semantic items, the number of the first sensitive semantic items in the text to be detected and the number of the feature words in the text to be detected are obtained, according to a first preset weight and a second preset weight, the weighted sum of the number of the first sensitive semantic items and the number of the feature words is calculated, and when the weighted sum is larger than a threshold value, the event described in the text to be detected is determined to be a public sentiment event.
2. The method according to claim 1, wherein before the obtaining of the feature word vector of the text to be detected, the method further comprises:
constructing the semantic knowledge base according to web page content.
3. The method of claim 2, wherein the web page content is stored in an xml-format file.
4. The method of claim 3, wherein the web page content is wikipedia.
5. The method of claim 4, wherein after the building of the semantic knowledge base according to the web page content, the method further comprises:
establishing a sensitive word base according to the semantic knowledge base and the sensitive semantic items of preset feature words.
6. A public opinion event detection device, comprising:
a feature word vector acquisition module, configured to acquire a feature word vector of the text to be detected, wherein each element of the feature word vector indicates whether the corresponding feature word appears in the text to be detected;
a corresponding vector acquisition module, configured to acquire vectors corresponding to all feature words from a semantic knowledge base, wherein the elements of the vector corresponding to a feature word include the current feature word, whether the current feature word has a sensitive sense item, the current sense item of the current feature word, and the feature word vector corresponding to the current feature word; and to acquire a sensitive sense item vector from a sensitive word bank, wherein the sensitive sense item vector indicates that the sense item in the vector corresponding to the current feature word is a preset sensitive sense item of the feature word; the sensitive word bank is established based on the semantic knowledge base and the preset sensitive sense items of feature words; and the vectors corresponding to the sensitive feature words among all the feature words of the semantic knowledge base carry ambiguity labels;
a similarity calculation module, configured to calculate the overall similarity between the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words, wherein the feature word vectors corresponding to all the feature words include all sensitive sense item vectors acquired from the sensitive word bank; and wherein calculating the overall similarity between the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words specifically comprises: calculating the similarity among the plurality of feature word vectors and the similarity between each feature word and a background text vector, and taking their sum as the overall similarity;
wherein the overall similarity function $f(w_i)$ is calculated as follows:
$$\max f(w_i)$$
$$f(w_i) = f(w_{i+1}) + \mathrm{Sim}(w_i, w_{i+1}) + \mathrm{Sim}(w_i, doc_i)$$
Figure FDA0003326838770000031
$$\text{s.t.}\quad w_i \in \{v_1, v_2, \ldots, v_m\}$$
$$doc_i = (w_1, w_2, \ldots, w_n),\quad w_i = 0$$
wherein $w_i$ represents a feature word vector in the text to be detected; $v_1, v_2, \ldots$ are the feature word vectors corresponding to each word in the text to be detected, a non-ambiguous word having one vector representation and an ambiguous word having multiple vector representations; $f(w_i)$ represents the overall similarity between the feature word vectors of the text to be detected and the feature word vectors corresponding to all the feature words; $\mathrm{Sim}(w_i, w_{i+1})$ represents the similarity between two adjacent feature word vectors; $\mathrm{Sim}(w_i, doc_i)$ represents the similarity between each feature word and the background text vector; and $doc_i$ is the background text vector, namely the vector representation of the background text of the text to be detected with the sensitive feature words excluded, that is, the vector representation of the background text when the elements at the positions that indicate whether the background text contains the sensitive sense items are set to 0;
an event detection module, configured to acquire the first sensitive sense item corresponding to the feature word vector when the overall similarity function of the text to be detected takes its maximum value, comprising: acquiring, through an optimization method and based on the ambiguity labels in the semantic knowledge base, the first sensitive sense item corresponding to the feature word vector of the text to be detected when the overall similarity function takes its maximum value, so as to determine the exact meaning of an ambiguous word in the text;
the event detection module being further configured to repeat the above steps to obtain a corresponding first sensitive sense item for each of a plurality of sensitive feature words in the text to be detected; to obtain, from the obtained first sensitive sense items, the number of first sensitive sense items in the text to be detected and the number of feature words in the text to be detected; and to calculate the weighted sum of the number of first sensitive sense items and the number of feature words according to a first preset weight and a second preset weight, and determine that the event described in the text to be detected is a public opinion event when the weighted sum is greater than a threshold.
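The recurrence for the overall similarity function in claim 6 lends itself to a backward dynamic program over the word sequence. The Python sketch below is one possible reading under stated assumptions: cosine similarity stands in for Sim (the formula image that may define Sim is not available in the text), a single background vector is used for every position instead of a per-position doc_i, and the names `cosine` and `best_senses` are hypothetical. The claim only refers to "an optimization method" and does not name dynamic programming.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; an assumed stand-in for Sim, returning 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_senses(candidates: List[List[Vector]],
                background: Vector) -> Tuple[float, List[int]]:
    """Pick one candidate vector per word so that the summed similarity is maximal.

    candidates[i] lists the possible vectors of word i (one for an unambiguous
    word, several for an ambiguous one). Each list is assumed to be non-empty.
    """
    n = len(candidates)
    if n == 0:
        return 0.0, []
    # score[i][k]: best value of f from word i to the end if word i takes sense k.
    score = [[0.0] * len(c) for c in candidates]
    choice = [[0] * len(c) for c in candidates]
    for k, v in enumerate(candidates[-1]):
        score[-1][k] = cosine(v, background)
    for i in range(n - 2, -1, -1):
        for k, v in enumerate(candidates[i]):
            nxt = candidates[i + 1]
            best_j = max(range(len(nxt)),
                         key=lambda j: score[i + 1][j] + cosine(v, nxt[j]))
            # f(w_i) = f(w_{i+1}) + Sim(w_i, w_{i+1}) + Sim(w_i, doc)
            score[i][k] = (score[i + 1][best_j]
                           + cosine(v, nxt[best_j])
                           + cosine(v, background))
            choice[i][k] = best_j
    # Follow the stored choices forward to recover the selected sense per word.
    k = max(range(len(candidates[0])), key=lambda j: score[0][j])
    total = score[0][k]
    path = [k]
    for i in range(n - 1):
        k = choice[i][k]
        path.append(k)
    return total, path


# Toy example: the second word is ambiguous, and its second sense matches the context.
total, path = best_senses([[(0, 1)], [(1, 0), (0, 1)]], background=(0, 1))
```

In this toy example the returned `path` selects index 1 for the ambiguous word. In the terms of the claim, the selected sense would then be looked up in the sensitive word bank to decide whether it counts as a first sensitive sense item.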
7. The apparatus of claim 6, further comprising:
a semantic knowledge base construction module, configured to construct the semantic knowledge base according to web page content.
8. The apparatus of claim 7, wherein the web page content is stored in an XML-format file.
9. The apparatus of claim 8, wherein the web page content is Wikipedia.
10. The apparatus of claim 9, further comprising:
a sensitive word bank establishing module, configured to establish a sensitive word bank according to the semantic knowledge base and the preset sensitive sense items of feature words.
CN201610197073.3A 2016-03-14 2016-03-31 Public opinion event detection method and device Active CN107193796B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2016101447613 2016-03-14
CN201610144761 2016-03-14

Publications (2)

Publication Number Publication Date
CN107193796A CN107193796A (en) 2017-09-22
CN107193796B CN107193796B (en) 2021-12-24

Family

ID=59870838

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610197073.3A Active CN107193796B (en) 2016-03-14 2016-03-31 Public opinion event detection method and device

Country Status (1)

Country Link
CN (1) CN107193796B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992471B (en) * 2017-11-10 2021-09-10 北京光年无限科技有限公司 Information filtering method and device in human-computer interaction process
CN108647335A (en) * 2018-05-12 2018-10-12 苏州华必讯信息科技有限公司 Internet public opinion analysis method and apparatus
CN109214407B (en) * 2018-07-06 2022-04-19 创新先进技术有限公司 Event detection model, method and device, computing equipment and storage medium
CN109472018A (en) * 2018-09-26 2019-03-15 深圳壹账通智能科技有限公司 Enterprise's public sentiment monitoring method, device, computer equipment and storage medium
CN109344258B (en) * 2018-11-28 2021-11-12 中国电子科技网络信息安全有限公司 Intelligent self-adaptive sensitive data identification system and method
CN110674251A (en) * 2019-08-21 2020-01-10 杭州电子科技大学 Computer-assisted secret point annotation method based on semantic information
CN110516166B (en) * 2019-08-30 2022-10-25 北京明略软件系统有限公司 Public opinion event processing method, device, processing equipment and storage medium
CN110727880B (en) * 2019-10-18 2022-06-17 西安电子科技大学 Sensitive corpus detection method based on word bank and word vector model
CN110807319B (en) * 2019-10-31 2023-07-25 北京奇艺世纪科技有限公司 Text content detection method, detection device, electronic equipment and storage medium
CN113505221B (en) * 2020-03-24 2024-03-12 国家计算机网络与信息安全管理中心 Enterprise false propaganda risk identification method, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096680A (en) * 2009-12-15 2011-06-15 北京大学 Method and device for analyzing information validity
CN103605691A (en) * 2013-11-04 2014-02-26 北京奇虎科技有限公司 Device and method used for processing issued contents in social network
CN103605692A (en) * 2013-11-04 2014-02-26 北京奇虎科技有限公司 Device and method used for shielding advertisement contents in ask-and-answer community
CN104820629A (en) * 2015-05-14 2015-08-05 中国电子科技集团公司第五十四研究所 Intelligent system and method for emergently processing public sentiment emergency
CN104899230A (en) * 2014-03-07 2015-09-09 上海市玻森数据科技有限公司 Public opinion hotspot automatic monitoring system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102096680A (en) * 2009-12-15 2011-06-15 北京大学 Method and device for analyzing information validity
CN103605691A (en) * 2013-11-04 2014-02-26 北京奇虎科技有限公司 Device and method used for processing issued contents in social network
CN103605692A (en) * 2013-11-04 2014-02-26 北京奇虎科技有限公司 Device and method used for shielding advertisement contents in ask-and-answer community
CN104899230A (en) * 2014-03-07 2015-09-09 上海市玻森数据科技有限公司 Public opinion hotspot automatic monitoring system
CN104820629A (en) * 2015-05-14 2015-08-05 中国电子科技集团公司第五十四研究所 Intelligent system and method for emergently processing public sentiment emergency

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
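A Graph Analytical Approach for Topic Detection; HASSAN SAYYADI et al.; ACM Transactions on Internet Technology; 20131231; Vol. 13, No. 2; pp. 1-23 *
Research on Internet Public Opinion Analysis for Public Crisis Early Warning; Cao Jianfeng; China Doctoral Dissertations Full-text Database, Information Science and Technology; 20140515; pp. 79-103, 129-134 *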

Also Published As

Publication number Publication date
CN107193796A (en) 2017-09-22

Similar Documents

Publication Publication Date Title
CN107193796B (en) Public opinion event detection method and device
CN110516067B (en) Public opinion monitoring method, system and storage medium based on topic detection
JP5936698B2 (en) Word semantic relation extraction device
CN110727880B (en) Sensitive corpus detection method based on word bank and word vector model
Gokul et al. Sentence similarity detection in Malayalam language using cosine similarity
CN112069312B (en) Text classification method based on entity recognition and electronic device
CN111222330B (en) Chinese event detection method and system
Golshan et al. A study of recent contributions on information extraction
CN108763192B (en) Entity relation extraction method and device for text processing
CN112183102A (en) Named entity identification method based on attention mechanism and graph attention network
CN104317882A (en) Decision-based Chinese word segmentation and fusion method
CN111209373A (en) Sensitive text recognition method and device based on natural semantics
Badam et al. Aletheia: A fake news detection system for Hindi
EP3835994A1 (en) System and method for identification and profiling adverse events
Oo et al. An analysis of ambiguity detection techniques for software requirements specification (SRS)
Hussain et al. A technique for perceiving abusive bangla comments
Aejas et al. Named entity recognition for cultural heritage preservation
Nongmeikapam et al. Verb based manipuri sentiment analysis
Lakshmi et al. Named entity recognition in Malayalam using fuzzy support vector machine
Ajees et al. A named entity recognition system for Malayalam using conditional random fields
Li et al. Attention-based LSTM-CNNs for uncertainty identification on Chinese social media texts
Chang et al. Zero pronoun identification in chinese language with deep neural networks
Sun et al. Generalized abbreviation prediction with negative full forms and its application on improving chinese web search
Orellana et al. Evaluating named entities recognition (NER) tools vs algorithms adapted to the extraction of locations
Pham Sensitive keyword detection on textual product data: an approximate dictionary matching and context-score approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230619

Address after: 3007, Hengqin international financial center building, No. 58, Huajin street, Hengqin new area, Zhuhai, Guangdong 519031

Patentee after: New founder holdings development Co.,Ltd.

Patentee after: Peking University

Patentee after: BEIJING FOUNDER ELECTRONICS Co.,Ltd.

Address before: 100871, fangzheng building, 298 Fu Cheng Road, Beijing, Haidian District

Patentee before: PEKING UNIVERSITY FOUNDER GROUP Co.,Ltd.

Patentee before: Peking University

Patentee before: BEIJING FOUNDER ELECTRONICS Co.,Ltd.
