CN110020436A - A microblog sentiment analysis method combining ontology and syntactic dependency - Google Patents
- Publication number
- CN110020436A (application CN201910276686.XA)
- Authority
- CN
- China
- Prior art keywords
- ontology
- word
- emotion
- dimension
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
Abstract
The invention discloses a microblog sentiment analysis method that combines an ontology with syntactic dependency, comprising the following steps: semi-automatically constructing a topic-related ontology and persisting it to a database; expanding and updating the ontology in two respects, ontology dimensions and emotion vocabulary, using syntactic dependency relations; and calculating emotion weights for microblog posts using the ontology to determine their sentiment orientation. Compared with conventional machine-learning classification algorithms, the invention is shown to be feasible and superior on a Chinese microblog data set.
Description
Technical field
The invention belongs to the technical field of text sentiment analysis, and in particular relates to a microblog sentiment analysis method combining ontology and syntactic dependency.
Technical background
With the spread of the mobile Internet, microblogging, a social platform with a very large user base, has become the fastest source of breaking news. Because user engagement is high, microblogs contain a vast amount of everyday information from netizens, including first-hand evaluations of all kinds of products. For various reasons, the reviews shown in a product's own online store are not objective enough, whereas user evaluations in everyday microblog posts are more objective and more worth mining. For an enterprise, therefore, obtaining users' product evaluations from microblogs and subjecting them to sentiment analysis provides an essential information base for business decisions.
Microblog data is mainly text, and sentiment orientation analysis of text has been a research hotspot in recent years. Approaches fall broadly into two kinds: machine learning and ontology-based analysis. In machine-learning methods the classifier mostly relies on manual construction; for large data sets the modeling process is overly complex and lengthy, and the manual work is difficult. Ontology construction methods were proposed to address this problem. An ontology is a formal, explicit and detailed specification of a shared conceptual system, and it can describe concepts at the semantic level. However, existing ontology-based sentiment analysis does not update the ontology after its initial construction, which places excessive demands on the accuracy of that initial construction; in practice, the dimensions of an ontology grow as the data expands.
Summary of the invention
The invention proposes a microblog sentiment analysis method combining ontology and syntactic dependency, with the aim of obtaining relevant sentiment information from microblogs more accurately. An initial ontology is built semi-automatically from microblog information; then, based on the relevant text data, the ontology is automatically updated and optimized in two respects, product dimensions and emotion vocabulary, using syntactic dependency parsing, yielding a mature ontology. With the mature ontology, the new emotion weight calculation method proposed by the invention measures the emotion weight and orientation of the text data, thereby achieving accurate sentiment analysis.
The technical solution of the invention is as follows:
A microblog sentiment analysis method combining ontology and syntactic dependency, comprising the following steps:
Step (1): semi-automatically construct a topic-related ontology and persist the ontology to a database;
Step (2): expand and update the ontology in two respects, ontology dimensions and emotion vocabulary, using syntactic dependency relations;
Step (3): calculate emotion weights for microblog posts using the ontology and determine the sentiment orientation.
Further, step (1) is specifically:
Step (1.1): construct the ontology in Protégé using the conventional seven-step method: determine the domain and scope of the ontology; consider reusing existing ontologies; enumerate the important terms of the domain; define the classes and the class hierarchy; define the properties of the classes; define the facets of the properties; create instances;
Step (1.2): convert the ontology into a database using the Jena package, which extracts data at the semantic level and converts it into a model, where the data source may be a database or a file.
Further, the conversion in step (1.2) proceeds as follows:
1. Install the necessary software and configure the development environment: Eclipse + MySQL Server 5.5 (win32) + Jena 2.6.4 + Protégé 5.1.0 + mysql-connector-java-5.1.35 (the JDBC driver for MySQL);
2. Build the product ontology with Protégé 5.1.0 and generate the OWL ontology file;
3. Create a database in MySQL;
4. Open Eclipse and create a Java project;
5. When creating the project, import the Jena package and the MySQL JDBC driver;
6. Create a Java class named military_ontology.java in the project directory;
7. Write the code in military_ontology.java and run it;
8. The ontology is successfully converted into a database;
After the initial ontology is successfully converted with Jena, 7 tables are generated; jena_g1t1_stmt is the table that stores the ontology contents.
Further, step (2) is specifically:
Step (2.1): extend the ontology dimensions of the product ontology by syntactic dependency parsing. In a sentence, the predicate verb is the center that governs the other constituents and is itself governed by none; the subject and other constituents are subordinate to their governor through some dependency relation. A dependency syntax structure takes the dependency relation as its basic element, that is, a binary relation between a pair of words; in each binary relation, the governor is called the head word and the subordinate is called the dependent word. Syntactic analysis is performed with the Stanford Parser dependency parser:
Of the output forms that Stanford Parser provides, typed dependencies are selected for the extension. When extending dimensions, attention is paid to two relations that involve a keyword, namely nn and assmod: nn denotes a noun compound, and assmod denotes an associative modifier, a dependency between two noun phrases. For a newly obtained sub-dimension, the discovered relation is stored in the ontology database as follows: first set the type of the new dimension to class, then make it a subclass of the corresponding parent dimension;
Step (2.2): the emotion vocabulary is extended as follows: again using Stanford Parser and syntactic dependency analysis, the extension of the emotion vocabulary focuses on two other relations, namely amod and nsubj: amod denotes an adjectival modifier, typically an adjective preceding a noun, and nsubj denotes a nominal subject, linking a subject to the predicate that describes it. An emotion word is an instance and is inserted into the ontology database as follows: first set its type to NamedIndividual, then classify it as emotion vocabulary of the category it describes, and finally insert its emotion weight into the database; the emotion weight is obtained from a sentiment dictionary.
Further, step (3) is specifically:
The emotion weight of each sentence is calculated by the following formula:
where n is the number of emotion words in the sentence. $Pri_i$ is the negation weight: if word i is modified by a negation word when the emotion weight is calculated, it must be multiplied by the weight of that negation word, which is generally negative and defaults to -1 if the negation word is not in the negation weight dictionary. $Value_i$ is the emotion weight of the word itself, taken from the emotion weight dictionary;
$Dimen_i$ is the weight of the dimension to which the i-th word belongs, calculated as:
$Dimen_i = Per_{class\_i} \times Per_{words\_i}$
where $Per_{class\_i}$ is the proportion of all classes accounted for by the subclasses of the dimension of the i-th word, and $Per_{words\_i}$ is the proportion of all evaluation words accounted for by the emotion evaluation words of that dimension;
Dimension classes and emotion words are queried with the SPARQL query language: through the interface provided by the Jena package, SPARQL statements extract the data related to classes and instances from the ontology that has been converted into a database;
$TI_i$ is the TF*IDF weight of the word, calculated as follows:
$TI_i = tf_{ij} \times idf_{ij}$, where $tf_{ij} = \frac{n_{ij}}{\sum_k n_{kj}}$ and $idf_{ij} = \log \frac{|D|}{1 + |\{j : t_i \in d_j\}|}$
$tf_{ij}$ is the term frequency of the word, the proportion of the current document that the word accounts for: the numerator $n_{ij}$ is the number of times word $t_i$ occurs in document j, and the denominator is the total number of word occurrences in document j. $idf_{ij}$ is the inverse document frequency of the word: the total number of documents divided by the number of documents containing the word, with the logarithm taken of the result. Its numerator $|D|$ is the total number of documents and its denominator is the number of documents containing word $t_i$; 1 is added to the denominator to guarantee that it is always positive;
Using SPARQL query statements, each sentence is matched word by word directly against the ontology to find each word's dimension, emotion category and weight in the ontology; the emotion weight can then be calculated with the above formula.
Brief description of the drawings
Fig. 1 is a flow chart of microblog sentiment analysis combining ontology and syntactic dependency.
Fig. 2 shows part of the Hongmao medicinal liquor ontology.
Fig. 3 compares the sentiment results computed by the invention with the classification results of SVM and naive Bayes classifiers.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solution of the invention, the microblog sentiment analysis method combining ontology and syntactic dependency provided by the invention is described in detail below. The following embodiment only illustrates the invention and does not limit its scope.
Embodiment
A microblog sentiment analysis method combining ontology and syntactic dependency, comprising the following steps:
1. Preprocessing of microblog data
The crawled microblog data must be preprocessed, mainly as follows:
(1) unify Chinese and English punctuation marks, and unify full-width and half-width characters;
(2) convert emoticons directly into the corresponding Chinese words;
(3) remove redundant reply prefixes such as "reply:", "reply weather-is-good:" and "reply @weather-is-good:" (where "weather-is-good" is a user name);
(4) remove special symbols such as |, (, ), $, ", △, ▲, ▼, ▍ and ■;
(5) remove all punctuation except marks that delimit sentences, such as the comma, full stop and exclamation mark;
(6) segment the text into words using jieba;
(7) remove stop words.
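The cleaning steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the reply-prefix pattern (the raw prefix is assumed to be "回复"), the emoticon table, and the stop-word list are all hypothetical, and word segmentation is passed in as a function so that a real segmenter such as `jieba.lcut` can be substituted for the whitespace split used here.

```python
import re
import unicodedata

# Assumed raw reply prefix, e.g. "回复:" or "回复@user:"; adjust to the crawled data.
REPLY_PREFIX = re.compile(r"^回复\s*@?[^:\s]*:\s*")
# Hypothetical emoticon table and stop-word list, for illustration only.
EMOTICONS = {"[哈哈]": "大笑", "[怒]": "愤怒"}
STOP_WORDS = {"的", "了", "是"}
# Everything except word characters, whitespace and sentence delimiters is dropped.
NOISE = re.compile(r"[^\w\s,.!?。]")


def clean(text):
    """Apply steps (1)-(5): normalize, strip reply prefix, map emoticons, drop symbols."""
    text = unicodedata.normalize("NFKC", text)  # full-width forms -> half-width
    text = REPLY_PREFIX.sub("", text)           # leading reply marker
    for code, word in EMOTICONS.items():        # emoticon code -> Chinese word
        text = text.replace(code, word)
    return NOISE.sub("", text).strip()          # special symbols, other punctuation


def preprocess(text, tokenize=str.split):
    """Apply steps (6)-(7): segment (whitespace split by default) and drop stop words."""
    return [tok for tok in tokenize(clean(text)) if tok not in STOP_WORDS]
```

Normalizing first means the later patterns only need to handle half-width punctuation.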
2. Creation and persistence of the initial ontology
(1) Semi-automatic creation of the ontology
The invention uses the conventional seven-step method for construction. Compared with other methods, its steps are clear, its logic is concise, and it is easy to carry out. The seven-step method was developed at the Stanford University School of Medicine and is a commonly used ontology construction method. Its seven steps are: determine the domain and scope of the ontology; consider reusing existing ontologies; enumerate the important terms of the domain; define the classes and the class hierarchy; define the properties of the classes; define the facets of the properties; create instances.
The invention builds the ontology semi-automatically with the tool Protégé (Stanford University, 1999). Protégé is an ontology construction tool developed by the biomedical informatics research center of the Stanford University School of Medicine. It is written in Java and is open-source software. Protégé hides the specific ontology description language from the user: the user does not need to learn an ontology language and only needs to describe the ontology through the facilities the software provides.
(2) Persistence of the ontology
A semi-automatically built ontology has to be expanded and modified continually during data processing, so it must be persisted to make changes easy to handle. The invention uses the Jena package (HP Labs, 2009) to convert the ontology into a database. Jena is an open-source Java framework, mainly used to extract data at the semantic level and convert it into a model; the data source may be a database, a file, and so on. To query data in the semantic model, Jena also supports a query language, namely SPARQL.
The conversion proceeds as follows:
1. Install the necessary software and configure the development environment: Eclipse + MySQL Server 5.5 (win32) + Jena 2.6.4 + Protégé 5.1.0 + mysql-connector-java-5.1.35 (the JDBC driver for MySQL);
2. Build the product ontology with Protégé 5.1.0 and generate the OWL ontology file;
3. Create a database in MySQL;
4. Open Eclipse and create a Java project;
5. When creating the project, import the Jena package and the MySQL JDBC driver;
6. Create a Java class named military_ontology.java in the project directory;
7. Write the code in military_ontology.java and run it;
8. The ontology is successfully converted into a database.
After the initial ontology is successfully converted with Jena, 7 tables are generated. The ontology information is stored in the jena_g1t1_stmt table; the other tables can be ignored.
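To make the idea of persisting an ontology as rows of statements concrete, here is a minimal sketch that stores subject, predicate, object triples in a relational table. It deliberately swaps in Python's built-in `sqlite3` for the Jena + MySQL stack described above, and the table name `stmt`, its column names, and the helper functions are illustrative assumptions; they do not reproduce Jena's actual schema.

```python
import sqlite3

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def open_store(path=":memory:"):
    """Open (or create) a statement table holding one triple per row."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stmt ("
        "subj TEXT, pred TEXT, obj TEXT, UNIQUE (subj, pred, obj))"
    )
    return conn


def add_triple(conn, subj, pred, obj):
    """Insert a triple; an identical triple already present is ignored."""
    conn.execute("INSERT OR IGNORE INTO stmt VALUES (?, ?, ?)", (subj, pred, obj))


def subclasses(conn, parent):
    """Return the direct subclasses recorded for a parent class, sorted by name."""
    rows = conn.execute(
        "SELECT subj FROM stmt WHERE pred = ? AND obj = ? ORDER BY subj",
        (SUBCLASS_OF, parent),
    )
    return [row[0] for row in rows]


# Usage: declare a class and attach it under a parent class.
store = open_store()
add_triple(store, "Advertisement", RDF_TYPE, "owl:Class")
add_triple(store, "Advertisement", SUBCLASS_OF, "Product")
```

Because every fact is a row, later expansion of the ontology reduces to inserting further rows rather than regenerating the OWL file.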
3. Ontology expansion based on syntactic dependency
The invention extends the product ontology automatically using dependency relations in Chinese syntax. The focus is the syntax of user comments: syntactic relations are used to discover descriptive words or subordinate attributes of the product itself or of a product dimension. Because the ontology in question is a product ontology, automatic extension only needs to attend to the product's evaluation indicators and product dimensions, and irrelevant vocabulary can be ignored. Automatic extension of the ontology covers two aspects.
(1) Extending the dimensions of the ontology
The ontology dimensions of the product ontology are extended. Because an initially built ontology cannot be guaranteed to be complete, it must be extended gradually as data is processed. The main technique for extending ontology dimensions is syntactic dependency parsing.
Dependency syntax was proposed by the French linguist Tesnière in 1959. Its core idea is that in a sentence the predicate verb is the center that governs the other constituents and is itself governed by none, while the subject and other constituents are subordinate to their governor through some dependency relation. A dependency syntax structure takes the dependency relation as its basic element, that is, a binary relation between a pair of words. In each binary relation, the governor is called the head word and the subordinate is called the dependent word. The dependency relation reflects the semantic dependence between the head word and the dependent word.
The invention performs syntactic analysis with Stanford Parser (Stanford University, 2002). Stanford Parser is a syntactic parser developed by the Stanford Natural Language Processing Group; it combines a highly optimized probabilistic context-free grammar parser with a lexicalized dependency parser, and is grounded in probability and statistics. An open-source Java implementation is available, supporting English, Chinese, German and other languages.
Stanford Parser can output syntactic relations in several forms, including parse trees and typed dependencies; the invention uses typed dependencies for the extension. The software provides many dependency relations; when extending dimensions the invention mainly attends to two relations involving a keyword, namely nn and assmod. nn denotes a noun compound: for example, from the nn pair "advertising cost" it follows that "cost" is a sub-dimension of "advertisement". assmod denotes an associative modifier, mainly a dependency between two noun phrases: for example, from the assmod pair "medicinal liquor advertisement" it follows that "advertisement" is a sub-dimension of "medicinal liquor".
For a newly obtained sub-dimension, the discovered relation is stored in the ontology database as follows: first set the type of the new dimension to class, then make it a subclass of the corresponding parent dimension. Table 1 shows the entries that must be added to the database when a new dimension is inserted.
Table 1: Entries added to the database for a new dimension
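The dimension-extension rule can be sketched as a filter over dependency triples. The sketch assumes the parser output has already been reduced to `(relation, governor, dependent)` tuples and, following the two examples in the text, treats the governor of an nn or assmod relation as the new sub-dimension and the dependent as its parent; that orientation and the function name are assumptions made for illustration.

```python
DIMENSION_RELATIONS = {"nn", "assmod"}


def new_dimensions(deps, known_dimensions):
    """Return (parent, child) pairs where the parent is a known dimension
    and the child is a dimension not yet in the ontology."""
    known = set(known_dimensions)  # copy, so the caller's set is untouched
    found = []
    for relation, governor, dependent in deps:
        if relation not in DIMENSION_RELATIONS:
            continue
        if dependent in known and governor not in known:
            found.append((dependent, governor))
            known.add(governor)  # a new dimension may itself gain sub-dimensions
    return found


# Usage, with English stand-ins for the segmented Chinese words:
deps = [
    ("assmod", "advertisement", "medicinal liquor"),
    ("nn", "cost", "advertisement"),
    ("amod", "publicity", "false"),
]
pairs = new_dimensions(deps, {"medicinal liquor"})
```

Each resulting pair corresponds to the two database entries described above: the child is typed as a class and recorded as a subclass of the parent.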
(2) Expanding the emotion vocabulary
General-purpose emotion vocabulary can be added when the ontology is first built. When a specific product is analyzed, however, different dimensions need their own emotion vocabulary; most of these words must be obtained from real data and are hard to collect comprehensively during early preparation.
The emotion vocabulary is expanded in the same way as in the previous section, using Stanford Parser. On the basis of syntactic dependency analysis, the expansion of the emotion vocabulary focuses on two other relations, namely amod and nsubj. amod denotes an adjectival modifier, typically an adjective preceding a noun: for example, if the amod pair is "false publicity", it follows that "false" is an emotion word for "advertisement". nsubj denotes a nominal subject and mainly links a subject to the predicate that describes it: for example, if the nsubj pair is "the statement is shameless", it follows that "shameless" is an emotion word for "statement".
An emotion word is an instance and is inserted into the ontology database as follows: first set its type to NamedIndividual, then classify it as emotion vocabulary of the category it describes, and finally insert its emotion weight into the database; the emotion weight is obtained from a sentiment dictionary. Table 2 shows the entries that must be added to the database when a new dimension emotion word is inserted.
Table 2: Entries added to the database for a dimension emotion word
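A similar filter can illustrate the emotion-word rule. As before, dependencies are assumed to be `(relation, governor, dependent)` tuples. The sketch assumes that for amod the governor is the noun and the dependent is the adjective, and that for nsubj the governor is the predicate and the dependent is the subject; both orientations are inferred from the examples in the text rather than taken from the parser's documentation.

```python
def emotion_words(deps, dimensions):
    """Return (dimension, emotion word) pairs found via amod and nsubj relations."""
    pairs = []
    for relation, governor, dependent in deps:
        if relation == "amod" and governor in dimensions:
            # adjective (dependent) modifies a dimension noun (governor)
            pairs.append((governor, dependent))
        elif relation == "nsubj" and dependent in dimensions:
            # predicate (governor) describes a dimension noun subject (dependent)
            pairs.append((dependent, governor))
    return pairs


# Usage, with English stand-ins for the segmented Chinese words:
deps = [
    ("amod", "publicity", "false"),
    ("nsubj", "shameless", "statement"),
    ("nn", "cost", "advertisement"),
]
found = emotion_words(deps, {"publicity", "statement"})
```

Restricting matches to known dimensions keeps unrelated adjectives out of the ontology, in line with the remark that irrelevant vocabulary can be ignored.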
4. Emotion weight calculation
Once the ontology has been expanded and updated, sentiment analysis is performed on the data. Traditional sentiment analysis mostly sums emotion weights directly, which produces large errors. To avoid this, the invention uses an emotion weight analysis method based on ontology dimensions. When computing the emotion weight, the method takes into account the dimension of each emotion word in the ontology, and so reflects the effect of the emotion word more fully.
Before the sentiment calculation, several dictionaries are introduced as preparation: an emotion weight dictionary, a negation dictionary and a synonym dictionary. The emotion weight dictionary is the sentiment dictionary of the Department of Chinese, Tsinghua University, which contains the emotion weight of each word itself. The negation dictionary is the one used at Jiangsu University of Science and Technology; its role is to negate weights: if a negation word precedes a word, the weight of that word should be reversed. The synonym dictionary is the Chinese thesaurus of the Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology. It serves two purposes. First, it is used to expand the ontology: the synonyms of each word can be added to the ontology as similar dimensions, which makes dimension lookup and judgment more accurate. Second, when the emotion dictionary contains no weight for a word, the weights of all of the word's synonyms can be averaged and used as the word's emotion weight.
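The synonym fallback described above amounts to a dictionary lookup with an averaged default. A minimal sketch, assuming both dictionaries are plain in-memory mappings; returning 0.0 when neither the word nor any of its synonyms has a weight is an assumption, since the text does not cover that case.

```python
def emotion_value(word, weights, synonyms):
    """Look up a word's emotion weight, falling back to the mean weight
    of those synonyms that do appear in the emotion weight dictionary."""
    if word in weights:
        return weights[word]
    known = [weights[s] for s in synonyms.get(word, []) if s in weights]
    if not known:
        return 0.0  # assumed neutral default; not specified in the text
    return sum(known) / len(known)


# Usage with hypothetical dictionary entries:
weights = {"good": 1.0, "fine": 0.5}
synonyms = {"nice": ["good", "fine", "swell"]}
value = emotion_value("nice", weights, synonyms)
```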
The emotion weight of each sentence is calculated by the following formula:
where n is the number of emotion words in the sentence and $Dimen_i$ is the weight of the dimension to which the i-th word belongs, calculated as:
$Dimen_i = Per_{class\_i} \times Per_{words\_i}$
where $Per_{class\_i}$ is the proportion of all classes accounted for by the subclasses of the dimension of the i-th word, and $Per_{words\_i}$ is the proportion of all evaluation words accounted for by the emotion evaluation words of that dimension.
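As a small worked illustration of the dimension weight, the two proportions can be computed from four counts and multiplied. The function and parameter names are hypothetical; in the described system the counts would come from queries against the ontology database.

```python
def dimension_weight(sub_classes, total_classes, dimension_words, total_words):
    """Dimen_i = Per_class_i * Per_words_i for the dimension of word i."""
    per_class = sub_classes / total_classes    # share of classes under this dimension
    per_words = dimension_words / total_words  # share of evaluation words in this dimension
    return per_class * per_words


# Usage: a dimension holding 5 of 10 classes and 10 of 40 evaluation words.
weight = dimension_weight(5, 10, 10, 40)
```

Under this definition, a dimension with more subclasses and more evaluation words contributes more to the sentence score.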
Dimension classes and emotion words are queried simply with the SPARQL query language: the Jena package provides an interface through which SPARQL statements extract the data related to classes and instances from the ontology that has been converted into a database.
$TI_i$ is the TF*IDF weight of the word, calculated as follows:
$TI_i = tf_{ij} \times idf_{ij}$, where $tf_{ij} = \frac{n_{ij}}{\sum_k n_{kj}}$ and $idf_{ij} = \log \frac{|D|}{1 + |\{j : t_i \in d_j\}|}$
$tf_{ij}$ is the term frequency of the word, the proportion of the current document that the word accounts for: the numerator $n_{ij}$ is the number of times word $t_i$ occurs in document j, and the denominator is the total number of word occurrences in document j. $idf_{ij}$ is the inverse document frequency of the word: the total number of documents divided by the number of documents containing the word, with the logarithm taken of the result. Its numerator $|D|$ is the total number of documents and its denominator is the number of documents containing word $t_i$. To guarantee that the denominator is always positive, 1 is added to it.
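The TF*IDF definition above, including the 1 added to the IDF denominator, can be sketched as follows. Documents are assumed to be lists of already segmented words, the natural logarithm is used because the text does not state a base, and returning 0.0 for an empty document is an assumption.

```python
import math


def tf(word, doc):
    """Occurrences of the word divided by the total number of words in the document."""
    return doc.count(word) / len(doc) if doc else 0.0


def idf(word, docs):
    """log(total documents / (1 + documents containing the word))."""
    containing = sum(1 for doc in docs if word in doc)
    return math.log(len(docs) / (1 + containing))


def tf_idf(word, doc, docs):
    """TF*IDF weight of a word in one document of the collection."""
    return tf(word, doc) * idf(word, docs)


# Usage on a toy collection of segmented documents:
docs = [["a", "b", "a"], ["b", "c"], ["b", "d"], ["d", "e"]]
score = tf_idf("a", docs[0], docs)
```

One consequence of adding 1 to the denominator is that a word occurring in every document receives a slightly negative IDF, and a word occurring in all but one document receives an IDF of zero.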
$Pri_i$ is the negation weight: if word i is modified by a negation word when the emotion weight is calculated, it must be multiplied by the weight of that negation word, which is generally negative and defaults to -1 if the negation word is not in the negation weight dictionary. $Value_i$ is the emotion weight of the word itself, taken from the emotion weight dictionary.
Using SPARQL query statements, each sentence can be matched word by word directly against the ontology to find each word's dimension, emotion category and weight in the ontology; the emotion weight can then be calculated with the above formula.
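The sentence-level formula itself is not reproduced in this text, so the following sketch rests on an explicit assumption: that the sentence score is the sum, over the n emotion words, of the product of the four defined quantities, $\sum_{i=1}^{n} Pri_i \cdot Value_i \cdot Dimen_i \cdot TI_i$. That reading follows from the term definitions but is not confirmed by the source. Treating a word without a negation word as having $Pri_i = 1$, and splitting polarity at zero, are further assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EmotionWord:
    value: float                   # Value_i: weight from the emotion weight dictionary
    dimen: float                   # Dimen_i: weight of the word's dimension
    tfidf: float                   # TI_i: TF*IDF weight of the word
    negator: Optional[str] = None  # negation word modifying this word, if any


def negation_weight(negator: Optional[str], negation_weights: Dict[str, float]) -> float:
    """Pri_i: 1 if not negated (assumed), else the negation word's weight, default -1."""
    if negator is None:
        return 1.0
    return negation_weights.get(negator, -1.0)


def sentence_score(words: List[EmotionWord], negation_weights: Dict[str, float]) -> float:
    """Assumed form: sum of Pri_i * Value_i * Dimen_i * TI_i over the emotion words."""
    return sum(
        negation_weight(w.negator, negation_weights) * w.value * w.dimen * w.tfidf
        for w in words
    )


def polarity(score: float) -> str:
    """Assumed decision rule: the sign of the score gives the orientation."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Keeping the four factors as separate fields makes it easy to see which one drives a given score when checking results against manually labeled posts.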
The specific steps are illustrated below using real Chinese comments about Hongmao medicinal liquor crawled from microblogs:
1. Crawl microblog posts related to Hongmao medicinal liquor with a crawler, preprocess them, segment them with jieba, and remove stop words.
2. Build the Hongmao medicinal liquor ontology from consumer evaluation indicators, official Hongmao medicinal liquor documents, microblog comments and similar content. Fig. 2 shows part of the ontology built.
3. After building the initial ontology with Protégé, convert the ontology to a database using the Jena package.
4. After the ontology has been converted to the database, expand it. The expansion follows the method above in two respects: ontology dimensions and emotion vocabulary.
5. Select representative posts from the microblog data and annotate their sentiment manually, finally obtaining 1000 positive evaluations and 1000 negative evaluations.
6. Apply the formula of the invention to evaluate the emotion weight of the selected posts and determine their sentiment polarity.
7. Label the same posts with a conventional SVM classifier and a naive Bayes classifier, and compare the results using precision, recall and F value. Fig. 3 compares the method of the invention with the conventional machine-learning methods SVM and naive Bayes; the horizontal axis shows the methods tested and the vertical axis shows the metric values. The comparison shows that the invention analyzes the sentiment orientation of product-related microblog posts more accurately and in finer detail.
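The three evaluation measures named in step 7 can be computed from gold and predicted labels as sketched below. This is a generic illustration of precision, recall and the balanced F value for one class; the label names are hypothetical, and the example does not reproduce any figures from the experiment.

```python
def precision_recall_f(gold, predicted, positive="pos"):
    """Precision, recall and balanced F value for the given positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value


# Usage with toy labels:
gold = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "pos", "neg"]
scores = precision_recall_f(gold, predicted)
```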
Present embodiments provide it is a kind of towards Chinese microblogging, based on the interdependent emotion combined of emotional noumenon and syntax point
Analysis method.Specifically includes the following steps: carrying out micro-blog information acquisition using crawler for the theme to be analyzed, carried out after acquisition
Then data cleansing and dimensionality reduction carry out semi-automatic building original body using the relevant micro-blog information of theme and official document, so
Microblog data is utilized afterwards, and automation updates ontology in terms of product dimension and emotion vocabulary two, to obtain mature ontology.Again
It borrows the information that ontology carries and calculates the emotion weight of micro-blog information, to reach the mesh of the emotion tendency of analysis microblog data
's.It finally uses rate of precision, recall rate and F value as evaluation criterion, is compared with conventional machines learning classification algorithm, this hair
It is bright that there is feasibility and superiority on Chinese microblog data collection.
The examples of the present invention have been explained in detail above in conjunction with the embodiments, but the present invention is not limited to the above examples. Within the scope of knowledge possessed by a person of ordinary skill in the art, various changes may also be made without departing from the purpose of the present invention, and such changes shall also be regarded as falling within the protection scope of the present invention.
Claims (5)
1. A microblog sentiment analysis method combining ontology and syntactic dependency, characterized by comprising the following steps:
Step (1): semi-automatically constructing a topic-related ontology and persisting the ontology to a database;
Step (2): expanding and updating the ontology in two respects, ontology dimensions and emotion vocabulary, using syntactic dependency relations;
Step (3): calculating emotion weights for microblog information using the ontology and determining the sentiment tendency.
2. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 1, characterized in that step (1) is specifically:
Step (1.1): constructing the ontology with the Protégé software using the conventional seven-step construction method: determining the domain and scope of the ontology to be built; considering the possibility of reusing existing ontologies; enumerating the important terms of the domain; defining the classes and the class hierarchy; defining the properties of the classes; defining the facets of the properties; creating instances;
Step (1.2): converting the ontology into a database using the Jena package: data are extracted at the semantic level and converted into a model, where the source of the model data is a database or a file.
3. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 2, characterized in that the conversion in step (1.2) proceeds as follows:
1. install the necessary software and configure the development environment: Eclipse + MySQL Server 5.5-win32 + jena2.6.4 + protege5.1.0 + mysql-connector-java-5.1.35 (the JDBC driver for MySQL);
2. build the product ontology with protege5.1.0 and generate the OWL ontology file;
3. create a database using MySQL;
4. open Eclipse and create a Java project;
5. while creating the project, import the Jena package and the MySQL JDBC driver;
6. create a Java class named military_ontology.java under the project directory;
7. write the code in military_ontology.java and run it;
8. the ontology is successfully converted into a database;
After the original ontology is successfully converted with Jena, 7 tables are generated, of which jena_g1t1_stmt is the table that stores the ontology content.
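The idea of keeping ontology statements in a relational table can be illustrated with a small stand-in. The sketch below does not use Jena or MySQL and does not reproduce the schema of jena_g1t1_stmt; it assumes a hypothetical `stmt` table with subject, predicate and object columns in an in-memory SQLite database, and all function and class names are illustrative.

```python
import sqlite3

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def create_store(conn):
    # One row per (subject, predicate, object) statement; hypothetical schema
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stmt ("
        "subj TEXT, pred TEXT, obj TEXT, UNIQUE(subj, pred, obj))"
    )


def add_statement(conn, subj, pred, obj):
    # INSERT OR IGNORE keeps repeated statements from being stored twice
    conn.execute("INSERT OR IGNORE INTO stmt VALUES (?, ?, ?)", (subj, pred, obj))


def subclasses_of(conn, parent):
    rows = conn.execute(
        "SELECT subj FROM stmt WHERE pred = ? AND obj = ? ORDER BY subj",
        (SUBCLASS_OF, parent),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_store(conn)
add_statement(conn, "Phone", RDF_TYPE, "owl:Class")
add_statement(conn, "Screen", RDF_TYPE, "owl:Class")
add_statement(conn, "Screen", SUBCLASS_OF, "Phone")
add_statement(conn, "Battery", SUBCLASS_OF, "Phone")
children = subclasses_of(conn, "Phone")
```

Storing one statement per row is what makes later class and instance lookups a matter of filtering on the predicate column.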
4. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 3, characterized in that step (2) is specifically:
Step (2.1): extending the ontology dimensions of the product ontology by syntactic dependency parsing: in a sentence, the predicate verb acts as the center that governs the other components, and the predicate verb itself is not governed by any other component; the governed components are subordinate to their governor through some dependency relation. A dependency syntactic structure takes the dependency relation as its basic element, i.e. a binary relation over a word pair; in the binary relation, the governor is called the head word and the subordinate is called the dependent word. The Stanford Parser dependency parser is used for syntactic analysis:
The typed dependencies of Stanford Parser are used to select the syntactic relations for extension. When extending dimensions, attention is paid to two relations that involve the keyword, namely nn and assmod: nn denotes a noun compound, and assmod denotes an associative modifier, a dependency between two noun phrases. For a newly obtained subordinate dimension, the discovered relation is stored in the ontology database as follows: the type of the new dimension is first set to class, and it is then made a subclass of the corresponding parent dimension;
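Step (2.1) can be sketched as a filter over dependency triples. The sketch below is an illustration under stated assumptions: the triples are hand-written in the form (relation, head, dependent) rather than produced by Stanford Parser, and the rule that the head becomes a sub-dimension of a known dimension appearing as the dependent is an assumption about how the nn and assmod relations are applied. The names `extend_dimensions` and `parent_of` are hypothetical.

```python
DIMENSION_RELATIONS = {"nn", "assmod"}


def extend_dimensions(dependencies, parent_of):
    """Add new sub-dimensions to parent_of (dimension -> parent, None for the root).

    Returns the list of (new_dimension, parent) pairs that were added.
    """
    added = []
    for relation, head, dependent in dependencies:
        if relation not in DIMENSION_RELATIONS:
            continue
        # Assumed rule: a known dimension modifying an unknown noun
        # makes that noun a subclass of the known dimension.
        if dependent in parent_of and head not in parent_of:
            parent_of[head] = dependent
            added.append((head, dependent))
    return added


parent_of = {"phone": None}
dependencies = [
    ("nn", "screen", "phone"),       # e.g. "phone screen"
    ("assmod", "battery", "phone"),  # e.g. "battery of the phone"
    ("amod", "screen", "clear"),     # not a dimension relation, ignored here
    ("nn", "resolution", "screen"),  # "screen" is known by now
]
added = extend_dimensions(dependencies, parent_of)
```

Because new dimensions are added to `parent_of` as they are found, a dimension discovered early in a pass can act as the parent of one discovered later in the same pass.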
Step (2.2): the emotion vocabulary is extended as follows: using Stanford Parser, on the basis of syntactic dependency analysis, extension of the emotion vocabulary focuses on two other relations, amod and nsubj: amod denotes an adjectival modifier, i.e. the common case of an adjective before a noun, and nsubj denotes a nominal subject, which expresses the link between a subject and its predicate. Emotion words are instances; the steps for inserting one into the ontology database are as follows: its type is first set to NamedIndividual, it is then classified as an emotion word of the category it describes, and finally its emotion weight is inserted into the database; the emotion weight is obtained from a sentiment dictionary.
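Step (2.2) can be sketched in the same way. The sketch assumes hand-written (relation, head, dependent) triples rather than real Stanford Parser output, assumes that only words present in the sentiment dictionary are added, and uses the hypothetical names `extend_emotion_words`, `dimensions` and `sentiment_dict`.

```python
def extend_emotion_words(dependencies, dimensions, sentiment_dict):
    """Collect (dimension, emotion_word) -> weight pairs from amod and nsubj relations."""
    found = {}
    for relation, head, dependent in dependencies:
        if relation == "amod" and head in dimensions:
            # amod(noun, adjective): the adjective describes the dimension noun
            dimension, word = head, dependent
        elif relation == "nsubj" and dependent in dimensions:
            # nsubj(predicate, subject): the predicate describes the subject dimension
            dimension, word = dependent, head
        else:
            continue
        if word in sentiment_dict:  # assumed: the weight must come from the dictionary
            found[(dimension, word)] = sentiment_dict[word]
    return found


dimensions = {"screen", "battery"}
sentiment_dict = {"clear": 0.8, "weak": -0.6}
dependencies = [
    ("amod", "screen", "clear"),
    ("nsubj", "weak", "battery"),
    ("nsubj", "bought", "I"),  # subject is not a dimension, ignored
]
emotion_words = extend_emotion_words(dependencies, dimensions, sentiment_dict)
```

Each returned pair corresponds to one instance to be inserted under the emotion vocabulary of its dimension, together with its weight.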
5. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 4, characterized in that step (3) is specifically:
The formula used to calculate the emotion weight of each sentence is:
where n is the number of emotion words contained in the sentence; Pri_i is the negation weight: if, when the emotion weight is calculated, word i is modified by a negation word, it must be multiplied by the weight of the negation word, which is generally negative; if the negation word is not contained in the negation weight dictionary, the weight defaults to -1; Value_i is the emotion weight of the word itself and comes from the emotion weight dictionary;
Dimen_i is the weight of the dimension to which the i-th word belongs, calculated as follows:
$$Dimen_i = Per_{class\_i} \times Per_{words\_i}$$
where Per_class_i is the proportion of all classes accounted for by the subordinate classes of the dimension to which the i-th word belongs, and Per_words_i is the proportion of all evaluation words accounted for by the emotion evaluation words of that dimension;
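The dimension weight can be sketched from two small in-memory structures. This is an illustration only: it assumes that "subordinate classes" means direct subclasses, that the ontology has at least one class and one evaluation word, and it uses the hypothetical names `dimension_weight`, `parent_of` and `words_by_dimension`.

```python
def dimension_weight(dimension, parent_of, words_by_dimension):
    """Dimen = Per_class * Per_words for one dimension (assumed direct subclasses)."""
    total_classes = len(parent_of)
    total_words = sum(len(words) for words in words_by_dimension.values())
    # Per_class: share of all classes that are subclasses of this dimension
    subclass_count = sum(1 for parent in parent_of.values() if parent == dimension)
    per_class = subclass_count / total_classes
    # Per_words: share of all evaluation words attached to this dimension
    per_words = len(words_by_dimension.get(dimension, ())) / total_words
    return per_class * per_words


parent_of = {"phone": None, "screen": "phone", "battery": "phone", "resolution": "screen"}
words_by_dimension = {
    "screen": ["clear", "sharp"],
    "battery": ["weak"],
    "phone": ["good"],
}
screen_weight = dimension_weight("screen", parent_of, words_by_dimension)
```

In this toy ontology "screen" has 1 of 4 classes as a subclass and 2 of 4 evaluation words, so its weight is 0.25 * 0.5.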
When querying dimension classes and emotion words, the SPARQL query language supported by the ontology is used: through the interface of the Jena package, SPARQL statements extract the data related to classes and instances from the ontology that has been converted into a database;
TI_i is the TF*IDF weight of the word, calculated as follows:
$$TI_i = tf_{ij} \times idf_{ij}, \qquad tf_{ij} = \frac{n_{ij}}{\sum_k n_{kj}}, \qquad idf_{ij} = \log\frac{|D|}{1 + |\{j : t_i \in d_j\}|}$$
where tf_ij is the tf value of the word, the proportion of the current document accounted for by the word: the numerator n_ij is the number of times word t_i occurs in document j, and the denominator is the total count of all words in document j; idf_ij is the idf value of the word, called the inverse document frequency, obtained by dividing the total number of documents by the number of documents containing the keyword and taking the logarithm: the numerator |D| is the total number of documents and the denominator is the number of documents containing word t_i, with 1 added to the denominator to ensure that it is always positive;
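The TF*IDF weight described above can be sketched as follows, assuming that each document is a non-empty list of already segmented words and that the logarithm is the natural logarithm (the text does not state the base). The function names are illustrative.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    # Occurrences of the term divided by the total number of words in the document
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, documents):
    # Total documents divided by (1 + documents containing the term), then log
    containing = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / (1 + containing))


def tf_idf(term, doc_tokens, documents):
    return tf(term, doc_tokens) * idf(term, documents)


documents = [
    ["screen", "clear", "screen"],
    ["battery", "weak"],
    ["price", "high"],
    ["camera", "good"],
]
weight = tf_idf("screen", documents[0], documents)
```

With 1 added to the denominator, a word that occurs in every document gets a slightly negative idf, which is worth keeping in mind when the result is multiplied into a signed emotion weight.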
Using the SPARQL query statements supported by the ontology, word matching is performed directly on each sentence to find each word's dimension, emotion category, and weight in the ontology; the emotion weight can then be calculated with the above formula.
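The sentence-level formula itself is not reproduced in the text above, so the sketch below rests on an assumption: the sentence weight is taken to be the sum, over the n emotion words, of the product Pri_i * Value_i * Dimen_i * TI_i. It further assumes that Pri_i is 1 when a word is not negated and that the sign of the total decides the polarity. All names and the toy values are hypothetical.

```python
DEFAULT_NEGATION_WEIGHT = -1.0  # default stated in the claim for unknown negation words


def negation_factor(negation_word, negation_dict):
    if negation_word is None:
        return 1.0  # assumed: no negation leaves the weight unchanged
    return negation_dict.get(negation_word, DEFAULT_NEGATION_WEIGHT)


def sentence_weight(emotion_words, negation_dict):
    """Assumed form: sum of Pri_i * Value_i * Dimen_i * TI_i over the emotion words."""
    total = 0.0
    for word in emotion_words:
        pri = negation_factor(word.get("negation"), negation_dict)
        total += pri * word["value"] * word["dimen"] * word["ti"]
    return total


def polarity(weight):
    # Assumed decision rule: the sign of the weight gives the tendency
    if weight > 0:
        return "positive"
    if weight < 0:
        return "negative"
    return "neutral"


emotion_words = [
    {"value": 0.8, "dimen": 0.5, "ti": 0.5, "negation": None},
    {"value": 0.5, "dimen": 0.5, "ti": 0.5, "negation": "not"},
]
weight = sentence_weight(emotion_words, negation_dict={})
label = polarity(weight)
```

In the toy input the second word is negated by a word missing from the negation dictionary, so the default of -1 applies and its contribution is subtracted from the first.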
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910276686.XA CN110020436A (en) | 2019-04-08 | 2019-04-08 | A kind of microblog emotional analytic approach of ontology and the interdependent combination of syntax |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110020436A true CN110020436A (en) | 2019-07-16 |
Family
ID=67190687
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910276686.XA Pending CN110020436A (en) | 2019-04-08 | 2019-04-08 | A kind of microblog emotional analytic approach of ontology and the interdependent combination of syntax |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110020436A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150278195A1 (en) * | 2014-03-31 | 2015-10-01 | Abbyy Infopoisk Llc | Text data sentiment analysis method |
CN109284499A (en) * | 2018-08-01 | 2019-01-29 | 数据地平线(广州)科技有限公司 | A kind of industry text emotion acquisition methods, device and storage medium |
Non-Patent Citations (7)
Title |
---|
IANCHARMING: "将OWL本体存储到MySQL数据库", 《HTTPS://BLOG.CSDN.NET/IANCHARMING/ARTICLE/DETAILS/50151359》 * |
PRNTSCR_: "Jena中SPARQL查询本体的简单实现", 《HTTPS://BLOG.CSDN.NET/PRNTSCR__/ARTICLE/DETAILS/52202295》 * |
唐晓波 等: "基于特征本体的微博产品评论情感分析", 《图书情报工作》 * |
夏梦南 等: "基于依存分析与特征组合的微博情感分析", 《山东大学学报(理学版)》 * |
文能: "基于领域本体和CRFS的商品评论倾向性分析", 《中国优秀硕士学位论文全文数据库, I143-27》 * |
赏月斋: "词频、逆向文件频率", 《HTTPS://BAIKE.BAIDU.COM/ITEM/TF-IDF/8816134?FR=ALADDIN》 * |
韦航: "面向目标的中文微博情感分析研究", 《中国优秀硕士学位论文全文数据库,I138-4945》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113407725A (en) * | 2020-03-17 | 2021-09-17 | 复旦大学 | Method for constructing body model of regulation based on knowledge graph |
CN113407725B (en) * | 2020-03-17 | 2022-03-18 | 复旦大学 | Method for constructing body model of regulation based on knowledge graph |
CN113434682A (en) * | 2021-06-30 | 2021-09-24 | 平安科技(深圳)有限公司 | Text emotion analysis method, electronic device and storage medium |
CN113836286A (en) * | 2021-09-26 | 2021-12-24 | 南开大学 | Community solitary old man emotion analysis method and system based on question-answer matching |
CN113836286B (en) * | 2021-09-26 | 2024-04-05 | 南开大学 | Community orphan older emotion analysis method and system based on question-answer matching |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xia et al. | Dual sentiment analysis: Considering two sides of one review | |
US8346795B2 (en) | System and method for guiding entity-based searching | |
US8977953B1 (en) | Customizing information by combining pair of annotations from at least two different documents | |
US10496756B2 (en) | Sentence creation system | |
Mohamed et al. | A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics | |
WO2008046104A2 (en) | Methods and systems for knowledge discovery | |
JP5754019B2 (en) | Synonym extraction system, method and program | |
Saloot et al. | An architecture for Malay Tweet normalization | |
Ristoski et al. | Large-scale relation extraction from web documents and knowledge graphs with human-in-the-loop | |
He et al. | Question answering over linked data using first-order logic | |
CN110020436A (en) | A kind of microblog emotional analytic approach of ontology and the interdependent combination of syntax | |
TWI735380B (en) | Natural language processing method and computing apparatus thereof | |
CN111428031B (en) | Graph model filtering method integrating shallow semantic information | |
Gleim et al. | A practitioner’s view: a survey and comparison of lemmatization and morphological tagging in German and Latin | |
Jia et al. | A Chinese unknown word recognition method for micro-blog short text based on improved FP-growth | |
RU2563148C2 (en) | System and method for semantic search | |
Das Dawn et al. | A comprehensive review of Bengali word sense disambiguation | |
Rajput | Ontology based semantic annotation of Urdu language web documents | |
Zhang | Start small, build complete: Effective and efficient semantic table interpretation using tableminer | |
Sharma et al. | Shallow neural network and ontology-based novel semantic document indexing for information retrieval | |
Gupta et al. | Document summarisation based on sentence ranking using vector space model | |
Brauer et al. | RankIE: document retrieval on ranked entity graphs | |
RU2618375C2 (en) | Expanding of information search possibility | |
Çelebi et al. | Cluster-based mention typing for named entity disambiguation | |
JP5740743B2 (en) | Requirements document analysis system, method and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190716 |