CN110020436A - Microblog sentiment analysis method combining ontology and syntactic dependency - Google Patents


Info

Publication number
CN110020436A
Authority
CN
China
Prior art keywords
ontology
word
emotion
dimension
weight
Prior art date
Legal status
Pending
Application number
CN201910276686.XA
Other languages
Chinese (zh)
Inventor
朱群雄
罗敏
徐圆
贺彦林
Current Assignee
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Chemical Technology
Priority to CN201910276686.XA
Publication of CN110020436A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computing Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a microblog sentiment analysis method combining ontology and syntactic dependency, comprising the following steps: semi-automatically constructing a topic-related ontology and persisting the ontology to a database; expanding and updating the ontology in two respects, ontology dimensions and sentiment words, using syntactic dependency relations; and calculating sentiment weights of microblog posts using the ontology to determine their sentiment orientation. Compared with conventional machine-learning classification algorithms, the invention is shown to be feasible and superior on a Chinese microblog dataset.

Description

Microblog sentiment analysis method combining ontology and syntactic dependency
Technical field
The invention belongs to the technical field of text sentiment analysis, and in particular relates to a microblog sentiment analysis method combining ontology and syntactic dependency.
Technical background
With the spread of the mobile Internet, microblogs, as social platforms with a large user base, have become the fastest source of information about trending news events. Because users are highly engaged, microblogs contain a massive amount of netizens' everyday information, including their reviews of the products they use. For various reasons, the review data in a product's own online store is often not objective enough, whereas, owing to the everyday and informal nature of microblogging, user reviews posted there tend to be more objective and more worth mining. For enterprises, obtaining users' evaluations of products from microblogs and performing sentiment analysis on them is therefore an indispensable information basis for business decisions.
Microblog data is mainly text, and sentiment orientation analysis of text data has been a research hotspot in recent years. Approaches fall broadly into machine learning and ontology-based analysis. Machine-learning methods mostly rely on manually constructed classifiers; for large datasets the modelling process is overly complex and lengthy, and the manual work is difficult. To address these problems, ontology construction methods have been proposed. An ontology is a formal, explicit and detailed specification of a shared conceptual system, and it can describe concepts at the semantic level. However, existing ontology-based sentiment analysis does not update the ontology after its initial construction, which places excessively high demands on the accuracy of that construction; in practice, the dimensions of an ontology grow as the data expands.
Summary of the invention
The invention proposes a microblog sentiment analysis method combining ontology and syntactic dependency, with the aim of obtaining relevant sentiment information from microblogs more accurately. An original ontology is built semi-automatically for the microblog information; then, based on related text data, the ontology is automatically updated and optimized in two respects, product dimensions and sentiment words, using syntactic dependency parsing, which yields a mature ontology. Using the mature ontology and the new sentiment weight calculation method proposed by the invention, the sentiment weight and orientation of text data are measured, so that sentiment analysis is performed accurately.
The technical solution of the invention is as follows:
A microblog sentiment analysis method combining ontology and syntactic dependency, comprising the following steps:
Step (1): semi-automatically construct a topic-related ontology and persist the ontology to a database;
Step (2): expand and update the ontology in two respects, ontology dimensions and sentiment words, using syntactic dependency relations;
Step (3): calculate sentiment weights of microblog information using the ontology to determine sentiment orientation.
Further, step (1) specifically comprises:
Step (1.1): construct the ontology in Protégé using the conventional seven-step method: determine the domain and scope of the ontology; consider reusing existing ontologies; enumerate the important terms of the domain; define the classes and the class hierarchy; define the properties of the classes; define the facets of the properties; create instances;
Step (1.2): convert the ontology into a database using the Jena package; Jena extracts data at the semantic level and converts it into a model, and the data source for the model can be a database or a file.
Further, the conversion in step (1.2) proceeds as follows:
1. Install the required software and configure the development environment: Eclipse + MySQL Server 5.5-win32 + Jena 2.6.4 + Protégé 5.1.0 + mysql-connector-java-5.1.35 (the MySQL JDBC driver);
2. Build the product ontology with Protégé 5.1.0 and have it generate the OWL ontology file automatically;
3. Create a database in MySQL;
4. Open Eclipse and create a Java project;
5. In the new project, import the Jena package and the MySQL JDBC driver;
6. Create a Java class named military_ontology.java in the project directory;
7. Write the conversion code in military_ontology.java and run it;
8. The ontology is converted into the database.
After the original ontology has been converted successfully with Jena, 7 tables are generated; jena_g1t1_stmt is the table that stores the ontology content.
Further, step (2) specifically comprises:
Step (2.1): extend the ontology dimensions of the product ontology using syntactic dependency parsing. In a sentence, the predicate verb is the center that governs the other constituents and is itself governed by none, while the subject is subordinate to its governor through some dependency relation. A dependency syntactic structure takes dependency relations, i.e. binary relations between pairs of words, as its basic elements; in each binary relation the governor is called the head word and the subordinate is called the dependent word. Syntactic analysis is performed with the Stanford Parser dependency parser:
The Stanford Parser outputs syntactic relations as typed dependencies, which are used for the extension. When extending dimensions, attention is paid to two relations involving the keyword, nn and assmod: nn denotes a noun compound modifier, and assmod denotes an associative modifier, a dependency between two noun phrases. A newly obtained subordinate dimension is stored in the ontology database as follows: first set the type of the new dimension to class, then make it a subclass of the corresponding parent dimension;
Step (2.2): sentiment words are extended as follows: on the basis of the dependency analysis by the Stanford Parser, the extension of sentiment words focuses on two other relations, amod and nsubj. amod denotes an adjectival modifier, i.e. an adjective preceding a common noun; nsubj denotes a nominal subject and expresses the link between a subject and its predicate. A sentiment word is an instance and is inserted into the ontology database as follows: first set its type to NamedIndividual, then assign it as a sentiment word of the corresponding category according to the class it describes, and finally insert its sentiment weight, obtained from the sentiment dictionary, into the database.
Further, step (3) specifically comprises:
The sentiment weight of each sentence is calculated by the following formula:
where n is the number of sentiment words contained in the sentence; $Pri_i$ is the negation weight: if word $i$ is modified by a negation word, its weight must be multiplied by the weight of that negation word, which is generally negative, and which defaults to -1 if the negation word is not in the negation-weight dictionary; $Value_i$ is the sentiment weight of the word itself, taken from the sentiment weight dictionary;
$Dimen_i$ is the weight of the dimension to which the i-th word belongs, calculated as follows:

$$Dimen_i = Per_{class,i} \times Per_{words,i}$$

where $Per_{class,i}$ is the proportion of all classes accounted for by the subordinate classes of the dimension of the i-th word, and $Per_{words,i}$ is the proportion of all evaluation words accounted for by the sentiment evaluation words of that dimension;
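A minimal Java sketch of the dimension weight, assuming both proportions are computed from plain counts: subordinate classes of the word's dimension over all classes in the ontology, and sentiment words attached to that dimension over all evaluation words. The class name `DimensionWeight` and the counts used with it are hypothetical; in the method the counts would come from SPARQL queries against the ontology.

```java
/** Hypothetical helper: Dimen_i = Per_class,i * Per_words,i computed from raw counts. */
final class DimensionWeight {
    private DimensionWeight() {}

    /**
     * @param subclassesOfDimension subordinate classes of the word's dimension
     * @param totalClasses          classes in the whole ontology
     * @param wordsOfDimension      sentiment evaluation words attached to that dimension
     * @param totalWords            evaluation words in the whole ontology
     */
    static double of(int subclassesOfDimension, int totalClasses,
                     int wordsOfDimension, int totalWords) {
        if (totalClasses <= 0 || totalWords <= 0) {
            throw new IllegalArgumentException("totals must be positive");
        }
        double perClass = (double) subclassesOfDimension / totalClasses; // Per_class,i
        double perWords = (double) wordsOfDimension / totalWords;        // Per_words,i
        return perClass * perWords;
    }
}
```

For example, a dimension with 4 of 20 classes and 30 of 200 evaluation words gets `DimensionWeight.of(4, 20, 30, 200)`, i.e. 0.2 × 0.15 ≈ 0.03.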
Dimension classes and sentiment words are queried with SPARQL, the query language supported for the ontology; through the interface of the Jena package, SPARQL statements extract class- and instance-related data from the ontology that has been converted into the database;
$TI_i$ is the TF*IDF weight of the word, calculated as follows:

$$TI_i = Tf_{ij} \times Idf_{ij}, \qquad Tf_{ij} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad Idf_{ij} = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$

$Tf_{ij}$ is the tf value of the word, the proportion of occurrences of a word in the current document: the numerator $n_{i,j}$ is the number of times word $t_i$ occurs in document $j$, and the denominator is the total number of words in document $j$. $Idf_{ij}$ is the idf value of the word, called the inverse document frequency: the total number of documents divided by the number of documents containing the keyword, after which the logarithm is taken. The numerator $|D|$ is the total number of documents and the denominator is the number of documents containing $t_i$; 1 is added to the denominator so that it is always positive;
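The tf and idf definitions above can be sketched in Java as follows. This is an illustration under stated assumptions: documents are already segmented into token lists, the logarithm is the natural log (the text does not fix a base), and the class name `TfIdf` is hypothetical.

```java
import java.util.List;

/** Sketch of TI_i = tf * idf with the +1 in the idf denominator described in the text. */
final class TfIdf {
    private TfIdf() {}

    /** tf: occurrences of term in doc divided by the total number of words in doc. */
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) return 0.0;
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    /** idf: log(total documents / (1 + documents containing term)); natural log assumed. */
    static double idf(String term, List<List<String>> corpus) {
        long containing = corpus.stream().filter(d -> d.contains(term)).count();
        return Math.log((double) corpus.size() / (1 + containing));
    }

    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        return tf(term, doc) * idf(term, corpus);
    }
}
```

With the +1 smoothing, a term that appears in every document gets a slightly negative idf (N / (N + 1) < 1), which is a property of this variant rather than a bug.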
Using SPARQL queries over the ontology, each sentence is matched word by word to find each word's dimension, sentiment category and weight in the ontology, after which the sentiment weight is calculated with the above formulas.
Description of the drawings
Fig. 1 is a flow chart of the microblog sentiment analysis combining ontology and syntactic dependency.
Fig. 2 is a partial view of the Hongmao medicinal liquor ontology.
Fig. 3 compares the classification performance of the sentiment index calculated by the invention with that of SVM and Naive Bayes classifiers.
Specific embodiment
To help those skilled in the art better understand the technical solution of the invention, the microblog sentiment analysis method combining ontology and syntactic dependency provided by the invention is described in detail below. The following embodiment only illustrates the invention and does not limit its scope.
Embodiment
A microblog sentiment analysis method combining ontology and syntactic dependency comprises the following steps:
1. Preprocessing of microblog data
The crawled microblog data must be preprocessed, mainly as follows:
(1) Unify Chinese and English punctuation, and unify full-width and half-width characters;
(2) Convert emoticons directly into the corresponding Chinese text;
(3) Remove reply-related redundancy such as "回复:", "回复天气好:" and "回复@天气好:" ("Reply:", "Reply 天气好:" and "Reply @天气好:", where 天气好 is an example user name);
(4) Remove special characters such as |, (, ), $, 丨, “, ”, △, ▲, ▼, ▍ and ■;
(5) Remove all punctuation other than the marks that delimit sentences, such as ",", "." and "!";
(6) Segment the text into words with jieba;
(7) Remove stop words.
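The cleaning steps can be sketched in Java roughly as below. This is a hypothetical illustration, not the authors' code: the class name `MicroblogCleaner`, the reply-prefix regex and the exact symbol set are assumptions; step (2) is omitted because it needs an emoticon table, and step (6) is omitted because jieba is a separate library, so stop-word removal works on an already segmented token list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical sketch of cleaning steps (1), (3), (4), (5) and (7). */
final class MicroblogCleaner {
    private MicroblogCleaner() {}

    // Step (3): a leading reply prefix such as 回复: or 回复@user: (assumed pattern).
    private static final Pattern REPLY = Pattern.compile("^回复@?[^:]*:");
    // Step (4): some of the special symbols named in the text.
    private static final Pattern SPECIAL = Pattern.compile("[|()$△▲▼▍■]");
    // Step (5): any punctuation except the sentence delimiters , . ! ?
    private static final Pattern OTHER_PUNCT = Pattern.compile("[\\p{P}&&[^,.!?]]");

    /** Step (1): full-width ASCII variants to half-width; ideographic space and 。 to ASCII. */
    static String toHalfWidth(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            if (c == '\u3000') sb.append(' ');
            else if (c == '\u3002') sb.append('.');
            else if (c >= '\uFF01' && c <= '\uFF5E') sb.append((char) (c - 0xFEE0));
            else sb.append(c);
        }
        return sb.toString();
    }

    /** Applies the cleaning steps and splits the post into sentences. */
    static List<String> clean(String post) {
        String s = toHalfWidth(post).trim();
        s = REPLY.matcher(s).replaceFirst("");
        s = SPECIAL.matcher(s).replaceAll("");
        s = OTHER_PUNCT.matcher(s).replaceAll("");
        List<String> sentences = new ArrayList<>();
        for (String part : s.split("[,.!?]")) {
            String t = part.trim();
            if (!t.isEmpty()) sentences.add(t);
        }
        return sentences;
    }

    /** Step (7): drops stop words from an already segmented token list. */
    static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!stopwords.contains(t)) kept.add(t);
        }
        return kept;
    }
}
```

Normalizing width first lets the later patterns use only ASCII delimiters, so the full-width colon in a reply prefix and full-width sentence marks are handled by the same regexes.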
2. Creation and persistence of the original ontology
(1) Semi-automatic creation of the ontology
The invention uses the conventional seven-step method for construction. Compared with other methods, the seven-step method has clear steps and concise logic and is easy to carry out. Developed at the Stanford University School of Medicine, it is a commonly used ontology construction method. Its seven steps are: determine the domain and scope of the ontology; consider reusing existing ontologies; enumerate the important terms of the domain; define the classes and the class hierarchy; define the properties of the classes; define the facets of the properties; create instances.
The invention builds the ontology semi-automatically with the tool Protégé (Stanford University, 1999). Protégé is an ontology construction tool developed by the biomedical informatics research center of the Stanford University School of Medicine. It is written in Java and is open-source software. Protégé shields the user from the specific ontology description language, so users do not need to learn an ontology language; they only need to describe the ontology using the shortcuts the software provides.
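To make the output of the seven steps concrete, the following Java sketch models it in memory: a class hierarchy ("define classes and the class hierarchy") and instances carrying a sentiment weight ("create instances"). It is a hypothetical stand-in for illustration only; it is not Protégé, OWL or Jena, and the class `MiniOntology` and the example names are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model: a class hierarchy plus weighted instances. */
final class MiniOntology {
    private final Map<String, String> superclassOf = new LinkedHashMap<>(); // class -> superclass (null = root)
    private final Map<String, String> typeOf = new LinkedHashMap<>();       // instance -> class
    private final Map<String, Double> weightOf = new LinkedHashMap<>();     // instance -> sentiment weight

    /** "Define classes and the class hierarchy": superclass must already exist (or be null). */
    void addClass(String name, String superclass) {
        if (superclass != null && !superclassOf.containsKey(superclass)) {
            throw new IllegalArgumentException("unknown superclass: " + superclass);
        }
        superclassOf.put(name, superclass);
    }

    /** "Create instances": here an instance is a sentiment word attached to a dimension class. */
    void addInstance(String name, String type, double weight) {
        if (!superclassOf.containsKey(type)) {
            throw new IllegalArgumentException("unknown class: " + type);
        }
        typeOf.put(name, type);
        weightOf.put(name, weight);
    }

    /** Superclasses of cls, nearest first. */
    List<String> ancestors(String cls) {
        List<String> path = new ArrayList<>();
        for (String c = superclassOf.get(cls); c != null; c = superclassOf.get(c)) {
            path.add(c);
        }
        return path;
    }

    /** Direct subclasses of cls, in insertion order. */
    List<String> subclasses(String cls) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : superclassOf.entrySet()) {
            if (cls.equals(e.getValue())) result.add(e.getKey());
        }
        return result;
    }

    String classOfInstance(String instance) { return typeOf.get(instance); }

    double weight(String instance) { return weightOf.get(instance); }
}
```

In the described method this structure lives in an OWL file built in Protégé and later in the Jena-backed database; the sketch only shows the shape of the data the later steps read and extend.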
(2) Persistence of the ontology
Because the semi-automatically built ontology must be continually expanded and modified during data processing, it has to be persisted so that changes are easy to handle. The invention uses the Jena package (HP Labs, 2009) to convert the ontology into a database. Jena is an open-source Java framework mainly used to extract data at the semantic level and convert it into a model; the data source can be a database, a file, and so on. For querying data in a semantic model, Jena also supports the SPARQL query language.
The invention uses the Jena package to convert the ontology into a database. The conversion proceeds as follows:
1. Install the required software and configure the development environment: Eclipse + MySQL Server 5.5-win32 + Jena 2.6.4 + Protégé 5.1.0 + mysql-connector-java-5.1.35 (the MySQL JDBC driver);
2. Build the product ontology with Protégé 5.1.0 and have it generate the OWL ontology file automatically;
3. Create a database in MySQL;
4. Open Eclipse and create a Java project;
5. In the new project, import the Jena package and the MySQL JDBC driver;
6. Create a Java class named military_ontology.java in the project directory;
7. Write the conversion code in military_ontology.java and run it;
8. The ontology is converted into the database.
After the original ontology has been converted successfully with Jena, 7 tables are generated; the ontology information is stored in the jena_g1t1_stmt table, and the other tables need not be considered.
3. Ontology expansion based on syntactic dependency
The invention extends the product ontology automatically using dependency relations based on Chinese syntax. The main concern is the syntax of user comments: syntactic relations are used to discover descriptive words and subordinate attributes of the product itself or of its dimensions. Because the target is a product ontology, automatic extension only needs to attend to the product and the evaluation indicators of its dimensions, and unrelated words can be ignored. Automatic extension of the ontology covers two aspects.
(1) Extending the dimensions of the ontology
The ontology dimensions of the product ontology are extended. Because completeness cannot be guaranteed when the ontology is first built, the ontology must be extended gradually as data is processed. The main technique for extending ontology dimensions is syntactic dependency parsing.
Dependency syntax was proposed by the French linguist Lucien Tesnière in 1959. Its core idea is that the predicate verb of a sentence is the center that governs the other constituents and is itself governed by none, while the subject is subordinate to its governor through some dependency relation. A dependency grammar structure takes dependency relations, i.e. binary relations between pairs of words, as its basic elements. In each binary relation the governor is called the head word and the subordinate is called the dependent word, and the dependency relation reflects the semantic dependency between the head word and the dependent word.
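The head/dependent pairs just described can be represented directly as arcs. A minimal Java sketch, assuming words are identified by plain strings (real parser output disambiguates repeated words with token indices): the sentence core is the word that governs others but is governed by none, which is normally the predicate. The names `DepArc` and `DepCore` are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** One head -> dependent arc: the word-pair binary relation of dependency grammar. */
final class DepArc {
    final String relation;
    final String head;       // governor (head word)
    final String dependent;  // governed word (dependent word)

    DepArc(String relation, String head, String dependent) {
        this.relation = relation;
        this.head = head;
        this.dependent = dependent;
    }
}

final class DepCore {
    private DepCore() {}

    /** Words that govern others but are governed by none: the sentence core, usually the predicate. */
    static Set<String> cores(List<DepArc> arcs) {
        Set<String> heads = new LinkedHashSet<>();
        Set<String> dependents = new HashSet<>();
        for (DepArc a : arcs) {
            heads.add(a.head);
            dependents.add(a.dependent);
        }
        heads.removeAll(dependents);
        return heads;
    }
}
```

For "this statement is very shameless", arcs nsubj(shameless, statement), advmod(shameless, very) and det(statement, this) leave only "shameless" as the core.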
The invention performs syntactic analysis with the Stanford Parser (Stanford University, 2002). Developed by the Stanford Natural Language Processing Group, it is a highly optimized probabilistic context-free grammar parser and lexicalized dependency parser grounded in probability statistics. An open-source Java implementation is available and supports multiple languages, including English, Chinese and German.
The Stanford Parser can output syntactic relations in several forms, such as parse trees and typed dependencies; the invention uses typed dependencies for the extension. Of the many dependency relations the software provides, the invention focuses on two that involve the keyword when extending dimensions: nn and assmod. nn denotes a noun compound modifier: for example, when an nn relation is obtained for "advertising cost", it follows that "cost" is a subordinate dimension of "advertisement". assmod denotes an associative modifier, mainly a dependency between two noun phrases: for example, when an assmod relation is obtained for "medicinal liquor advertisement", it follows that "advertisement" is a subordinate dimension of "medicinal liquor".
A newly obtained subordinate dimension is stored in the ontology database as follows: first set the type of the new dimension to class, then make it a subclass of the corresponding parent dimension. Table 1 shows the entries that must be added to the database when a new dimension is inserted.
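The nn/assmod rule can be sketched in Java over the parser's textual typed-dependency output, which has the form `relation(head-index, dependent-index)`, e.g. `nn(cost-2, advertisement-1)`. Following the examples above, the head noun becomes a sub-dimension of its modifier. The class `DimensionMiner` and the regex are assumptions for illustration; the sketch records child-to-parent pairs in a map instead of writing to the Jena tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: mine new sub-dimensions from nn / assmod typed dependencies. */
final class DimensionMiner {
    // Textual typed-dependency form: relation(head-index, dependent-index).
    private static final Pattern DEP =
            Pattern.compile("^([\\w:]+)\\((.+?)-(\\d+), (.+?)-(\\d+)\\)$");

    /** New dimension -> parent dimension, in discovery order. */
    final Map<String, String> parentOf = new LinkedHashMap<>();

    /** Returns true if the line yields a sub-dimension that was not known before. */
    boolean accept(String line) {
        Matcher m = DEP.matcher(line.trim());
        if (!m.matches()) return false;
        String rel = m.group(1);
        if (!rel.equals("nn") && !rel.equals("assmod")) return false;
        String head = m.group(2);      // head noun, e.g. "cost" in "advertising cost"
        String modifier = m.group(4);  // modifying noun, e.g. "advertisement"
        // As in the text's examples, the head noun becomes a subclass of the modifying noun.
        if (parentOf.containsKey(head)) return false;
        parentOf.put(head, modifier);
        return true;
    }
}
```

In the method itself, each new pair would then be written to the ontology database as a class declared a subclass of its parent dimension, as Table 1 describes.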
Table 1 Insertion of a new dimension into the database
(2) Expanding sentiment words
General sentiment words can be added when the ontology is first built, but when a specific product is analyzed, different dimensions need different dimension-specific sentiment words. Most of these words must be obtained from real data and are hard to collect comprehensively in advance.
Sentiment words are expanded in the same way as in the previous section, using the Stanford Parser. On the basis of the dependency analysis, sentiment-word expansion focuses on two other relations, amod and nsubj. amod denotes an adjectival modifier, i.e. an adjective preceding a common noun: for example, from the amod relation "false publicity" it follows that "false" is a sentiment word of "publicity". nsubj denotes a nominal subject and mainly expresses the link between a subject and its predicate: for example, from the nsubj relation "the statement is shameless" it follows that "shameless" is a sentiment word of "statement".
A sentiment word is an instance, and it is inserted into the ontology database as follows: first set its type to NamedIndividual, then assign it as a sentiment word of the corresponding category according to the class it describes, and finally insert its sentiment weight, obtained from the sentiment dictionary, into the database. Table 2 shows the entries that must be added when a new dimension sentiment word is inserted into the database.
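A companion sketch for sentiment words, again over textual typed dependencies and again hypothetical (`SentimentWordMiner`, the regex and the dictionary weights are assumptions). For amod the adjective is the dependent and the described noun is the head; for nsubj the predicate is the head and the subject is the dependent. Only words with a weight in the sentiment dictionary are kept, which filters out ordinary verbs that also appear as nsubj heads.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: mine sentiment words for dimensions from amod / nsubj typed dependencies. */
final class SentimentWordMiner {
    private static final Pattern DEP =
            Pattern.compile("^([\\w:]+)\\((.+?)-(\\d+), (.+?)-(\\d+)\\)$");

    private final Map<String, Double> dictionary;                // sentiment dictionary: word -> weight
    final Map<String, String> targetOf = new LinkedHashMap<>();  // sentiment word -> dimension it describes
    final Map<String, Double> weightOf = new LinkedHashMap<>();  // sentiment word -> weight

    SentimentWordMiner(Map<String, Double> dictionary) {
        this.dictionary = dictionary;
    }

    boolean accept(String line) {
        Matcher m = DEP.matcher(line.trim());
        if (!m.matches()) return false;
        String rel = m.group(1), head = m.group(2), dependent = m.group(4);
        String word;
        String target;
        if (rel.equals("amod")) {          // amod(publicity, false): adjective modifies the noun
            word = dependent;
            target = head;
        } else if (rel.equals("nsubj")) {  // nsubj(shameless, statement): predicate describes the subject
            word = head;
            target = dependent;
        } else {
            return false;
        }
        Double weight = dictionary.get(word);
        if (weight == null) return false;  // not a known sentiment word
        targetOf.put(word, target);
        weightOf.put(word, weight);
        return true;
    }
}
```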
Table 2 Insertion of dimension sentiment words into the database
4. Sentiment weight calculation
Once the ontology has been expanded and updated, sentiment analysis is performed on the data. Traditional sentiment analysis mostly adds sentiment weights directly, which leads to large errors. To avoid this, the invention uses a sentiment weight analysis method based on ontology dimensions, which takes into account the dimension of each sentiment word in the ontology when calculating sentiment weights and thus reflects the effect of sentiment words more fully.
Before the sentiment calculation, preparatory work was carried out, chiefly the introduction of several dictionaries: a sentiment weight dictionary, a negation word dictionary and a synonym thesaurus. The sentiment weight dictionary is the sentiment dictionary of the Department of Chinese Language and Literature, Tsinghua University, which contains the sentiment weight of each word itself. The negation dictionary is the one used by Jiangsu University of Science and Technology; its role is to invert weights, so that if a negation word precedes a word, that word's weight is reversed. The synonym thesaurus is the Chinese thesaurus of the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology. It serves two purposes. First, it expands the ontology: the synonyms of each word can be added to the ontology as similar dimensions, which makes looking up or determining dimensions more accurate. Second, when the sentiment dictionary lacks a weight for a word, the average of the weights of its synonyms can be used as that word's sentiment weight.
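A minimal Java sketch of how the three dictionaries might be consulted, under stated assumptions: the class `LexiconLookup` and its map-based dictionaries are hypothetical, and only synonyms that themselves have a weight enter the average. `value` gives $Value_i$ (own weight, else the synonym average, else empty); `pri` gives $Pri_i$ (1 when not negated, otherwise the negation word's weight, defaulting to -1 when that weight is not listed).

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.Set;

/** Hypothetical sketch of the three dictionaries and how each is consulted. */
final class LexiconLookup {
    private final Map<String, Double> sentimentWeights;  // sentiment weight dictionary
    private final Map<String, List<String>> synonyms;    // synonym thesaurus: word -> synonyms
    private final Set<String> negationWords;             // negation word dictionary
    private final Map<String, Double> negationWeights;   // weights of some negation words

    LexiconLookup(Map<String, Double> sentimentWeights, Map<String, List<String>> synonyms,
                  Set<String> negationWords, Map<String, Double> negationWeights) {
        this.sentimentWeights = sentimentWeights;
        this.synonyms = synonyms;
        this.negationWords = negationWords;
        this.negationWeights = negationWeights;
    }

    /** Value_i: the word's own weight, else the mean weight of its weighted synonyms, else empty. */
    OptionalDouble value(String word) {
        Double own = sentimentWeights.get(word);
        if (own != null) return OptionalDouble.of(own);
        return synonyms.getOrDefault(word, List.of()).stream()
                .filter(sentimentWeights::containsKey)
                .mapToDouble(s -> sentimentWeights.get(s))
                .average();
    }

    /** Pri_i: 1 if the modifier is not a negation word, else its weight, defaulting to -1. */
    double pri(String modifier) {
        if (modifier == null || !negationWords.contains(modifier)) return 1.0;
        return negationWeights.getOrDefault(modifier, -1.0);
    }
}
```

Returning `OptionalDouble` keeps "no weight anywhere" distinct from a genuine weight of 0, so a caller can skip such words instead of silently counting them as neutral.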
The sentiment weight of each sentence is calculated by the following formula:
where n is the number of sentiment words contained in the sentence, and $Dimen_i$ is the weight of the dimension to which the i-th word belongs, calculated as follows:

$$Dimen_i = Per_{class,i} \times Per_{words,i}$$

where $Per_{class,i}$ is the proportion of all classes accounted for by the subordinate classes of the dimension of the i-th word, and $Per_{words,i}$ is the proportion of all evaluation words accounted for by the sentiment evaluation words of that dimension.
Dimension classes and sentiment words only need to be queried with SPARQL, the query language supported for the ontology: through the interface provided by the Jena package, SPARQL statements extract class- and instance-related data from the ontology that has been converted into the database.
$TI_i$ is the TF*IDF weight of the word, calculated as follows:

$$TI_i = Tf_{ij} \times Idf_{ij}, \qquad Tf_{ij} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad Idf_{ij} = \log \frac{|D|}{1 + |\{d \in D : t_i \in d\}|}$$

$Tf_{ij}$ is the tf value of the word, the proportion of occurrences of a word in the current document: the numerator $n_{i,j}$ is the number of times word $t_i$ occurs in document $j$, and the denominator is the total number of words in document $j$. $Idf_{ij}$ is the idf value of the word, called the inverse document frequency: the total number of documents divided by the number of documents containing the keyword, after which the logarithm is taken. The numerator $|D|$ is the total number of documents and the denominator is the number of documents containing $t_i$. To guarantee that the denominator is always positive, 1 is added to it.
$Pri_i$ is the negation weight: if word $i$ is modified by a negation word when the sentiment weight is calculated, its weight must be multiplied by the weight of that negation word, which is generally negative; if the negation word is not in the negation-weight dictionary, the weight defaults to -1. $Value_i$ is the sentiment weight of the word itself, taken from the sentiment weight dictionary.
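The text defines the per-word factors $Pri_i$, $Value_i$, $Dimen_i$ and $TI_i$ for the n sentiment words of a sentence. The following Java sketch assumes they are multiplied per word and summed over the sentence, $W = \sum_{i=1}^{n} Pri_i \cdot Value_i \cdot Dimen_i \cdot TI_i$; that combination, the zero polarity threshold and the class `SentenceScore` are assumptions for illustration, not a confirmed reading of the patent's formula.

```java
import java.util.List;

/** Hypothetical sketch, assuming W = sum over i of Pri_i * Value_i * Dimen_i * TI_i. */
final class SentenceScore {
    private SentenceScore() {}

    /** Factors for one sentiment word i of the sentence. */
    static final class Term {
        final double pri;    // negation weight Pri_i (1 if not negated)
        final double value;  // dictionary sentiment weight Value_i
        final double dimen;  // dimension weight Dimen_i
        final double ti;     // TF*IDF weight TI_i

        Term(double pri, double value, double dimen, double ti) {
            this.pri = pri;
            this.value = value;
            this.dimen = dimen;
            this.ti = ti;
        }
    }

    static double weight(List<Term> terms) {
        double w = 0.0;
        for (Term t : terms) {
            w += t.pri * t.value * t.dimen * t.ti;
        }
        return w;
    }

    /** Polarity from the sign of W; using 0 as the threshold is an assumption. */
    static String polarity(double w) {
        return w > 0 ? "positive" : w < 0 ? "negative" : "neutral";
    }
}
```

For example, one unnegated word (1, 2, 0.5, 1) contributes 1.0 and one negated word (-1, 1, 0.25, 2) contributes -0.5, giving W = 0.5 and a positive orientation.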
Using SPARQL queries over the ontology, each sentence can be matched word by word to find each word's dimension, sentiment category and weight in the ontology, after which the sentiment weight can be calculated with the above formulas.
The method is demonstrated on real Chinese comments about Hongmao medicinal liquor crawled from microblogs. The specific steps are as follows:
1. Crawl microblogs related to Hongmao medicinal liquor with a crawler, preprocess them, segment them with jieba, and remove stop words.
2. Build the Hongmao medicinal liquor ontology from consumer evaluation indicators, official Hongmao medicinal liquor documents, microblog comments and similar content. Fig. 2 shows part of the constructed ontology.
3. After building the original ontology with Protégé, convert it into the database using the Jena package.
4. After the ontology has been converted into the database, expand it following the method above in two respects: ontology dimensions and sentiment words.
5. Select representative microblogs from the data and annotate their sentiment manually, obtaining 1000 positive and 1000 negative reviews.
6. Evaluate the sentiment weights of the selected microblogs with the formula of the invention to determine their sentiment polarity.
7. Label the same microblogs with traditional SVM and Naive Bayes classifiers, and compare the methods by precision, recall and F-value. Fig. 3 compares the method of the invention with the conventional machine-learning methods SVM and Naive Bayes; the horizontal axis shows the methods tested and the vertical axis the metric values. The comparison shows that the invention analyzes the sentiment orientation of product microblogs more accurately and in finer detail.
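The evaluation in step 7 uses precision, recall and F-value. A minimal Java sketch for one target label (for example "positive"), computed from parallel lists of gold and predicted labels; how the source averages the scores of its two classes is not stated, so the sketch stays per-class. The class name `PrfScores` is hypothetical.

```java
import java.util.List;

/** Precision, recall and F1 for one target label from parallel gold / predicted label lists. */
final class PrfScores {
    final double precision;
    final double recall;
    final double f1;

    PrfScores(List<String> gold, List<String> predicted, String targetLabel) {
        if (gold.size() != predicted.size()) {
            throw new IllegalArgumentException("length mismatch");
        }
        int tp = 0, fp = 0, fn = 0;
        for (int i = 0; i < gold.size(); i++) {
            boolean g = gold.get(i).equals(targetLabel);
            boolean p = predicted.get(i).equals(targetLabel);
            if (g && p) tp++;        // correctly predicted target
            else if (!g && p) fp++;  // predicted target, actually other
            else if (g) fn++;        // target missed by the classifier
        }
        precision = tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
        recall = tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
        f1 = precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }
}
```

F1 is the harmonic mean of precision and recall, so it stays low unless both are high.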
This embodiment provides a sentiment analysis method for Chinese microblogs based on combining a sentiment ontology with syntactic dependency. It comprises the following steps: microblog information on the topic to be analyzed is collected with a crawler and then cleaned and reduced in dimensionality; an original ontology is built semi-automatically from topic-related microblog information and official documents; the ontology is then automatically updated from microblog data in two respects, product dimensions and sentiment words, yielding a mature ontology; finally, the information carried by the ontology is used to calculate the sentiment weights of microblog posts, so as to analyze the sentiment orientation of the microblog data. Using precision, recall and F-value as evaluation criteria and comparing against conventional machine-learning classification algorithms, the invention is shown to be feasible and superior on a Chinese microblog dataset.
The embodiments of the invention have been described in detail above, but the invention is not limited to them. Within the knowledge of a person skilled in the art, various changes may be made without departing from the purpose of the invention, and these shall also be regarded as falling within the protection scope of the invention.

Claims (5)

1. A microblog sentiment analysis method combining ontology and syntactic dependency, characterized by comprising the following steps:
Step (1): semi-automatically construct a topic-related ontology and persist the ontology to a database;
Step (2): expand and update the ontology in two respects, ontology dimensions and sentiment words, using syntactic dependency relations;
Step (3): calculate sentiment weights of microblog information using the ontology to determine sentiment orientation.
2. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 1, characterized in that step (1) specifically comprises:
Step (1.1): construct the ontology in Protégé using the conventional seven-step method: determine the domain and scope of the ontology; consider reusing existing ontologies; enumerate the important terms of the domain; define the classes and the class hierarchy; define the properties of the classes; define the facets of the properties; create instances;
Step (1.2): convert the ontology into a database using the Jena package; Jena extracts data at the semantic level and converts it into a model, and the data source for the model is a database or a file.
3. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 2, characterized in that the conversion in step (1.2) proceeds as follows:
1. Install the necessary software and configure the development environment: Eclipse + MySQL Server 5.5 (win32) + Jena 2.6.4 + Protégé 5.1.0 + mysql-connector-java-5.1.35 (the MySQL JDBC driver);
2. Build the product ontology in Protégé 5.1.0 and generate the OWL ontology file;
3. Create a database in MySQL;
4. Open Eclipse and create a Java project;
5. When creating the project, import the Jena package and the MySQL JDBC driver;
6. Create a Java class named military_ontology.java in the project directory;
7. Write the code in military_ontology.java and run it;
8. The ontology is converted into the database.
After Jena has converted the initial ontology, 7 tables are generated, of which jena_g1t1_stmt stores the ontology content.
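Jena's database layout is not reproduced here. As a rough illustration of the idea behind a statement table such as jena_g1t1_stmt, storing the ontology as subject/predicate/object rows, the sketch below uses Python's built-in sqlite3 in place of MySQL. The table name `stmt`, its three columns and the `ex:` names are hypothetical and do not model Jena's actual schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def persist_triples(conn: sqlite3.Connection, triples: Iterable[Triple]) -> None:
    # One row per statement: (subject, predicate, object).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stmt ("
        "subj TEXT NOT NULL, prop TEXT NOT NULL, obj TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO stmt (subj, prop, obj) VALUES (?, ?, ?)", triples)
    conn.commit()


def subclasses_of(conn: sqlite3.Connection, parent: str) -> List[str]:
    # Direct subclasses are the subjects of rdfs:subClassOf rows pointing at `parent`.
    rows = conn.execute(
        "SELECT subj FROM stmt WHERE prop = ? AND obj = ? ORDER BY subj",
        ("rdfs:subClassOf", parent),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    persist_triples(conn, [
        ("ex:Phone", "rdf:type", "owl:Class"),
        ("ex:Battery", "rdf:type", "owl:Class"),
        ("ex:Battery", "rdfs:subClassOf", "ex:Phone"),
        ("ex:Screen", "rdf:type", "owl:Class"),
        ("ex:Screen", "rdfs:subClassOf", "ex:Phone"),
    ])
    print(subclasses_of(conn, "ex:Phone"))  # ['ex:Battery', 'ex:Screen']
```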
4. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 3, characterized in that step (2) specifically comprises:
Step (2.1): extend the ontology dimensions of the product ontology with syntactic dependency parsing. In a sentence, the predicate verb is the center that governs the other constituents and is not itself governed by any constituent, while the subject is subordinate to its governor through some dependency relation. A dependency syntactic structure takes dependency relations, i.e. binary relations between pairs of words, as its basic elements; in each binary relation the governor is called the head word and the subordinate word is called the dependent. Syntactic analysis is performed with the Stanford Parser dependency parser:
The Stanford Parser's typed dependencies are used to select the syntactic relations for extension. When extending dimensions, two relations involving keywords are used, nn and assmod: nn denotes a noun compound modifier and assmod an associative modifier, both being dependencies between two noun phrases. A newly obtained subordinate dimension is stored in the ontology database as follows: first set the type of the new dimension to class, then classify it as a subclass of its parent dimension;
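The dimension-extension step can be sketched over parser output that has already been reduced to (relation, head, dependent) triples; the Stanford Parser call itself is not shown. The rule used here, that a head word becomes a new subordinate dimension when its nn or assmod dependent is already a known dimension, is one plausible reading of the step above and is an assumption, as are the example tokens.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# Relations used for dimension extension: nn (noun compound modifier)
# and assmod (associative modifier).
DIMENSION_RELATIONS = {"nn", "assmod"}


def extend_dimensions(
    dependencies: Iterable[Tuple[str, str, str]],
    parent_of: Dict[str, Optional[str]],
) -> List[Tuple[str, str]]:
    """Add new subordinate dimensions to `parent_of` in place.

    `dependencies` holds (relation, head, dependent) triples; `parent_of` maps
    each known dimension (class) to its parent dimension, or None for the root.
    Returns the (new_dimension, parent_dimension) pairs that were added.
    """
    added = []
    for relation, head, dependent in dependencies:
        if relation not in DIMENSION_RELATIONS:
            continue
        # Assumed rule: the modifier is a known dimension and the head is new.
        if dependent in parent_of and head not in parent_of:
            parent_of[head] = dependent  # new class stored as subclass of its parent
            added.append((head, dependent))
    return added


if __name__ == "__main__":
    dims = {"手机": None}  # 手机 = phone (hypothetical root dimension)
    deps = [
        ("nn", "电池", "手机"),      # 手机电池: phone battery
        ("assmod", "屏幕", "手机"),  # 手机的屏幕: the phone's screen
        ("nn", "续航", "电池"),      # 电池续航: battery endurance
        ("amod", "电池", "耐用"),    # skipped here: amod is used for sentiment words
    ]
    print(extend_dimensions(deps, dims))
```

Because triples are processed in order, a dimension added early (电池) can immediately serve as the parent of a later one (续航).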
Step (2.2): the sentiment vocabulary is extended as follows. With the Stanford Parser, on the basis of dependency analysis, two other relations are used to expand the sentiment vocabulary, amod and nsubj: amod denotes an adjectival modifier, i.e. an adjective that typically precedes a noun, and nsubj denotes a nominal subject, i.e. the relation between a predicate and its subject. Sentiment words are instances and are inserted into the ontology database as follows: first set their type to NamedIndividual, then classify each as a sentiment word of the category it describes, and finally insert its sentiment weight, obtained from a sentiment dictionary, into the database.
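A minimal sketch of the sentiment-vocabulary step, again over (relation, head, dependent) triples. It assumes that for amod the head is the noun dimension and the dependent the adjective, that for nsubj the head is the evaluative predicate and the dependent the subject dimension, and that words missing from the sentiment dictionary are skipped; the dictionary entries and weights are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def extend_sentiment_words(
    dependencies: Iterable[Tuple[str, str, str]],
    dimensions: Set[str],
    sentiment_dict: Dict[str, float],
) -> Dict[str, dict]:
    """Collect sentiment words attached to known dimensions via amod or nsubj."""
    individuals: Dict[str, dict] = {}
    for relation, head, dependent in dependencies:
        if relation == "amod" and head in dimensions:
            # amod(noun, adjective): the adjective evaluates the noun dimension.
            word, dimension = dependent, head
        elif relation == "nsubj" and dependent in dimensions:
            # nsubj(predicate, subject): the predicate evaluates its subject dimension.
            word, dimension = head, dependent
        else:
            continue
        if word not in sentiment_dict:
            continue  # assumption: words without a dictionary weight are skipped
        individuals[word] = {
            "type": "NamedIndividual",  # sentiment words are stored as instances
            "category": dimension,      # classified under the dimension they describe
            "weight": sentiment_dict[word],
        }
    return individuals


if __name__ == "__main__":
    deps = [
        ("nsubj", "耐用", "电池"),  # 电池耐用: the battery is durable
        ("amod", "屏幕", "清晰"),   # 清晰的屏幕: a clear screen
        ("nsubj", "发布", "手机"),  # 发布 is not in the dictionary, so it is skipped
        ("nsubj", "差", "信号"),    # 信号 is not a known dimension, so it is skipped
    ]
    lexicon = {"耐用": 0.8, "清晰": 0.6, "差": -0.7}
    print(extend_sentiment_words(deps, {"手机", "电池", "屏幕"}, lexicon))
```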
5. The microblog sentiment analysis method combining ontology and syntactic dependency according to claim 4, characterized in that step (3) specifically comprises:
The sentiment weight of each sentence is calculated with the following formula:
where n is the number of sentiment words contained in the sentence; Pri_i is the negation weight: when the sentiment weight is calculated, if word i is modified by a negation word, it is multiplied by the weight of that negation word, which is generally negative, and a negation word not contained in the negation-word weight dictionary defaults to -1; Value_i is the sentiment weight of the word itself, taken from the sentiment weight dictionary;
Dimen_i is the weight of the dimension to which the i-th word belongs, calculated as follows:
$$Dimen_i = Per_{class\_i} \times Per_{words\_i}$$
where Per_class_i is the proportion of all classes accounted for by the subordinate classes of the dimension of the i-th word, and Per_words_i is the proportion of all evaluation words accounted for by the sentiment evaluation words of the dimension of the i-th word;
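A small sketch of Dimen_i, assuming "subordinate classes" means the direct subclasses of the word's dimension and that the class and word inventories are plain dictionaries (both assumptions; the patent reads them from the ontology with SPARQL). Under this reading a leaf dimension, which has no subclasses, gets weight 0.

```python
from typing import Dict, Optional


def dimension_weight(
    dimension: str,
    parent_of: Dict[str, Optional[str]],
    word_dimension: Dict[str, str],
) -> float:
    """Dimen = Per_class * Per_words for the dimension a sentiment word belongs to."""
    total_classes = len(parent_of)
    total_words = len(word_dimension)
    if total_classes == 0 or total_words == 0:
        return 0.0
    # Per_class: direct subclasses of the dimension / all classes (assumed reading).
    per_class = sum(1 for p in parent_of.values() if p == dimension) / total_classes
    # Per_words: sentiment words under the dimension / all sentiment words.
    per_words = sum(1 for d in word_dimension.values() if d == dimension) / total_words
    return per_class * per_words


if __name__ == "__main__":
    classes = {"手机": None, "电池": "手机", "屏幕": "手机", "续航": "电池"}
    words = {"耐用": "电池", "持久": "电池", "清晰": "屏幕", "流畅": "手机"}
    print(dimension_weight("电池", classes, words))  # (1/4) * (2/4) = 0.125
```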
Dimension classes and sentiment words are queried with the SPARQL query language supported by the ontology: through the Jena package interface, SPARQL statements extract the data related to classes and instances from the ontology that has been converted into the database;
TI_i is the TF*IDF weight of the word, calculated as follows:
$$TI_i = tf_{ij} \times idf_i, \qquad tf_{ij} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad idf_i = \log \frac{|D|}{1 + |\{\, j : t_i \in d_j \,\}|}$$
tf_ij is the tf value of the word, i.e. the proportion of occurrences of a given word in the current document: the numerator n_{i,j} is the number of times word t_i occurs in document j, and the denominator is the total number of words in document j. idf_i is the idf value of the word, known as the inverse document frequency: the total number of documents divided by the number of documents containing the keyword, with the logarithm taken of the result. Here the numerator |D| is the total number of documents and the denominator is the number of documents containing word t_i; 1 is added to the denominator to guarantee that it is always positive;
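The TF*IDF definition above translates directly into a few lines. This sketch treats each document as a list of already-segmented words and uses the natural logarithm; the base is not stated in the text, so that is an assumption, as is the toy corpus.

```python
import math
from collections import Counter
from typing import List, Sequence


def tf_idf(term: str, doc_index: int, documents: Sequence[List[str]]) -> float:
    """TI for `term` in documents[doc_index], with 1 added to the idf denominator."""
    doc = documents[doc_index]
    if not doc:
        return 0.0
    # tf: occurrences of the term divided by the total number of words in the document.
    tf = Counter(doc)[term] / len(doc)
    # Document frequency: number of documents that contain the term.
    df = sum(1 for d in documents if term in d)
    # idf: log(total documents / (1 + df)); the natural log is an assumption.
    idf = math.log(len(documents) / (1 + df))
    return tf * idf


if __name__ == "__main__":
    docs = [["电池", "耐用", "电池"], ["屏幕", "清晰"], ["手机", "流畅"], ["信号", "差"]]
    print(tf_idf("电池", 0, docs))  # (2/3) * ln(4/2)
```

A side effect of adding 1 to the denominator: a word that occurs in every document gets a slightly negative idf, log(N/(N+1)).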
Using the SPARQL query statements supported by the ontology, each sentence is matched word by word against the ontology to find each word's dimension, sentiment category and weight, after which the sentiment weight is calculated with the formulas above.
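The overall sentence formula is not reproduced in this text. Purely as an illustration, the sketch below assumes that the four per-word factors defined above are multiplied and summed over the n sentiment words, S = Σ Pri_i · Value_i · Dimen_i · TI_i, that Pri_i = 1 for a word without negation, and that a negation word modifies the word immediately after it. All three are assumptions, as are the toy weights; the per-word factors are passed in precomputed instead of being fetched with SPARQL.

```python
from typing import Dict, Optional, Sequence, Set


def sentence_score(
    tokens: Sequence[str],
    values: Dict[str, float],  # Value_i: sentiment weight of each sentiment word
    dimen: Dict[str, float],   # Dimen_i: weight of the word's dimension
    ti: Dict[str, float],      # TI_i: TF*IDF weight of the word
    negation_words: Set[str],
    negation_weights: Dict[str, float],
) -> float:
    """Assumed form: sum over sentiment words of Pri_i * Value_i * Dimen_i * TI_i."""
    score = 0.0
    for pos, word in enumerate(tokens):
        if word not in values:
            continue  # only sentiment words found in the ontology contribute
        pri = 1.0  # assumption: no negation means a factor of 1
        prev: Optional[str] = tokens[pos - 1] if pos > 0 else None
        if prev in negation_words:
            # Negation weight from the dictionary, defaulting to -1 as in the claim.
            pri = negation_weights.get(prev, -1.0)
        score += pri * values[word] * dimen.get(word, 0.0) * ti.get(word, 0.0)
    return score


def orientation(score: float) -> str:
    # Map the sentence weight to a sentiment orientation.
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    s = sentence_score(["电池", "不", "耐用"], {"耐用": 0.8}, {"耐用": 0.125},
                       {"耐用": 0.5}, {"不", "没有"}, {"没有": -0.8})
    print(s, orientation(s))  # 不 is not in the weight dictionary, so Pri = -1
```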
CN201910276686.XA 2019-04-08 2019-04-08 A microblog sentiment analysis method combining ontology and syntactic dependency Pending CN110020436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910276686.XA CN110020436A (en) 2019-04-08 2019-04-08 A microblog sentiment analysis method combining ontology and syntactic dependency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910276686.XA CN110020436A (en) 2019-04-08 2019-04-08 A microblog sentiment analysis method combining ontology and syntactic dependency

Publications (1)

Publication Number Publication Date
CN110020436A (en) 2019-07-16

Family

ID=67190687

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910276686.XA Pending CN110020436A (en) 2019-04-08 2019-04-08 A microblog sentiment analysis method combining ontology and syntactic dependency

Country Status (1)

Country Link
CN (1) CN110020436A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150278195A1 (en) * 2014-03-31 2015-10-01 Abbyy Infopoisk Llc Text data sentiment analysis method
CN109284499A (en) * 2018-08-01 2019-01-29 数据地平线(广州)科技有限公司 Industry text sentiment acquisition method, device and storage medium

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
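IANCHARMING: "Storing an OWL ontology in a MySQL database", HTTPS://BLOG.CSDN.NET/IANCHARMING/ARTICLE/DETAILS/50151359 *
PRNTSCR_: "A simple implementation of SPARQL queries over an ontology in Jena", HTTPS://BLOG.CSDN.NET/PRNTSCR__/ARTICLE/DETAILS/52202295 *
Tang Xiaobo (唐晓波) et al.: "Sentiment analysis of microblog product reviews based on a feature ontology", Library and Information Service (图书情报工作) *
Xia Mengnan (夏梦南) et al.: "Microblog sentiment analysis based on dependency parsing and feature combination", Journal of Shandong University (Natural Science) *
Wen Neng (文能): "Sentiment orientation analysis of product reviews based on domain ontology and CRFs", China Master's Theses Full-text Database, I143-27 *
赏月斋: "Term frequency, inverse document frequency", HTTPS://BAIKE.BAIDU.COM/ITEM/TF-IDF/8816134?FR=ALADDIN *
Wei Hang (韦航): "Research on target-oriented sentiment analysis of Chinese microblogs", China Master's Theses Full-text Database, I138-4945 *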

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407725A (en) * 2020-03-17 2021-09-17 Fudan University Method for constructing a regulation ontology model based on a knowledge graph
CN113407725B (en) * 2020-03-17 2022-03-18 Fudan University Method for constructing a regulation ontology model based on a knowledge graph
CN113434682A (en) * 2021-06-30 2021-09-24 Ping An Technology (Shenzhen) Co., Ltd. Text sentiment analysis method, electronic device and storage medium
CN113836286A (en) * 2021-09-26 2021-12-24 Nankai University Method and system for sentiment analysis of elderly people living alone in communities based on question-answer matching
CN113836286B (en) * 2021-09-26 2024-04-05 Nankai University Method and system for sentiment analysis of elderly people living alone in communities based on question-answer matching

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190716
