CN109408804A - Public opinion analysis method, system, device and storage medium - Google Patents
- Publication number
- CN109408804A (application CN201811020177.2A)
- Authority
- CN
- China
- Prior art keywords
- text
- environmentally friendly
- event
- environmental protection
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a public opinion analysis method, system, computer device and storage medium. The method comprises: step S1, obtaining, from the internet, public opinion information about an enterprise in the environmental protection field, and building an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base and an environmental protection rule base; step S2, performing local structure extraction and classification on texts in combination with the knowledge base, so as to identify, from a large volume of text, the texts related to environmental protection events that carry potential risk; step S3, based on the knowledge base built in step S1, clustering the structured data of the texts identified in step S2, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold. The method improves efficiency, reduces labor cost and shortens the processing cycle.
Description
Technical field
The present invention relates to the field of information technology, and in particular to a public opinion analysis method, system, computer device and storage medium.
Background art
China is currently in a period of social transition. The economy is developing rapidly and the levels of urbanization and industrialization keep rising, while the conflict between economic development and environmental protection grows sharper. At the same time, civic awareness and environmental awareness keep increasing, so environmental problems receive unprecedented attention. Effective monitoring of certain enterprises can prevent some environmental problems from spreading, and for the associated environmental public opinion events, effective solutions can be proposed on the basis of the relevant monitoring data.
An "environmental public opinion event" refers to the state of public opinion and attention across society regarding a particular environmental protection event; effective analysis of such events is a prerequisite for calming public opinion. "Public opinion analysis", also called semantic analysis, is a method for the objective, systematic and quantitative analysis of information content. Its purpose is to clarify or test the essential facts and trends in the information, to reveal the implicit content it carries, and to predict how an event will develop. At present, public opinion events concerning enterprise environmental protection are analyzed mainly by manual investigation: relevant information is collected, organized and then analyzed. This approach is inefficient and costly, and the whole process takes a long time.
Summary of the invention
In view of this, and to address the drawbacks of existing environmental public opinion analysis methods, it is necessary to provide a public opinion analysis method, system, device and storage medium.
A public opinion analysis method, comprising: step S1, obtaining, from the internet, public opinion information about an enterprise in the environmental protection field, and building an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base and an environmental protection rule base; step S2, performing local structure extraction and classification on texts in combination with the environmental protection knowledge base, so as to identify, from a large volume of text, the texts related to environmental protection events that carry potential risk; step S3, based on the environmental protection knowledge base, clustering the structured data of the identified risk-bearing texts, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
In one embodiment, step S1 includes: step S101, building the environmental protection ontology library, which stores environmental protection concepts in a hierarchical organization, with equivalence relations and possible relation constraints between concepts; step S102, building the environmental protection fact base, which stores the set of structured environmental protection facts obtained through semantic disambiguation and unique entity identification; step S103, building the environmental protection event base, which contains related vocabulary consisting of objects, behaviors, agents, patients, times, places and reasons; step S104, building the environmental protection rule base, which stores the equivalence relations between concepts and the probability that each relation holds.
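The four libraries built in step S1 can be pictured as plain data containers. The following is a minimal sketch only: the class names, field names and sample entries are hypothetical and are not taken from the patent, which does not prescribe a storage format.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Roles named in step S103 for event-base vocabulary.
ROLES = {"object", "behavior", "agent", "patient", "time", "place", "reason"}


@dataclass
class Ontology:
    """Hierarchical concepts plus equivalence relations (step S101)."""
    parent: dict[str, str] = field(default_factory=dict)      # concept -> broader concept
    equivalent: dict[str, str] = field(default_factory=dict)  # synonym -> canonical concept


@dataclass(frozen=True)
class Fact:
    """One disambiguated, uniquely identified fact (step S102)."""
    subject: str
    relation: str
    obj: str


@dataclass
class KnowledgeBase:
    ontology: Ontology = field(default_factory=Ontology)
    facts: set[Fact] = field(default_factory=set)
    events: dict[str, str] = field(default_factory=dict)  # word -> role (step S103)
    # (concept_a, concept_b) -> probability that the equivalence holds (step S104)
    rules: dict[tuple[str, str], float] = field(default_factory=dict)

    def add_event_word(self, word: str, role: str) -> None:
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.events[word] = role


# Hypothetical sample entries, for illustration only.
kb = KnowledgeBase()
kb.ontology.parent["wastewater"] = "pollutant"
kb.ontology.equivalent["sewage"] = "wastewater"
kb.facts.add(Fact("plant A", "discharges", "wastewater"))
kb.add_event_word("discharge", "behavior")
kb.rules[("sewage", "wastewater")] = 0.9
```

Keeping the rule base as a mapping from concept pairs to probabilities mirrors the wording of step S104; a real system would likely persist these structures in a database.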
In one embodiment, step S2 includes: step S201, preprocessing the text to be analyzed sentence by sentence, performing Chinese word segmentation and part-of-speech tagging, and merging and correcting special word sequences; step S202, based on the word sequence obtained in step S201, mapping entities to concepts using the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguating the concepts of ambiguous words; step S203, based on the disambiguated word sequence obtained in step S202, performing information extraction according to the basic sentence patterns of Chinese, thereby converting the sentence into a structured representation; step S204, based on the structured representation obtained in step S203, obtaining a deep semantic representation of the current sentence in combination with the environmental protection knowledge base and using it for classification; if the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, returning to step S201; otherwise, analyzing the next text.
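The control flow of step S2 (scan a text sentence by sentence, stop at the first sentence classified as event-related, otherwise continue to the last sentence) can be sketched as follows. This is an illustration under stated assumptions: `keyword_classifier` is a hypothetical stand-in for the whole S201 to S204 chain, and the sentence splitter is deliberately naive.

```python
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Split on Chinese and Western sentence-ending punctuation (naive)."""
    sentences, current = [], ""
    for ch in text:
        current += ch
        if ch in "。！？.!?":
            sentences.append(current.strip())
            current = ""
    if current.strip():
        sentences.append(current.strip())
    return sentences


def is_risky_text(text: str, classify_sentence: Callable[[str], bool]) -> bool:
    """Return True as soon as one sentence is classified as event-related."""
    for sentence in split_sentences(text):
        if classify_sentence(sentence):
            return True   # event-related: stop scanning, move on to the next text
    return False          # last sentence reached without a hit


def filter_risky(texts: list[str], classify_sentence: Callable[[str], bool]) -> list[str]:
    return [t for t in texts if is_risky_text(t, classify_sentence)]


def keyword_classifier(sentence: str) -> bool:
    """Hypothetical placeholder for segmentation, mapping, extraction and classification."""
    return any(k in sentence for k in ("sewage", "pollut"))


texts = [
    "The weather is nice. The plant dumped sewage into the river.",
    "Stocks rose today.",
]
risky = filter_risky(texts, keyword_classifier)
```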
In one embodiment, step S3 includes: step S301, loading the set of identified environmental protection event texts and parsing them into structured form using information extraction techniques, ignoring time and place information at this stage, to obtain a structured set describing the topic of each text; step S302, identifying and extracting the time and place information of each text using the time and place words in the environmental protection event base, to obtain a time vector and a place vector describing each text; step S303, projecting the structured set onto the environmental protection knowledge base and filtering out structured features unrelated to environmental protection events, to obtain a candidate structured feature set for each text; step S304, selecting an effective feature subset by computing the information that each structured feature carries across different texts; step S305, building all structured features of the observed texts and, by computing the similarity between structured features, obtaining a feature vector describing the topic of each text; step S306, performing topic clustering on the feature vectors obtained in step S305 to obtain a set of topic categories; step S307, building, in combination with the environmental protection event base, all time and place features of the observed texts, performing time reasoning and place reasoning separately, and constructing a time feature vector and a place feature vector for each text; step S308, performing time-place clustering on the feature vectors obtained in step S307 to obtain a set of time-place categories; step S309, fusing the topic categories with the time-place categories to obtain the final set of environmental protection event categories; step S310, ranking the warning level by the number of texts in each cluster, and issuing a real-time early warning for environmental protection events whose text count exceeds the given threshold.
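Steps S309 and S310 can be illustrated with a short sketch. The patent does not state how the two category sets are fused; the version below assumes, purely for illustration, that two texts belong to the same final event only if they share both the topic label and the time-place label. The function names and the sample labels are hypothetical.

```python
from collections import Counter


def fuse_clusters(topic_labels, time_place_labels):
    """Pair each text's topic label with its time-place label (assumed fusion rule)."""
    return list(zip(topic_labels, time_place_labels))


def rank_alerts(event_labels, threshold):
    """Rank events by text count and keep those whose count exceeds the threshold."""
    counts = Counter(event_labels)
    return [(label, n) for label, n in counts.most_common() if n > threshold]


# Six texts: topic cluster ids from S306 and time-place cluster ids from S308.
topic = [0, 0, 0, 1, 1, 0]
time_place = [0, 0, 0, 0, 0, 1]
events = fuse_clusters(topic, time_place)
alerts = rank_alerts(events, threshold=2)
```

With these labels, only the event with three texts exceeds a threshold of 2, so it is the only one that would trigger a warning.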
In one embodiment, step S204 includes:
Step S20401, based on the information extraction result of step S203 and on the characteristics of environmental protection event texts, performing knowledge generalization, feature extraction and feature value calculation on the text in combination with the environmental protection knowledge base; step S20402, based on the deep semantic feature representation obtained in step S20401, training a binary classification model on a labeled training set, classifying in real time with the trained model, and finally outputting the recognition result.
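The patent does not name the binary classification model used in step S20402. As one possible instance, the sketch below trains a small logistic regression by stochastic gradient descent in pure Python. The two features (count of pollutant concepts, count of discharge behaviors) and the training samples are hypothetical and stand in for the deep semantic features of step S20401.

```python
import math


def train_logistic(samples, labels, epochs=200, lr=0.5):
    """Fit weights and bias of a logistic regression with plain SGD."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of class 1
            err = p - y                      # gradient of the log loss w.r.t. z
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(model, x):
    """Return True if the sentence is classified as event-related."""
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Hypothetical labeled training set: [pollutant concepts, discharge behaviors].
X = [[2, 1], [1, 2], [2, 2], [0, 0], [1, 0], [0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = train_logistic(X, y)
```

Any other supervised two-class model could be substituted; the point is only that training uses labeled data once, while `predict` runs on each incoming sentence.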
A public opinion analysis system based on pollutant discharge data, comprising: an acquisition module, configured to obtain, from the internet, public opinion information about an enterprise in the environmental protection field, and to build an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base and an environmental protection rule base; a classification module, configured to perform local structure extraction and classification on texts in combination with the environmental protection knowledge base, so as to identify, from a large volume of text, the texts related to environmental protection events that carry potential risk; and an early warning module, configured to cluster, based on the knowledge base built by the acquisition module, the structured data of the risk-bearing texts identified by the classification module, and to decide whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
In one embodiment, the acquisition module further includes: an ontology building unit, configured to build the environmental protection ontology library, which stores environmental protection concepts in a hierarchical organization, with equivalence relations and possible relation constraints between concepts; a fact building unit, configured to build the environmental protection fact base, which stores the set of structured environmental protection facts obtained through semantic disambiguation and unique entity identification; an event building unit, configured to build the environmental protection event base, which contains related vocabulary consisting of objects, behaviors, agents, patients, times, places and reasons; and a rule building unit, configured to build the environmental protection rule base, which stores the equivalence relations between concepts and the probability that each relation holds;
The classification module further includes: a word segmentation unit, configured to preprocess the text to be analyzed sentence by sentence, perform Chinese word segmentation and part-of-speech tagging, and merge and correct special word sequences; a disambiguation unit, configured to map entities to concepts, based on the word sequence output by the word segmentation unit and on the hierarchical concept space of the environmental protection ontology library, and at the same time to disambiguate the concepts of ambiguous words; a conversion unit, configured to perform information extraction on the disambiguated word sequence output by the disambiguation unit according to the basic sentence patterns of Chinese, thereby converting the sentence into a structured representation; and a classification unit, configured to obtain, based on the structured representation output by the conversion unit and in combination with the environmental protection knowledge base, a deep semantic representation of the current sentence and to use it for classification; if the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, processing returns to the word segmentation unit; otherwise the next text is analyzed;
The early warning module further includes: a parsing unit, configured to load the set of identified environmental protection event texts and parse them into structured form using information extraction techniques, ignoring time and place information at this stage, to obtain a structured set describing the topic of each text; a recognition unit, configured to identify and extract the time and place information of each text using the time and place words in the environmental protection event base, to obtain a time vector and a place vector describing each text; a filtering unit, configured to project the structured set onto the environmental protection knowledge base and filter out structured features unrelated to environmental protection events, to obtain a candidate structured feature set for each text; a selection unit, configured to select an effective feature subset by computing the information that each structured feature carries across different texts; an obtaining unit, configured to build all structured features of the observed texts and, by computing the similarity between structured features, to obtain a feature vector describing the topic of each text; a topic clustering unit, configured to perform topic clustering on the feature vectors output by the obtaining unit to obtain a set of topic categories; a construction unit, configured to build, in combination with the environmental protection event base, all time and place features of the observed texts, perform time reasoning and place reasoning separately, and construct a time feature vector and a place feature vector for each text; a time-place clustering unit, configured to perform time-place clustering on the feature vectors output by the construction unit to obtain a set of time-place categories; a fusion unit, configured to fuse the topic categories with the time-place categories to obtain the final set of environmental protection event categories; and an early warning unit, configured to rank the warning level by the number of texts in each cluster and to issue a real-time early warning for environmental protection events whose text count exceeds the given threshold.
In one embodiment, the classification unit further includes: an extraction subunit, configured to perform, based on the information extraction result of the conversion unit and on the characteristics of environmental protection event texts, knowledge generalization, feature extraction and feature value calculation on the text in combination with the environmental protection knowledge base; and a classification subunit, configured to train a binary classification model on a labeled training set using the deep semantic feature representation obtained by the extraction subunit, to classify in real time with the trained model, and finally to output the recognition result.
A computer device, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above analysis method.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above analysis method.
The above public opinion analysis method, system, computer device and storage medium obtain, from the internet, public opinion information about an enterprise in the environmental protection field and build four libraries: the environmental protection ontology library, which stores environmental protection concepts in a hierarchical organization, with equivalence relations and possible relation constraints between concepts; the environmental protection fact base, which stores the set of structured environmental protection facts obtained through semantic disambiguation and unique entity identification; the environmental protection event base, which contains related vocabulary consisting of objects, behaviors, agents, patients, times, places and reasons; and the environmental protection rule base, which stores the equivalence relations between concepts and the probability that each relation holds. In combination with this knowledge base, the text to be analyzed is preprocessed sentence by sentence: Chinese word segmentation and part-of-speech tagging are performed, and special word sequences are merged and corrected. Based on the resulting word sequence, entities are mapped to concepts using the hierarchical concept space of the ontology library, and the concepts of ambiguous words are disambiguated at the same time. Information extraction is then performed on the disambiguated word sequence according to the basic sentence patterns of Chinese, converting the sentence into a structured representation. From this structured representation, and in combination with the knowledge base, a deep semantic representation of the current sentence is obtained and used for classification; if the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, processing returns to the previous step, and otherwise the next text is analyzed. Finally, based on the knowledge base, the structured data of the identified texts is clustered, and whether to issue a real-time early warning is decided according to whether the number of texts in each cluster exceeds a given threshold. This improves efficiency, reduces labor cost and shortens the processing cycle.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention.
Fig. 1 is a flow chart of the public opinion analysis method in one embodiment;
Fig. 2 is a flow chart of building the environmental protection knowledge base in one embodiment;
Fig. 3 is a flow chart of performing local structure extraction and classification on text in one embodiment;
Fig. 4 is a flow chart of clustering the structured data of texts and issuing real-time warnings in one embodiment;
Fig. 5 is a flow chart of obtaining the deep semantic representation of the current sentence and classifying it in one embodiment;
Fig. 6 is a structural block diagram of the public opinion analysis system based on pollutant discharge data in one embodiment;
Fig. 7 is a structural block diagram of the acquisition module in one embodiment;
Fig. 8 is a structural block diagram of the classification module in one embodiment;
Fig. 9 is a structural block diagram of the early warning module in one embodiment;
Fig. 10 is a structural block diagram of the classification unit in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As a preferred embodiment, as shown in Fig. 1, a public opinion analysis method includes: step S1, obtaining, from the internet, public opinion information about an enterprise in the environmental protection field, and building an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base and an environmental protection rule base;
Public opinion events concerning enterprise environmental protection are handled mostly by manual investigation, which is inefficient and costly and cannot quickly uncover the course of an event or its various correlations. To address these problems, the present technical solution collects public opinion information, performs text analysis on it, and then associates the result with the enterprise's pollutant discharge monitoring points in order to verify whether the reported problem actually occurred at the enterprise. The method can not only provide a solution quickly for the environmental public opinion events of certain enterprises, but also supply reference data that helps in analyzing the nature of an event. The knowledge base is built specifically for early warning of environmental protection events such as enterprise pollutant discharge, and comprises the environmental protection ontology library, fact base, event base and rule base. The equivalence relations between concepts in the ontology library rely on Baidu Baike, Wikipedia, Hudong Baike (Interactive Encyclopedia) and various published synonym lists. The fact base is built from a corpus of pollutant discharge and pollution accident reports obtained from the internet, using a range of information extraction techniques including Chinese word segmentation, part-of-speech tagging, dependency analysis and recognition of special sentence patterns. The event base contains vocabulary related to the environmental protection field, consisting of objects, behaviors, agents, patients, times, places and reasons. The rule base stores the equivalence relations between concepts and the probability that each relation holds. This background knowledge is acquired automatically from the corpus by combining machine learning with pattern matching, so the knowledge can be maintained and updated automatically.
Step S2, performing local structure extraction and classification on texts in combination with the environmental protection knowledge base, so as to identify, from a large volume of text, the texts related to environmental protection events that carry potential risk;
The text to be analyzed is preprocessed sentence by sentence: Chinese word segmentation and part-of-speech tagging are performed, and special word sequences are merged and corrected. Based on the resulting word sequence, entities are mapped to concepts using the hierarchical concept space of the ontology library, and the concepts of ambiguous words are disambiguated at the same time. Information extraction is then performed on the disambiguated word sequence according to the basic sentence patterns of Chinese, converting the sentence into a structured representation. From this structured representation, and in combination with the knowledge base, a deep semantic representation of the current sentence is obtained and used for classification. If the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, processing returns to the sentence-by-sentence preprocessing step (segmentation, part-of-speech tagging, and merging and correction of special word sequences) for the next sentence; otherwise the next text is analyzed.
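The conversion of a tagged sentence into a structured representation can be illustrated with a deliberately simple pattern. The sketch below assumes the most basic clause shape (agent, behavior, patient) and part-of-speech tags that start with `n` for nouns and `v` for verbs; both the pattern and the function name `extract_triple` are illustrative assumptions, and the patent's actual sentence patterns are not reproduced here.

```python
def extract_triple(tagged):
    """Extract (agent, behavior, patient) from a list of (word, pos) pairs.

    Uses the first verb as the behavior, the nearest noun before it as the
    agent and the first noun after it as the patient. Returns None when the
    sentence contains no verb.
    """
    verb_idx = next(
        (i for i, (_, pos) in enumerate(tagged) if pos.startswith("v")), None
    )
    if verb_idx is None:
        return None
    before = [w for w, pos in tagged[:verb_idx] if pos.startswith("n")]
    after = [w for w, pos in tagged[verb_idx + 1:] if pos.startswith("n")]
    return {
        "agent": before[-1] if before else None,
        "behavior": tagged[verb_idx][0],
        "patient": after[0] if after else None,
    }


sentence = [("factory", "n"), ("secretly", "d"), ("discharges", "v"), ("wastewater", "n")]
triple = extract_triple(sentence)
```

The resulting dictionary is the kind of structured form that the later steps can project onto the knowledge base, for example by looking up `behavior` in the event base.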
Step S3, based on the environmental protection knowledge base built in step S1, clustering the structured data of the texts identified in step S2, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
The set of identified environmental protection event texts is loaded and parsed into structured form using information extraction techniques, ignoring time and place information at this stage, to obtain a structured set describing the topic of each text. Using the time and place words in the event base, the time and place information of each text is identified and extracted, giving a time vector and a place vector for each text. The structured set is projected onto the knowledge base and structured features unrelated to environmental protection events are filtered out, giving a candidate structured feature set for each text. An effective feature subset is selected by computing the information that each structured feature carries across different texts. All structured features of the observed texts are then built and, by computing the similarity between structured features, a feature vector describing the topic of each text is obtained; topic clustering on these vectors yields a set of topic categories. In combination with the event base, all time and place features of the observed texts are built, time reasoning and place reasoning are performed separately, and a time feature vector and a place feature vector are constructed for each text; time-place clustering on these vectors yields a set of time-place categories. The topic categories are fused with the time-place categories to obtain the final set of environmental protection event categories. The warning level is ranked by the number of texts in each cluster, and a real-time early warning is issued for environmental protection events whose text count exceeds the given threshold.
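The topic clustering described above depends on a similarity measure and a clustering procedure that the patent leaves open. The sketch below assumes cosine similarity and a single-pass (incremental) clustering with a fixed similarity cutoff; both choices, the `min_sim` value and the sample vectors are illustrative assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(vectors, min_sim=0.8):
    """Assign each vector to the most similar centroid, or open a new cluster."""
    centroids, sizes, labels = [], [], []
    for v in vectors:
        best, best_sim = None, min_sim
        for i, c in enumerate(centroids):
            s = cosine(v, c)
            if s >= best_sim:
                best, best_sim = i, s
        if best is None:
            centroids.append(list(v))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = sizes[best]
            # Update the running mean of the chosen cluster.
            centroids[best] = [(c * n + x) / (n + 1) for c, x in zip(centroids[best], v)]
            sizes[best] = n + 1
            labels.append(best)
    return labels


# Hypothetical topic feature vectors for four texts.
vectors = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.9, 0.2]]
labels = single_pass_cluster(vectors)
```

The same routine could be applied to the time and place feature vectors to obtain the time-place categories, with a similarity function suited to those features.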
As shown in Fig. 2, in one embodiment, step S1 includes:
Step S101, building the environmental protection ontology library, which stores environmental protection concepts in a hierarchical organization, with equivalence relations and possible relation constraints between concepts;
The ontology library is built as follows. On the one hand, known hypernym-hyponym relations, open categories, polysemous words and synonym information are combined to fuse the concept hierarchy horizontally and vertically. On the other hand, instances carrying attribute information are used with an inductive decision tree model to identify the hierarchical concepts of entities automatically. Together these produce the hierarchical organization of domain concepts and the mapping between instances and concepts. The equivalence relations between concepts in the ontology library rely on Baidu Baike, Wikipedia, Hudong Baike (Interactive Encyclopedia) and various published synonym lists.
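Looking up a term in such an ontology involves two moves: first resolve the term to its canonical concept through the equivalence relations, then walk up the hierarchy. The sketch below shows this with two dictionaries; the function names and the sample concepts are hypothetical.

```python
def canonical(term, equivalent):
    """Follow equivalence links until a canonical concept is reached."""
    seen = set()
    while term in equivalent and term not in seen:
        seen.add(term)
        term = equivalent[term]
    return term


def ancestors(term, parent, equivalent):
    """Return the chain of broader concepts for a term, nearest first."""
    node = canonical(term, equivalent)
    chain = []
    while node in parent and parent[node] not in chain:
        node = parent[node]
        chain.append(node)
    return chain


# Hypothetical sample data: synonyms and a small concept hierarchy.
equivalent = {"sewage": "wastewater", "effluent": "wastewater"}
parent = {"wastewater": "water pollutant", "water pollutant": "pollutant"}
chain = ancestors("sewage", parent, equivalent)
```

Both loops guard against cycles, which can appear when synonym lists from several sources are merged automatically.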
Step S102, building the environmental protection fact base, which stores the set of structured environmental protection facts obtained through semantic disambiguation and unique entity identification;
The fact base is built from a corpus of pollutant discharge and pollution accident reports obtained from the internet, using a range of information extraction techniques including Chinese word segmentation, part-of-speech tagging, dependency analysis and recognition of special sentence patterns. After a large number of structured tuples have been obtained, each tuple is mapped to the hierarchical concepts in the ontology library. If a tuple corresponds to more than one concept, semantic disambiguation is performed using the relations within the tuple and other information about it, yielding a large structured set of facts with unique concept identifiers.
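The disambiguation of a term that maps to several concepts can be illustrated with a simple context-overlap score. This is only one possible heuristic, stated here as an assumption: the patent says that the tuple's relations and other information are used, without fixing a scoring rule. The candidate concepts and related words below are hypothetical.

```python
def disambiguate(context, candidates):
    """Pick the candidate concept whose related words overlap most with the context.

    context:    iterable of words from the same tuple or sentence
    candidates: dict mapping concept name -> set of words related to that concept
    Returns the best concept, or None if no candidate shares any word.
    """
    if not candidates:
        return None
    context = set(context)
    scored = [(len(context & related), concept) for concept, related in candidates.items()]
    best_score, best = max(scored, key=lambda item: item[0])
    return best if best_score > 0 else None


# The ambiguous word "discharge" with two hypothetical candidate concepts.
candidates = {
    "pollutant emission": {"wastewater", "river", "factory"},
    "hospital release": {"patient", "hospital", "ward"},
}
concept = disambiguate(["factory", "discharge", "wastewater"], candidates)
```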
Step S103, building the environmental protection event base, which contains related vocabulary consisting of objects, behaviors, agents, patients, times, places and reasons;
Environmentally friendly event base includes environment protection field relative words, these vocabulary by object, behavior, agent, word denoting the receiver of an action, the time,
Point and reason composition, house all kinds of time words and its numeric coding in library, the purpose of coding be by recognition time word and
The exact time is identified on the basis of the issuing time of text.In addition, the library can also include time-piece, the world is housed in table
The time zone of upper every country.For example, the Tokyo time is different from Beijing time, they differ 1 time zone, and the morning and meaning in afternoon are not
It is a time, and midnight and morning are then likely to be a time, these knowledge need knowledge base to provide.To these times into
The calculating of row similarity degree needs knowledge base to tell that computer morning, midnight are how many which period and one day etc. hour
Common sense.The effect of place ending word is to aid in the unrecognized place word of identification segmentation methods and determines the upper lower layer in place
Grade.Place is described generally according to sequence from big to small, and such phenomenon is level constraint, such as Shanghai Fengxian District.Therefore, by
In the place word that participle mistake can not be identified correctly, will constrain property according to level in the present invention is identified, the process
It will be related to the merger of multiple words.Further for example, for a certain multi-layer place, " Changle of Fujian Province city Hunan-Town Hydro-electric ", word segmentation result
For " Fujian Province/Changle city the ns/lake the ns/south the a/town n/n ", segmentation methods can not correctly identify " Hunan-Town Hydro-electric ", at this time according to level
Constraint, can identify that the place is ended up with town, and the statement sequence in town should be after city, it may thus be appreciated that " lake/south the a/town n/η "
It should be one place, therefore, word segmentation result be updated to " Fujian Province/Changle city the ns/Hunan-Town Hydro-electric ns/ns ".
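The level-constraint merge described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the suffix table `PLACE_SUFFIX_LEVEL`, the `max_run` look-back limit, and the English stand-in tokens ("Hu", "Nan", "Town") are assumptions introduced for the example.

```python
# Hypothetical place-ending words, ordered from the largest to the smallest level.
PLACE_SUFFIX_LEVEL = {"Province": 0, "City": 1, "District": 2, "Town": 3}

def suffix_level(word):
    """Level of the place-ending word that closes `word`, or None."""
    parts = word.split()
    return PLACE_SUFFIX_LEVEL.get(parts[-1]) if parts else None

def merge_place_tokens(tokens, max_run=3):
    """tokens: list of (word, pos_tag). Merge a run of unrecognized tokens that
    ends with a place-ending word into one place ("ns") when the run directly
    follows a recognized place of a higher (larger) level."""
    out = []
    for word, tag in tokens:
        level = suffix_level(word)
        if tag != "ns" and level is not None:
            # Walk back over the unrecognized tokens directly before the ending word.
            start = len(out)
            while start > 0 and out[start - 1][1] != "ns" and len(out) - start < max_run:
                start -= 1
            prev = out[start - 1] if start > 0 else None
            prev_level = suffix_level(prev[0]) if prev and prev[1] == "ns" else None
            # Level constraint: the smaller place must follow the larger one.
            if start < len(out) and prev_level is not None and prev_level < level:
                name = " ".join([w for w, _ in out[start:]] + [word])
                del out[start:]
                out.append((name, "ns"))
                continue
        out.append((word, tag))
    return out

tokens = [("Fujian Province", "ns"), ("Changle City", "ns"),
          ("Hu", "n"), ("Nan", "a"), ("Town", "n")]
merged = merge_place_tokens(tokens)
```

With the sample tokens, the three trailing fragments are merged into a single place token that follows "Changle City", mirroring the corrected segmentation in the example.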
Step S104, constructing the environmentally friendly rule base, which stores the equivalence relations between concepts and the probability that each relation holds.
The environmentally friendly rule base stores the equivalence relations between concepts together with the probability that each relation holds. This background knowledge is acquired automatically from the corpus by combining machine learning with pattern matching, so the knowledge can be maintained and updated automatically. On the basis of the environmentally friendly factbase and the green-body library described above, probabilistic graphical models and first-order logic, for example Markov logic networks, are used to learn uncertain rules automatically and to obtain logical expressions of the form "weight plus rule". High-quality logical expressions that meet the needs of the practical application are then selected, for example:
0.80 sewage<s: blowdown> <=> pollution<s: blowdown>
0.90 toxic<s: blowdown> & pollution<o: environment> <=> blowdown<s: blowdown>
where s indicates that the concept "blowdown" serves as the subject in the tuple, o indicates that the entity serves as the object in the tuple, & denotes logical AND, and <=> denotes the equivalence relation.
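To make the "weight plus rule" format concrete, the sketch below parses rule strings written as in the examples and keeps those whose weight reaches a cut-off. The names `Atom`, `Rule`, `parse_rule`, and `select_rules`, and the cut-off value 0.85, are hypothetical; the text does not say how the high-quality expressions are actually selected.

```python
import re
from typing import NamedTuple

class Atom(NamedTuple):
    predicate: str
    role: str      # "s" = subject, "o" = object
    concept: str

class Rule(NamedTuple):
    weight: float
    left: tuple    # atoms joined by logical AND
    right: tuple

ATOM_RE = re.compile(r"\s*(\w+)\s*<\s*(\w)\s*:\s*(\w+)\s*>\s*")

def parse_side(text):
    """Parse 'a<s: x> & b<o: y>' into a tuple of Atom values."""
    atoms = []
    for part in text.split("&"):
        m = ATOM_RE.fullmatch(part)
        if m is None:
            raise ValueError(f"cannot parse atom: {part!r}")
        atoms.append(Atom(*m.groups()))
    return tuple(atoms)

def parse_rule(line):
    """Parse '<weight> <left side> <=> <right side>'."""
    weight, body = line.strip().split(maxsplit=1)
    left, right = body.split("<=>")
    return Rule(float(weight), parse_side(left), parse_side(right))

def select_rules(lines, min_weight):
    """Keep only the rules whose weight is at least `min_weight` (assumed criterion)."""
    return [r for r in map(parse_rule, lines) if r.weight >= min_weight]

RULES = [
    "0.80 sewage<s: blowdown> <=> pollution<s: blowdown>",
    "0.90 toxic<s: blowdown> & pollution<o: environment> <=> blowdown<s: blowdown>",
]
kept = select_rules(RULES, min_weight=0.85)
```

Only the second rule survives the assumed cut-off; a real system would choose the criterion to suit the application.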
As shown in figure 3, in one embodiment, the step S2 includes:
Step S201, preprocessing the text to be analyzed sentence by sentence, performing Chinese word segmentation and part-of-speech tagging, and merging and correcting special word sequences;
The text to be analyzed is preprocessed sentence by sentence. The Chinese text is segmented and tagged with parts of speech, and special word sequences are merged and corrected. For the example sentence, the result of segmentation and part-of-speech tagging is <today/t, morning/t, half past ten/t, (punctuation)/w, sewage/n, at/p, Hunan Town/ns, nearby/f, spread/v, (punctuation)/w, pollution/v, area/n, (punctuation)/wn, exposure/n>.
Step S202, based on the word sequence obtained in step S201, mapping entities to concepts in the hierarchical concept space of the green-body library, and at the same time disambiguating the concepts of ambiguous words;
Training data is prepared first, and a multi-class classification model is then learned using the naive Bayes principle. Each class label corresponds to a hierarchical concept, and each feature vector consists of the neighboring unambiguous words within a given window together with the concepts they belong to. Using the specific context in which an ambiguous entity occurs, the trained classification model identifies its concept automatically. For example, for the word "pollution" with the candidate concepts blowdown, poverty alleviation, and decoration, the disambiguation result is pollution: blowdown.
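A minimal sketch of the naive Bayes disambiguation follows. It assumes a multinomial model with Laplace smoothing and a toy training set; the class `NaiveBayesDisambiguator`, the helper `context_features`, and the feature encoding (`w=` for a neighboring word, `c=` for its concept) are illustrative choices, not details given in the text.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesDisambiguator:
    """Multinomial naive Bayes over context features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        """samples: iterable of (context_features, concept_label)."""
        for features, label in samples:
            self.class_counts[label] += 1
            self.feature_counts[label].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)
            for f in features:
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def context_features(tokens, index, concepts, window=2):
    """Collect the unambiguous neighbours of tokens[index] and their concepts."""
    feats = []
    for j in range(max(0, index - window), min(len(tokens), index + window + 1)):
        if j != index and tokens[j] in concepts:
            feats += [f"w={tokens[j]}", f"c={concepts[tokens[j]]}"]
    return feats

# Toy training data: context features of the ambiguous word, labelled with a concept.
train = [
    (["w=sewage", "c=blowdown"], "blowdown"),
    (["w=river", "c=environment"], "blowdown"),
    (["w=paint", "c=decoration"], "decoration"),
    (["w=wall", "c=decoration"], "decoration"),
]
model = NaiveBayesDisambiguator().fit(train)
feats = context_features(["sewage", "pollution", "area"], 1, {"sewage": "blowdown"})
concept = model.predict(feats)
```

Here the ambiguous token at index 1 is resolved from its unambiguous neighbor "sewage"; real training data would come from the factbase corpus.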
Step S203, based on the disambiguated word sequence obtained in step S202, performing information extraction on the disambiguated word sequence according to the basic sentence patterns of Chinese, and converting the text sentence into a structured representation;
Information extraction is performed on the disambiguated word sequence according to the basic sentence patterns of Chinese, converting each text sentence into a structured representation, for example the tuples blowdown (s: Changning, p: Hunan Town, t: ten o'clock this morning) and pollution (s: environment, o: hidden pipe + underground), where p denotes the place component, t denotes the time information, and "+" denotes a coordinate relation, that is, "hidden pipe" and "underground" each serve as an object of the predicate "pollution".
Step S204, based on the structured representation obtained in step S203, obtaining a deep semantic representation of the current sentence in combination with the environmentally friendly knowledge base and using it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, returning to step S201, and otherwise analyzing the next text.
Based on the structured representation obtained, a deep semantic representation of the current sentence is derived in combination with the environmentally friendly knowledge base and used for classification. If the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, the method returns to step S201 to process the next sentence; otherwise it moves on to the next text. Starting from the result of information extraction and taking into account the characteristics of environmental protection event texts, knowledge generalization, feature extraction, and feature value calculation are performed on the text in combination with the environmentally friendly knowledge base for environmental protection events. A binary classification model is trained on a labeled training set; according to the deep semantic feature representation obtained, the model classifies texts in real time and finally outputs the recognition result.
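The sentence-by-sentence control flow of step S2 can be sketched as below. The function name `identify_event_texts` and the keyword test used as a stand-in for the trained sentence classifier are assumptions made for illustration only.

```python
from typing import Callable, Iterable, List

def identify_event_texts(texts: Iterable[List[str]],
                         is_event_sentence: Callable[[str], bool]) -> List[int]:
    """Return the indices of texts in which some sentence is classified as
    related to an environmental protection event.

    Each text is scanned sentence by sentence. Scanning continues while the
    current sentence is unrelated and sentences remain; it stops as soon as a
    sentence is classified as related or the last sentence has been scanned,
    and the next text is then analyzed."""
    flagged = []
    for idx, sentences in enumerate(texts):
        for sentence in sentences:
            if is_event_sentence(sentence):
                flagged.append(idx)
                break  # move on to the next text
    return flagged

# Stand-in classifier: a keyword test instead of the trained binary model.
texts = [
    ["The park opened today.", "Sewage spread near the town."],
    ["The weather is fine."],
]
flagged = identify_event_texts(texts, lambda s: "sewage" in s.lower())
```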
As shown in figure 4, in one embodiment, the step S3 includes:
Step S301, loading the set of identified environmental protection event texts and performing structured parsing on it using information extraction techniques, without considering time and place information at this stage, to obtain a structured set describing the topic of each text;
The set of identified environmental protection event texts is loaded and parsed into structured form using information extraction techniques. Time and place information is not considered at this stage, and the result is a structured set describing the topic of each text. Structured parsing comprises word segmentation and structured extraction of the text, and the features finally parsed are stored in a unified data structure.
Step S302, in combination with the time and place words in the environmentally friendly event base, identifying and extracting the time and place information of each text, and obtaining a time vector and a place vector describing each text;
The publication time of each text is extracted, together with the time at which the event described in the text occurred. Based on the tags produced by the word segmentation algorithm and on the time library, time words are extracted from each text, with complex time words identified by pattern matching. Using the time library, each identified time word is then numerically decoded to determine the time interval it refers to, its granularity, its upper and lower bounds, and similar information. For words tagged as places by the segmentation algorithm, the place library is queried to identify the superordinate and subordinate places and the level of the place. When a new place word is encountered that the segmentation algorithm cannot recognize, its boundary is identified by matching place marker words; if the hypernym-hyponym relation of the place is known, the correctness of the recognized place word can be confirmed from the large-to-small order in which places are stated. Finally, place words are organized by level: if several places are extracted from one text, they are arranged according to the hypernym-hyponym relations between them, and places related in this way are regarded as the same place.
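The numeric decoding of time words relative to the publication time might look like the sketch below. The table `TIME_LIBRARY`, its day offsets, and the hour ranges chosen for "morning" and "afternoon" are assumed values for illustration; the text does not list the actual codes.

```python
from datetime import datetime, timedelta

# Hypothetical time library: time word -> (day offset, start hour, end hour).
TIME_LIBRARY = {
    "today": (0, 0, 24),
    "yesterday": (-1, 0, 24),
    "this morning": (0, 6, 12),
    "this afternoon": (0, 12, 18),
}

def decode_time_word(word, published):
    """Decode a time word into a (start, end) interval, taking the publication
    time of the text as the reference point."""
    offset, start_hour, end_hour = TIME_LIBRARY[word]
    day = datetime(published.year, published.month, published.day) + timedelta(days=offset)
    return day + timedelta(hours=start_hour), day + timedelta(hours=end_hour)

published = datetime(2018, 9, 3, 15, 30)
morning = decode_time_word("this morning", published)
yesterday = decode_time_word("yesterday", published)
```

Representing every time word as an interval keeps points, periods, and coarse expressions comparable in the later similarity reasoning.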
Step S303, projecting the structured set onto the environmentally friendly knowledge base and filtering out structured features unrelated to environmental protection events, to obtain a candidate structured feature set for each text;
The structured set is projected onto the environmentally friendly knowledge base. For each text, structured features unrelated to environmental protection events are filtered out according to the domain event base, giving the candidate structured feature set of that text.
Step S304, selecting an effective feature subset by computing the information of the structured features across different texts;
An effective feature subset is selected by computing the information carried by the structured features across different texts, which greatly reduces the feature dimension and the computational complexity without affecting the early-warning performance.
Step S305, constructing all structured features of the observed texts and, by computing the similarity between structured features, obtaining a feature vector describing the topic of each text;
All structured features of the observed texts are constructed and, by computing the similarity between structured features, a feature vector describing the topic of each text is obtained. The structured feature set is initialized as empty, and the candidate structured features of the current text are taken as input. When the feature set is empty, a structured feature is placed into it and the feature vector is set to 0 at the corresponding position; otherwise each structured feature of the text is compared one by one with the elements of the feature set, and the most similar feature and its similarity are retained.
Step S306, performing topic clustering based on the feature vectors obtained in step S305 to obtain a set of topic categories;
Topic clustering is performed on the feature vectors obtained, yielding a set of topic categories. Suppose the texts are clustered into two classes: they still need to be distinguished by matching time and place. In addition, when class one and class two are compared, their structured features cannot be matched, so the similarity between the two classes is low and the clustering process does not merge them into one category. The time and place reasoning described below effectively solves this problem.
Step S307, in combination with the environmentally friendly event base, constructing all time and place features of the observed texts, performing time reasoning and place reasoning respectively, and constructing a time feature vector and a place feature vector for each text;
In combination with the environmentally friendly event base, all time and place features of the observed texts are constructed, time reasoning and place reasoning are performed respectively, and a time feature vector and a place feature vector are constructed for each text. The current time feature set and place feature set are initialized as empty. For each text, time features and place features are built from its time and place information; the number of features depends on the number of distinct times and places.
Time similarity reasoning compares, within a certain time window, whether two times are identical, whether one interval contains the other, whether they intersect, or whether they do not intersect at all. If the two times differ by no more than a certain threshold or they intersect, the match is considered successful and the feature vector is set to 1 at the corresponding position. Otherwise the feature is added to the current time feature set, and the feature vector is set to 1 at that position and 0 at the remaining positions. A time may be a point or a period, and there are also fuzzy expressions such as "recently"; ordinary people often find it hard to state a time precisely. For this reason the comparison here is based on interval containment with a tolerance, as described above.
Place similarity matching queries the green-body library and the place library to determine whether two places are identical, equivalent, or in a parent-child containment relation, or whether one of these relations holds after a place marker word is added to or dropped from the end of the name. If so, the match is considered successful and the feature vector is set to 1 at the corresponding position; otherwise the feature is added to the current place feature set, and the feature vector is set to 1 at that position and 0 at the remaining positions.
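The time reasoning above can be sketched as follows, assuming every time has already been decoded into a (start, end) interval (a point has start equal to end). The function names and the two-hour tolerance are hypothetical; place features could be built the same way with a place-matching predicate in place of `times_match`.

```python
from datetime import datetime, timedelta

def times_match(a, b, tolerance=timedelta(hours=2)):
    """True if intervals a and b are identical, nested, intersecting, or
    separated by a gap no larger than `tolerance`."""
    (a_start, a_end), (b_start, b_end) = a, b
    return a_start - tolerance <= b_end and b_start - tolerance <= a_end

def time_feature_vectors(texts_times, tolerance=timedelta(hours=2)):
    """texts_times: for each text, a list of (start, end) intervals.
    Returns one 0/1 vector per text over the shared time feature set."""
    features = []  # shared time feature set, grows as new times appear
    rows = []      # per text: indices of the features it matched
    for intervals in texts_times:
        hits = set()
        for interval in intervals:
            for i, feature in enumerate(features):
                if times_match(interval, feature, tolerance):
                    hits.add(i)  # successful match: reuse the existing feature
                    break
            else:
                features.append(interval)  # no match: register a new feature
                hits.add(len(features) - 1)
        rows.append(hits)
    # Pad every vector to the final size of the feature set.
    return [[1 if i in hits else 0 for i in range(len(features))] for hits in rows]

def at(hour):
    return datetime(2018, 9, 3, hour)

vectors = time_feature_vectors([
    [(at(10), at(10))],  # a time point
    [(at(11), at(12))],  # an interval within the tolerance of the first point
    [(at(20), at(20))],  # a clearly different time
])
```

The first two texts share one time feature and the third gets its own, so texts on the same topic but at different times end up with different time vectors.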
Step S308, performing time-place clustering based on the feature vectors obtained in step S307 to obtain a set of time-place categories;
Time-place clustering is performed on the feature vectors obtained, and the texts are finally grouped into time-place clusters according to their time and place features.
Step S309, merging the topic category set with the time-place category set to obtain the final category set of environmental protection events;
The topic category set is merged with the time-place category set to obtain the final category set of environmental protection events. Each category is first split so that the texts in each resulting cluster still belong to the same cluster. Based on this result, the category sets obtained in the previous step are merged so that the texts of each merged cluster also belong to the same cluster and their similarity based on phrase features is greater than a given threshold.
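One possible reading of the splitting step is that two texts stay together only if they share both a topic cluster and a time-place cluster. The sketch below implements just that intersection, under this assumption; the function name `merge_cluster_sets` is hypothetical, and the subsequent phrase-similarity merge is not modeled.

```python
from collections import defaultdict

def merge_cluster_sets(topic_labels, time_place_labels):
    """topic_labels[i] and time_place_labels[i] are the cluster ids of text i
    in the two category sets. Texts end up in the same final cluster only if
    both ids agree."""
    groups = defaultdict(list)
    for text_id, key in enumerate(zip(topic_labels, time_place_labels)):
        groups[key].append(text_id)
    return list(groups.values())

# Texts 0-2 share a topic, but text 2 happened at a different time or place.
final_clusters = merge_cluster_sets([0, 0, 0, 1], [0, 0, 1, 1])
```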
Step S310, ranking the early-warning level of each cluster according to the number of texts it contains, and issuing a real-time early warning for environmental protection events that exceed a given threshold.
The clusters are ranked by early-warning level according to the number of texts each contains, and environmental protection events exceeding the given threshold trigger a timely early warning.
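As a small illustration of the ranking and threshold test, assuming each cluster is simply a list of text identifiers (the function name `rank_alerts` and the threshold value are placeholders):

```python
def rank_alerts(clusters, threshold):
    """Sort clusters by the number of texts they contain (largest first) and
    keep those whose size exceeds `threshold`."""
    ranked = sorted(clusters, key=len, reverse=True)
    return [cluster for cluster in ranked if len(cluster) > threshold]

alerts = rank_alerts([[3], [0, 1, 5, 6], [2, 4]], threshold=1)
```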
As shown in figure 5, in one embodiment, the step S204 includes:
Step S20401, according to the result of the information extraction in step S203 and the characteristics of environmental protection event texts, performing knowledge generalization, feature extraction, and feature value calculation on the text in combination with the environmentally friendly knowledge base for environmental protection events;
Entity generalization uses the green-body library to obtain the set of entities equivalent to the entity under analysis; the elements of this set replace the entity one by one and take part in the subsequent computation. For "hidden pipe" in the example sentence, the equivalence set obtained from the green-body library is seepage pit, seepage well, and canal entry, so each of these replaces "hidden pipe" in turn in the subsequent computation. Relation generalization uses the environmentally friendly rule base to obtain the set of relations equivalent to the relation under analysis; the elements of this set replace the relation one by one and take part in the subsequent computation. For the relation "blowdown" in the example sentence, the equivalence set obtained from the environmentally friendly rule base is discharge, diversion, and confluence, so each of these replaces "blowdown" in turn in the subsequent computation.
In view of the characteristics of environmental protection events, the invention mainly extracts the following kinds of features: the predicates of the tuples obtained by information extraction and the components attached to those predicates. In environmental protection event texts the predicate verb of a tuple is generally highly representative; "blowdown" and "undercurrent", for example, are both strongly associated with environmental pollution.
Step S20402, according to the deep semantic feature representation obtained in step S20401, training a binary classification model on a labeled training set, classifying in real time with the model, and finally outputting the recognition result.
A binary classification model is trained on a labeled training set using the deep semantic feature representation obtained, and is then used to classify texts in real time and output the final recognition result. The class label indicates whether or not a text relates to an environmental protection event. During real-time classification, whether the target text is related to an environmental protection event is judged by checking whether the value computed by the classification model exceeds a given threshold. The binary classification model may be any supervised classification model from machine learning, and any environmental protection event recognition method implemented on the basis of the above mechanism falls within the scope of the invention.
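Steps S20401 and S20402 can be sketched together as below. The equivalence tables reuse the examples from the text; everything else is an assumption made for illustration: the function names, the three-word vocabulary, the toy training tuples, and the choice of logistic regression (the text allows any supervised binary classifier).

```python
import math
from itertools import product

# Toy equivalence sets taken from the examples in the text.
ENTITY_EQUIV = {"hidden pipe": ["seepage pit", "seepage well", "canal entry"]}
RELATION_EQUIV = {"blowdown": ["discharge", "diversion", "confluence"]}

def generalize(predicate, args):
    """Return every (predicate, args) variant obtained by substituting
    equivalent relations and equivalent entities."""
    predicates = [predicate] + RELATION_EQUIV.get(predicate, [])
    roles = list(args)
    choices = [[args[r]] + ENTITY_EQUIV.get(args[r], []) for r in roles]
    return [(p, dict(zip(roles, combo))) for p in predicates for combo in product(*choices)]

def featurize(predicate, args, vocabulary):
    """Binary vector over `vocabulary`: 1.0 if the word occurs in any variant."""
    words = set()
    for p, a in generalize(predicate, args):
        words.add(p)
        words.update(a.values())
    return [1.0 if w in words else 0.0 for w in vocabulary]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(samples, labels, epochs=200, lr=0.5):
    """Logistic regression trained by plain stochastic gradient descent."""
    w, b = [0.0] * len(samples[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def is_event(x, model, threshold=0.5):
    """Related to an environmental protection event if the score exceeds the threshold."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > threshold

VOCAB = ["discharge", "seepage pit", "decorate"]
train_x = [
    featurize("blowdown", {"o": "hidden pipe"}, VOCAB),  # generalizes to discharge, seepage pit
    featurize("discharge", {"o": "river"}, VOCAB),
    featurize("decorate", {"o": "house"}, VOCAB),
    featurize("visit", {"o": "park"}, VOCAB),
]
model = train_logistic(train_x, [1, 1, 0, 0])
```

Because generalization maps "blowdown" onto "discharge", a tuple never seen in training, such as blowdown with the object "seepage well", still activates a feature the classifier has learned from.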
As shown in figure 6, in one embodiment, a public opinion analysis system based on blowdown data is provided. The system includes: an acquisition module, configured to obtain, via the internet, public opinion information on an enterprise in the environmental protection field and to build an environmentally friendly knowledge base for environmental protection, including a green-body library, an environmentally friendly factbase, an environmentally friendly event base, and an environmentally friendly rule base; a categorization module, configured to perform local structure extraction and classification on texts in combination with the environmentally friendly knowledge base, so as to identify, from massive text, texts related to environmental protection events that carry potential risk; and a warning module, configured to cluster, on the basis of the environmentally friendly knowledge base, the structured data of the risk-bearing texts identified by the categorization module, and to decide whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
As shown in figure 7, in one embodiment, the acquisition module further includes: an ontology construction unit, configured to construct the green-body library, which stores the hierarchical organization of environmental protection concepts together with the equivalence relations and possible relation constraints between concepts; a fact construction unit, configured to construct the environmentally friendly factbase, which stores the structured sets obtained through semantic disambiguation and unique entity identification; an event construction unit, configured to construct the environmentally friendly event base, which contains related vocabulary composed of object, behavior, agent, patient, time, place, and reason; and a rule construction unit, configured to construct the environmentally friendly rule base, which stores the equivalence relations between concepts and the probability that each relation holds.
As shown in figure 8, the categorization module further includes: a segmentation unit, configured to preprocess the text to be analyzed sentence by sentence, perform Chinese word segmentation and part-of-speech tagging, and merge and correct special word sequences; a disambiguation unit, configured to map entities, based on the word sequence produced by the segmentation unit, to concepts in the hierarchical concept space of the green-body library, and at the same time to disambiguate the concepts of ambiguous words; a conversion unit, configured to perform information extraction on the disambiguated word sequence produced by the disambiguation unit according to the basic sentence patterns of Chinese, converting each text sentence into a structured representation; and a classification unit, configured to obtain, from the structured representation produced by the conversion unit and in combination with the environmentally friendly knowledge base, a deep semantic representation of the current sentence and to use it for classification, wherein if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, processing returns to the segmentation unit, and otherwise the next text is analyzed.
As shown in figure 9, the warning module further includes: a parsing unit, configured to load the set of identified environmental protection event texts and perform structured parsing on it using information extraction techniques, without considering time and place information at this stage, to obtain a structured set describing the topic of each text; a recognition unit, configured to identify and extract the time and place information of each text in combination with the time and place words in the environmentally friendly event base, and to obtain a time vector and a place vector describing each text; a filtering unit, configured to project the structured set onto the environmentally friendly knowledge base and filter out structured features unrelated to environmental protection events, to obtain a candidate structured feature set for each text; a selection unit, configured to select an effective feature subset by computing the information of the structured features across different texts; an obtaining unit, configured to construct all structured features of the observed texts and, by computing the similarity between structured features, obtain a feature vector describing the topic of each text; a topic clustering unit, configured to perform topic clustering based on the feature vectors produced by the obtaining unit and obtain a set of topic categories; a construction unit, configured to construct, in combination with the environmentally friendly event base, all time and place features of the observed texts, perform time reasoning and place reasoning respectively, and construct a time feature vector and a place feature vector for each text; a time-place clustering unit, configured to perform time-place clustering based on the feature vectors produced by the construction unit and obtain a set of time-place categories; a fusion unit, configured to merge the topic category set with the time-place category set and obtain the final category set of environmental protection events; and an early-warning unit, configured to rank the early-warning level of each cluster according to the number of texts it contains and to issue a real-time early warning for environmental protection events that exceed a given threshold.
As shown in figure 10, in one embodiment, the classification unit further includes: an extraction subunit, configured to perform knowledge generalization, feature extraction, and feature value calculation on the text, according to the result of the information extraction carried out by the conversion unit and the characteristics of environmental protection event texts, in combination with the environmentally friendly knowledge base for environmental protection events; and a classification subunit, configured to train a binary classification model on a labeled training set according to the deep semantic feature representation obtained by the extraction subunit, classify in real time with the model, and finally output the recognition result.
In one embodiment, a computer device is provided. The computer device includes a memory and a processor, the memory stores computer-readable instructions, and the computer-readable instructions, when executed by the processor, cause the processor to perform: step S1, obtaining, via the internet, public opinion information on an enterprise in the environmental protection field, and building an environmentally friendly knowledge base for environmental protection, including a green-body library, an environmentally friendly factbase, an environmentally friendly event base, and an environmentally friendly rule base; step S2, performing local structure extraction and classification on texts in combination with the environmentally friendly knowledge base, so as to identify, from massive text, texts related to environmental protection events that carry potential risk; and step S3, clustering, on the basis of the environmentally friendly knowledge base built in step S1, the structured data of the texts identified in step S2, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
In one embodiment, the step S1 includes:
Step S101, constructing the green-body library, which stores the hierarchical organization of environmental protection concepts, together with the equivalence relations and possible relation constraints between concepts;
Step S102, constructing the environmentally friendly factbase, which stores the structured sets obtained through semantic disambiguation and unique entity identification;
Step S103, constructing the environmentally friendly event base, which contains related vocabulary composed of object, behavior, agent, patient, time, place, and reason;
Step S104, constructing the environmentally friendly rule base, which stores the equivalence relations between concepts and the probability that each relation holds.
In one embodiment, the step S2 includes:
Step S201, preprocessing the text to be analyzed sentence by sentence, performing Chinese word segmentation and part-of-speech tagging, and merging and correcting special word sequences;
Step S202, based on the word sequence obtained in step S201, mapping entities to concepts in the hierarchical concept space of the green-body library, and at the same time disambiguating the concepts of ambiguous words;
Step S203, based on the disambiguated word sequence obtained in step S202, performing information extraction on the disambiguated word sequence according to the basic sentence patterns of Chinese, and converting the text sentence into a structured representation;
Step S204, based on the structured representation obtained in step S203, obtaining a deep semantic representation of the current sentence in combination with the environmentally friendly knowledge base and using it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, returning to step S201, and otherwise analyzing the next text.
In one embodiment, the step S3 includes:
Step S301, loading the set of identified environmental protection event texts and performing structured parsing on it using information extraction techniques, without considering time and place information at this stage, to obtain a structured set describing the topic of each text;
Step S302, in combination with the time and place words in the environmentally friendly event base, identifying and extracting the time and place information of each text, and obtaining a time vector and a place vector describing each text;
Step S303, projecting the structured set onto the environmentally friendly knowledge base and filtering out structured features unrelated to environmental protection events, to obtain a candidate structured feature set for each text;
Step S304, selecting an effective feature subset by computing the information of the structured features across different texts;
Step S305, constructing all structured features of the observed texts and, by computing the similarity between structured features, obtaining a feature vector describing the topic of each text;
Step S306, performing topic clustering based on the feature vectors obtained in step S305 to obtain a set of topic categories;
Step S307, in combination with the environmentally friendly event base, constructing all time and place features of the observed texts, performing time reasoning and place reasoning respectively, and constructing a time feature vector and a place feature vector for each text;
Step S308, performing time-place clustering based on the feature vectors obtained in step S307 to obtain a set of time-place categories;
Step S309, merging the topic category set with the time-place category set to obtain the final category set of environmental protection events;
Step S310, ranking the early-warning level of each cluster according to the number of texts it contains, and issuing a real-time early warning for environmental protection events that exceed a given threshold.
In one embodiment, the step S204 includes:
Step S20401, according to the result of the information extraction in step S203 and the characteristics of environmental protection event texts, performing knowledge generalization, feature extraction, and feature value calculation on the text in combination with the environmentally friendly knowledge base for environmental protection events;
Step S20402, according to the deep semantic feature representation obtained in step S20401, training a binary classification model on a labeled training set, classifying in real time with the model, and finally outputting the recognition result.
In one embodiment, a storage medium storing computer-readable instructions is provided. The computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform: step S1, obtaining, via the internet, public opinion information on an enterprise in the environmental protection field, and building an environmentally friendly knowledge base for environmental protection, including a green-body library, an environmentally friendly factbase, an environmentally friendly event base, and an environmentally friendly rule base; step S2, performing local structure extraction and classification on texts in combination with the environmentally friendly knowledge base, so as to identify, from massive text, texts related to environmental protection events that carry potential risk; and step S3, clustering, on the basis of the environmentally friendly knowledge base built in step S1, the structured data of the texts identified in step S2, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
In one embodiment, step S1 includes:
Step S101: build the environmental protection ontology library, which stores environmental protection concepts in a hierarchical organization, with equivalence relations and possible relation constraints between concepts;
Step S102: build the environmental protection fact base, which stores the structured set obtained through semantic disambiguation and unique entity identification;
Step S103: build the environmental protection event base, which comprises the related vocabulary, consisting of words for object, action, agent, patient, time, place, and cause;
Step S104: build the environmental protection rule base, which stores equivalence relations between concepts together with the probability that each holds.
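The four libraries built in steps S101 to S104 can be pictured as plain data structures. The following is a minimal sketch only: the class names, field names, and sample entries are hypothetical and do not come from the specification, which does not prescribe a storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

ROLES = ("object", "action", "agent", "patient", "time", "place", "cause")


@dataclass
class Ontology:
    """S101: concept hierarchy plus equivalence relations between concepts."""
    parent: Dict[str, str] = field(default_factory=dict)        # concept -> parent
    equivalents: Dict[str, Set[str]] = field(default_factory=dict)

    def ancestors(self, concept: str) -> List[str]:
        """Walk up the hierarchy from a concept (assumes no cycles)."""
        chain = []
        while concept in self.parent:
            concept = self.parent[concept]
            chain.append(concept)
        return chain


@dataclass(frozen=True)
class Fact:
    """S102: one disambiguated record keyed by a unique entity identifier."""
    entity_id: str
    relation: str
    value: str


@dataclass
class EventLexicon:
    """S103: event vocabulary grouped by the seven roles named in the step."""
    words: Dict[str, Set[str]] = field(
        default_factory=lambda: {role: set() for role in ROLES}
    )


@dataclass(frozen=True)
class Rule:
    """S104: an equivalence between two concepts and its probability."""
    left: str
    right: str
    probability: float


@dataclass
class KnowledgeBase:
    ontology: Ontology = field(default_factory=Ontology)
    facts: Set[Fact] = field(default_factory=set)
    events: EventLexicon = field(default_factory=EventLexicon)
    rules: List[Rule] = field(default_factory=list)


# Hypothetical sample content, for illustration only.
kb = KnowledgeBase()
kb.ontology.parent.update({
    "wastewater discharge": "water pollution",
    "water pollution": "pollution",
})
kb.events.words["action"].add("discharge")
kb.rules.append(Rule("effluent", "wastewater", 0.9))
print(kb.ontology.ancestors("wastewater discharge"))
```

Keeping the hierarchy as a child-to-parent map makes the upward walk needed for concept generalization a simple loop; a production system would more likely use a graph store or an ontology format.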
In one embodiment, step S2 includes:
Step S201: preprocess the text to be analyzed sentence by sentence, perform Chinese word segmentation and part-of-speech tagging, and merge and correct special word sequences;
Step S202: based on the word sequence obtained in step S201, map entities to concepts using the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguate the concepts of ambiguous words;
Step S203: based on the disambiguated word sequence obtained in step S202, perform information extraction on it according to basic Chinese sentence patterns, converting the text sentence into a structured representation;
Step S204: based on the structured representation obtained in step S203 and in combination with the environmental protection knowledge base, obtain a deep semantic representation of the current sentence and use it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, return to step S201; otherwise, analyze the next text.
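The control flow of steps S201 to S204 can be sketched as a nested loop. This is one reading of step S204, under which scanning of a text stops at the first sentence classified as relevant. The function names are hypothetical, and the keyword test is only a stand-in for the segmentation, concept mapping, extraction, and classification that the steps actually describe.

```python
import re
from typing import Callable, Iterable, List


def split_sentences(text: str) -> List[str]:
    """Split a text into sentences on Chinese and Western end punctuation."""
    pieces = re.findall(r"[^。！？!?.]+[。！？!?.]?", text)
    return [p.strip() for p in pieces if p.strip()]


def flag_relevant_texts(
    texts: Iterable[str],
    sentence_is_relevant: Callable[[str], bool],
) -> List[str]:
    """Return the texts in which some sentence is classified as relevant.

    sentence_is_relevant stands in for S201 to S204 applied to one sentence.
    """
    flagged = []
    for text in texts:
        for sentence in split_sentences(text):
            if sentence_is_relevant(sentence):
                flagged.append(text)
                break  # relevant: stop scanning, move on to the next text
            # not relevant: continue with the next sentence (back to S201)
        # last sentence reached without a hit: move on to the next text
    return flagged


def toy_classifier(sentence: str) -> bool:
    # Hypothetical stand-in for the S204 classifier: a simple keyword test.
    return "discharge" in sentence.lower()


texts = [
    "The match ended late. Fans went home.",
    "Residents complained. The plant continued to discharge wastewater!",
]
print(flag_relevant_texts(texts, toy_classifier))
```

Stopping at the first relevant sentence avoids analyzing the remainder of a text that has already been identified, which matters when the input is a large stream of texts.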
In one embodiment, step S3 includes:
Step S301: load the set of identified environmental protection event texts and parse them into structured form using information extraction techniques, disregarding time and place information at this stage, to obtain a structured set describing the topic of each text;
Step S302: using the time and place words in the environmental protection event base, identify and extract the time and place information of each text, and obtain a time vector and a place vector describing each text;
Step S303: project the structured set onto the environmental protection knowledge base, filter out structured features unrelated to environmental protection events, and obtain a candidate structured feature set for each text;
Step S304: select an effective feature subset by computing the information carried by the structured features across different texts;
Step S305: construct all structured features of the observed texts and, by computing the similarity between structured features, obtain a feature vector describing the topic of each text;
Step S306: based on the feature vectors obtained in step S305, perform topic clustering to obtain a set of topic categories;
Step S307: construct all time and place features of the observed texts, perform time reasoning and place reasoning respectively using the environmental protection event base, and construct a time feature vector and a place feature vector for each text;
Step S308: based on the feature vectors obtained in step S307, perform time-and-place clustering to obtain a set of time-and-place categories;
Step S309: merge the topic category set with the time-and-place category set to obtain the final set of environmental protection event categories;
Step S310: rank the early warning level of each cluster by the number of texts it contains, and issue a real-time early warning for environmental protection events that exceed the given threshold.
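The later steps, S305 to S310, can be illustrated with a small sketch. The specification does not name a clustering algorithm or a similarity measure, so the choices here are assumptions: greedy single-pass clustering with Jaccard similarity for topics, exact matching of a normalized date and place for the time-and-place clusters, and a merge that treats two texts as the same event only when both agree. All names, sample data, and parameter values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# One text: (topic features, normalized date, normalized place).
Text = Tuple[FrozenSet[str], str, str]
EventKey = Tuple[int, str, str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as dissimilar."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_clusters(features: List[FrozenSet[str]], min_sim: float) -> List[int]:
    """Greedy single-pass clustering: join the first cluster whose seed
    is similar enough, otherwise start a new cluster (cf. S305, S306)."""
    seeds: List[FrozenSet[str]] = []
    labels: List[int] = []
    for f in features:
        for label, seed in enumerate(seeds):
            if jaccard(f, seed) >= min_sim:
                labels.append(label)
                break
        else:
            seeds.append(f)
            labels.append(len(seeds) - 1)
    return labels


def events_to_alert(
    texts: List[Text], min_sim: float = 0.5, threshold: int = 2
) -> List[Tuple[EventKey, List[int]]]:
    topics = topic_clusters([t[0] for t in texts], min_sim)
    events: Dict[EventKey, List[int]] = defaultdict(list)
    for index, (topic, (_, date, place)) in enumerate(zip(topics, texts)):
        # Same topic cluster and same time and place: same event (cf. S307 to S309).
        events[(topic, date, place)].append(index)
    # Rank clusters by size, then keep those above the threshold (cf. S310).
    ranked = sorted(events.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(key, members) for key, members in ranked if len(members) > threshold]


texts = [
    (frozenset({"factory", "discharge", "river"}), "2018-09-01", "City A"),
    (frozenset({"factory", "discharge", "wastewater"}), "2018-09-01", "City A"),
    (frozenset({"factory", "discharge", "river", "fish"}), "2018-09-01", "City A"),
    (frozenset({"smog", "chimney"}), "2018-09-02", "City B"),
]
alerts = events_to_alert(texts)
print(alerts)
```

With the sample data, the three texts about the factory discharge fall into one event of size three, which exceeds the threshold of two, while the single smog report does not trigger a warning.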
In one embodiment, step S204 includes:
Step S20401: based on the result of the information extraction in step S203 and on the characteristics of environmental protection event texts, and in combination with the environmental protection knowledge base for environmental protection events, perform knowledge generalization, feature extraction, and feature value computation on the text;
Step S20402: using the deep semantic feature representation obtained in step S20401, train a binary classification model on a labeled training set, perform real-time classification with the model, and output the final recognition result.
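Step S20402 trains a binary classification model on labeled data but does not say which model. As an illustration only, the sketch below uses a perceptron over sets of symbolic features; the model choice, the feature names, and the training samples are all assumptions rather than anything taken from the specification.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BIAS = "<bias>"


def train_perceptron(
    samples: List[Tuple[Set[str], int]], epochs: int = 10
) -> Dict[str, float]:
    """Train on (feature set, label) pairs.

    Label +1 means related to an environmental protection event, -1 unrelated.
    """
    weights: Dict[str, float] = defaultdict(float)
    for _ in range(epochs):
        for features, label in samples:
            score = weights[BIAS] + sum(weights[f] for f in features)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Mistake-driven update: move the weights toward the label.
                weights[BIAS] += label
                for f in features:
                    weights[f] += label
    return dict(weights)


def classify(weights: Dict[str, float], features: Set[str]) -> bool:
    """Return True when the text is classified as event-related."""
    score = weights.get(BIAS, 0.0) + sum(weights.get(f, 0.0) for f in features)
    return score > 0


# Hypothetical labeled training set built from generalized concepts and roles.
samples = [
    ({"concept:pollution", "role:agent=enterprise"}, 1),
    ({"concept:pollution", "concept:river"}, 1),
    ({"concept:sports"}, -1),
    ({"concept:finance", "role:agent=enterprise"}, -1),
]
weights = train_perceptron(samples)
print(classify(weights, {"concept:pollution", "concept:lake"}))
print(classify(weights, {"concept:sports", "concept:finance"}))
```

Because the features are generalized concepts rather than surface words, a sentence that mentions an unseen entity such as a lake can still be recognized through the shared pollution concept, which is the point of the knowledge generalization in step S20401.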
Those of ordinary skill in the art will appreciate that all or some of the steps in the methods of the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the embodiments described above may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination of these technical features involves no contradiction, it should be regarded as falling within the scope of this specification.
The embodiments described above represent only some exemplary embodiments of the invention. Although they are described specifically and in detail, they should not be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, all of which fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (10)
1. A public opinion analysis method, characterized by comprising:
Step S1: obtaining, via the Internet, public opinion information on the environmental protection field corresponding to an enterprise, and building an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base, and an environmental protection rule base;
Step S2: in combination with the environmental protection knowledge base, performing local structure extraction and classification on texts, so as to identify, from a massive volume of text, texts related to environmental protection events that carry potential hazards;
Step S3: based on the environmental protection knowledge base, clustering the structured data of the identified texts related to environmental protection events that carry potential hazards, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
2. The public opinion analysis method according to claim 1, characterized in that step S1 comprises:
Step S101: building the environmental protection ontology library, which stores environmental protection concepts in a hierarchical organization, with equivalence relations and possible relation constraints between concepts;
Step S102: building the environmental protection fact base, which stores the structured set obtained through semantic disambiguation and unique entity identification;
Step S103: building the environmental protection event base, which comprises the related vocabulary, consisting of words for object, action, agent, patient, time, place, and cause;
Step S104: building the environmental protection rule base, which stores equivalence relations between concepts together with the probability that each holds.
3. The public opinion analysis method according to claim 1, characterized in that step S2 comprises:
Step S201: preprocessing the text to be analyzed sentence by sentence, performing Chinese word segmentation and part-of-speech tagging, and merging and correcting special word sequences;
Step S202: based on the word sequence obtained in step S201, mapping entities to concepts using the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguating the concepts of ambiguous words;
Step S203: based on the disambiguated word sequence obtained in step S202, performing information extraction on it according to basic Chinese sentence patterns, converting the text sentence into a structured representation;
Step S204: based on the structured representation obtained in step S203 and in combination with the environmental protection knowledge base, obtaining a deep semantic representation of the current sentence and using it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, returning to step S201; otherwise, analyzing the next text.
4. The public opinion analysis method according to claim 1, characterized in that step S3 comprises:
Step S301: loading the set of identified environmental protection event texts and parsing them into structured form using information extraction techniques, disregarding time and place information at this stage, to obtain a structured set describing the topic of each text;
Step S302: using the time and place words in the environmental protection event base, identifying and extracting the time and place information of each text, and obtaining a time vector and a place vector describing each text;
Step S303: projecting the structured set onto the environmental protection knowledge base, filtering out structured features unrelated to environmental protection events, and obtaining a candidate structured feature set for each text;
Step S304: selecting an effective feature subset by computing the information carried by the structured features across different texts;
Step S305: constructing all structured features of the observed texts and, by computing the similarity between structured features, obtaining a feature vector describing the topic of each text;
Step S306: based on the feature vectors obtained in step S305, performing topic clustering to obtain a set of topic categories;
Step S307: constructing all time and place features of the observed texts, performing time reasoning and place reasoning respectively using the environmental protection event base, and constructing a time feature vector and a place feature vector for each text;
Step S308: based on the feature vectors obtained in step S307, performing time-and-place clustering to obtain a set of time-and-place categories;
Step S309: merging the topic category set with the time-and-place category set to obtain the final set of environmental protection event categories;
Step S310: ranking the early warning level of each cluster by the number of texts it contains, and issuing a real-time early warning for environmental protection events that exceed the given threshold.
5. The public opinion analysis method according to claim 2, characterized in that step S204 comprises:
Step S20401: based on the result of the information extraction in step S203 and on the characteristics of environmental protection event texts, and in combination with the environmental protection knowledge base for environmental protection events, performing knowledge generalization, feature extraction, and feature value computation on the text;
Step S20402: using the deep semantic feature representation obtained in step S20401, training a binary classification model on a labeled training set, performing real-time classification with the model, and outputting the final recognition result.
6. A public opinion analysis system based on pollutant discharge data, characterized in that the public opinion analysis system based on pollutant discharge data comprises: an acquisition module, configured to obtain, via the Internet, public opinion information on the environmental protection field corresponding to an enterprise, and to build an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base, and an environmental protection rule base; a classification module, configured to perform, in combination with the environmental protection knowledge base, local structure extraction and classification on texts, so as to identify, from a massive volume of text, texts related to environmental protection events that carry potential hazards; and an early warning module, configured to cluster, based on the environmental protection knowledge base built by the acquisition module, the structured data of the texts related to environmental protection events that carry potential hazards identified by the classification module, and to decide whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
7. The public opinion analysis system based on pollutant discharge data according to claim 6, characterized in that the acquisition module further comprises: an ontology building unit, configured to build the environmental protection ontology library, which stores environmental protection concepts in a hierarchical organization, with equivalence relations and possible relation constraints between concepts; a fact building unit, configured to build the environmental protection fact base, which stores the structured set obtained through semantic disambiguation and unique entity identification; an event building unit, configured to build the environmental protection event base, which comprises the related vocabulary, consisting of words for object, action, agent, patient, time, place, and cause; and a rule building unit, configured to build the environmental protection rule base, which stores equivalence relations between concepts together with the probability that each holds;
the classification module further comprises: a word segmentation unit, configured to preprocess the text to be analyzed sentence by sentence, perform Chinese word segmentation and part-of-speech tagging, and merge and correct special word sequences; a disambiguation unit, configured to map entities to concepts, based on the word sequence produced by the word segmentation unit, using the hierarchical concept space of the environmental protection ontology library, and at the same time to disambiguate the concepts of ambiguous words; a conversion unit, configured to perform information extraction on the disambiguated word sequence produced by the disambiguation unit according to basic Chinese sentence patterns, converting the text sentence into a structured representation; and a classification unit, configured to obtain, based on the structured representation produced by the conversion unit and in combination with the environmental protection knowledge base, a deep semantic representation of the current sentence and to use it for classification, wherein if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, processing returns to the word segmentation unit, and otherwise the next text is analyzed;
the early warning module further comprises: a parsing unit, configured to load the set of identified environmental protection event texts and parse them into structured form using information extraction techniques, disregarding time and place information at this stage, to obtain a structured set describing the topic of each text; a recognition unit, configured to identify and extract the time and place information of each text using the time and place words in the environmental protection event base, and to obtain a time vector and a place vector describing each text; a filtering unit, configured to project the structured set onto the environmental protection knowledge base, filter out structured features unrelated to environmental protection events, and obtain a candidate structured feature set for each text; a selection unit, configured to select an effective feature subset by computing the information carried by the structured features across different texts; an obtaining unit, configured to construct all structured features of the observed texts and, by computing the similarity between structured features, to obtain a feature vector describing the topic of each text; a topic clustering unit, configured to perform topic clustering based on the feature vectors produced by the obtaining unit and to obtain a set of topic categories; a construction unit, configured to construct all time and place features of the observed texts using the environmental protection event base, perform time reasoning and place reasoning respectively, and construct a time feature vector and a place feature vector for each text; a time-and-place clustering unit, configured to perform time-and-place clustering based on the feature vectors produced by the construction unit and to obtain a set of time-and-place categories; a merging unit, configured to merge the topic category set with the time-and-place category set and to obtain the final set of environmental protection event categories; and an early warning unit, configured to rank the early warning level of each cluster by the number of texts it contains, and to issue a real-time early warning for environmental protection events that exceed the given threshold.
8. The artificial-intelligence-based compositing system according to claim 7, characterized in that the classification unit further comprises: an extraction subunit, configured to perform knowledge generalization, feature extraction, and feature value computation on the text, based on the result of the information extraction performed by the conversion unit and on the characteristics of environmental protection event texts, and in combination with the environmental protection knowledge base for environmental protection events; and a classification subunit, configured to train a binary classification model on a labeled training set using the deep semantic feature representation produced by the extraction subunit, to perform real-time classification with the model, and to output the final recognition result.
9. A computer device, characterized by comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method according to any one of claims 1 to 5.
10. A storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811020177.2A CN109408804A (en) | 2018-09-03 | 2018-09-03 | The analysis of public opinion method, system, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109408804A (en) | 2019-03-01 |
Family
ID=65463909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811020177.2A Withdrawn CN109408804A (en) | 2018-09-03 | 2018-09-03 | The analysis of public opinion method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109408804A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100312769A1 (en) * | 2009-06-09 | 2010-12-09 | Bailey Edward J | Methods, apparatus and software for analyzing the content of micro-blog messages |
CN103699663A (en) * | 2013-12-27 | 2014-04-02 | 中国科学院自动化研究所 | Hot event mining method based on large-scale knowledge base |
CN104091054A (en) * | 2014-06-26 | 2014-10-08 | 中国科学院自动化研究所 | Mass disturbance warning method and system applied to short texts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190301 |