CN109408804A - Public opinion analysis method, system, device and storage medium - Google Patents

Public opinion analysis method, system, device and storage medium

Info

Publication number
CN109408804A
Authority
CN
China
Prior art keywords
text
environmentally friendly
event
environmental protection
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811020177.2A
Other languages
Chinese (zh)
Inventor
金戈
徐亮
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811020177.2A priority Critical patent/CN109408804A/en
Publication of CN109408804A publication Critical patent/CN109408804A/en
Withdrawn legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a public opinion analysis method, system, computer device and storage medium. The method comprises: step S1, obtaining public opinion information in the environmental protection field for an enterprise from the internet and constructing an environmental protection knowledge base, which includes an environmental ontology base, an environmental fact base, an environmental event base and an environmental rule base; step S2, performing local structure extraction and classification on text in combination with the environmental protection knowledge base, so as to identify, from a large volume of text, the text related to environmental protection events that carry potential risk; step S3, based on the environmental protection knowledge base constructed in step S1, clustering the structured data of the text identified in step S2, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold. The method improves efficiency, reduces labor cost and shortens the time cycle.

Description

Public opinion analysis method, system, device and storage medium
Technical field
The present invention relates to the field of information technology, and in particular to a public opinion analysis method, system, computer device and storage medium.
Background art
China is currently in a period of social transformation. The economy is developing rapidly and the levels of urbanization and industrialization keep rising, while the conflict between economic development and environmental protection is becoming increasingly acute. At the same time, citizens' awareness of their civic obligations is awakening and environmental awareness keeps rising, so that environmental problems receive unprecedented attention. Effective monitoring of the relevant enterprises can prevent certain environmental problems from spreading, and for the associated environmental public opinion events, effective solutions can be proposed on the basis of the relevant monitoring data.
An "environmental public opinion event" refers to the state of public opinion and attention across society with respect to a given environmental protection event; effective analysis of such events is the prerequisite for calming public opinion. "Public opinion analysis", also known as semantic analysis, is a specific method for the objective, systematic and quantitative analysis of information content. Its purpose is to clarify or test the essential facts and trends in the information, to reveal the implicit content the information contains, and to predict how an event will develop. At present, enterprise environmental public opinion events are analyzed mainly through manual research: relevant information is collected, sorted and then analyzed. This approach is inefficient and costly, and the whole process takes a long time.
Summary of the invention
In view of this, and to address the drawbacks of existing environmental public opinion analysis methods, it is necessary to provide a public opinion analysis method, system, device and storage medium.
A public opinion analysis method, comprising: step S1, obtaining public opinion information in the environmental protection field for an enterprise from the internet and constructing an environmental protection knowledge base, which includes an environmental ontology base, an environmental fact base, an environmental event base and an environmental rule base; step S2, performing local structure extraction and classification on text in combination with the environmental protection knowledge base, so as to identify, from a large volume of text, the text related to environmental protection events that carry potential risk; step S3, based on the environmental protection knowledge base, clustering the structured data of the identified text related to environmental protection events that carry potential risk, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
In one embodiment, step S1 includes: step S101, constructing the environmental ontology base, which stores the hierarchical organization of environmental concepts, with equivalence relations and possible relation constraints between concepts; step S102, constructing the environmental fact base, which stores the structured sets of environmental facts obtained through semantic disambiguation and unique entity identification; step S103, constructing the environmental event base, which contains related vocabulary consisting of objects, behaviors, agents, patients, times, places and causes; step S104, constructing the environmental rule base, which stores the equivalence relations between concepts and the probabilities that they hold.
In one embodiment, step S2 includes: step S201, preprocessing the text to be analyzed sentence by sentence, performing word segmentation and part-of-speech tagging on the Chinese text, and merging and correcting special word sequences; step S202, based on the word sequence obtained in step S201, mapping entities to concepts using the hierarchical concept space in the environmental ontology base, and at the same time performing concept disambiguation on ambiguous words; step S203, based on the disambiguated word sequence obtained in step S202, performing information extraction on the disambiguated word sequence according to the basic sentence patterns of Chinese, converting each sentence into a structured representation; step S204, based on the structured representation obtained in step S203, obtaining a deep semantic representation of the current sentence in combination with the environmental protection knowledge base and using it for classification; if the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, returning to step S201; otherwise, analyzing the next text.
In one embodiment, step S3 includes: step S301, loading the set of identified environmental protection event texts and performing structured parsing on it using information extraction techniques, without considering time and place information at this stage, to obtain the structured set describing the topic of each text; step S302, identifying and extracting the time and place information of each text in combination with the time and place words in the environmental event base, and obtaining the time vector and place vector describing each text; step S303, projecting the structured set onto the environmental protection knowledge base and filtering out structured features unrelated to environmental protection events, to obtain the candidate structured feature set of each text; step S304, selecting an effective feature subset by computing the information carried by the structured features across different texts; step S305, constructing all structured features of the observed texts and, by computing the similarity between structured features, obtaining the feature vector describing the topic of each text; step S306, performing topic clustering based on the feature vectors obtained in step S305 to obtain the set of topic categories; step S307, constructing all time and place features of the observed texts in combination with the environmental event base, performing time and place reasoning respectively, and constructing a time feature vector and a place feature vector for each text; step S308, performing time-place clustering based on the feature vectors obtained in step S307 to obtain the set of time-place categories; step S309, fusing the set of topic categories with the set of time-place categories to obtain the final set of environmental protection event categories; step S310, ranking the early-warning level according to the number of texts in each cluster, and issuing a real-time early warning for environmental protection events that exceed the given threshold.
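The topic clustering of steps S305 and S306 can be illustrated with a minimal sketch. The patent does not name a similarity measure or a clustering algorithm, so the cosine similarity, the single-pass strategy, the function names and the `min_similarity` parameter below are all assumptions chosen only to make the idea concrete.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse feature vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(vectors, min_similarity):
    """Assign each text to the first cluster whose seed text is similar enough.

    Hypothetical stand-in for the topic clustering step; returns a list of
    clusters, each a list of indices into `vectors`.
    """
    clusters = []
    for index, vector in enumerate(vectors):
        for members in clusters:
            seed = vectors[members[0]]
            if cosine(vector, seed) >= min_similarity:
                members.append(index)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([index])
    return clusters


# Illustrative feature vectors (invented data, not from the patent).
texts = [
    {"sewage": 1, "river": 1},
    {"sewage": 1, "river": 1, "fish": 1},
    {"smog": 1, "factory": 1},
]
print(single_pass_cluster(texts, min_similarity=0.5))  # [[0, 1], [2]]
```

Comparing only against the first member of each cluster keeps the sketch short; a centroid-based comparison would be a natural alternative.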
In one embodiment, step S204 includes:
Step S20401, based on the result of the information extraction in step S203 and in view of the characteristics of environmental protection event texts, performing knowledge generalization, feature extraction and feature value calculation on the text in combination with the environmental protection knowledge base; step S20402, based on the deep semantic feature representation obtained in step S20401, training a binary classification model on a labeled training set, performing real-time classification with the model, and outputting the final recognition result.
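Step S20402 trains a binary classification model on a labeled training set but does not say which model is used. The sketch below substitutes a plain perceptron over sparse feature dictionaries purely for illustration; the feature names, the training samples and the function names are hypothetical.

```python
def score(features, weights, bias):
    """Linear score of a sparse feature dict under the current weights."""
    return bias + sum(weights.get(name, 0.0) * value
                      for name, value in features.items())


def train_perceptron(samples, epochs=10):
    """Train a perceptron; `samples` is a list of (features, label) with label 0 or 1."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in samples:
            predicted = 1 if score(features, weights, bias) > 0 else 0
            error = label - predicted
            if error:
                # Move the weights toward the correct label.
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + error * value
                bias += error
    return weights, bias


def classify(features, weights, bias):
    """Return 1 for 'related to an environmental protection event', else 0."""
    return 1 if score(features, weights, bias) > 0 else 0


# Invented labeled training set: 1 = environmental event, 0 = unrelated.
training = [
    ({"discharge": 1, "river": 1}, 1),
    ({"stock": 1, "price": 1}, 0),
    ({"discharge": 1, "fine": 1}, 1),
    ({"football": 1}, 0),
]
weights, bias = train_perceptron(training)
print(classify({"discharge": 1, "lake": 1}, weights, bias))  # 1
```

In practice the features would come from the deep semantic representation of step S20401 rather than from raw words.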
A public opinion analysis system based on pollutant discharge data, comprising: an acquisition module, configured to obtain public opinion information in the environmental protection field for an enterprise from the internet and to construct an environmental protection knowledge base, which includes an environmental ontology base, an environmental fact base, an environmental event base and an environmental rule base; a classification module, configured to perform local structure extraction and classification on text in combination with the environmental protection knowledge base, so as to identify, from a large volume of text, the text related to environmental protection events that carry potential risk; and an early-warning module, configured to cluster, based on the environmental protection knowledge base constructed by the acquisition module, the structured data of the text identified by the classification module as related to environmental protection events that carry potential risk, and to decide whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
In one embodiment, the acquisition module further includes: an ontology construction unit, configured to construct the environmental ontology base, which stores the hierarchical organization of environmental concepts, with equivalence relations and possible relation constraints between concepts; a fact construction unit, configured to construct the environmental fact base, which stores the structured sets of environmental facts obtained through semantic disambiguation and unique entity identification; an event construction unit, configured to construct the environmental event base, which contains related vocabulary consisting of objects, behaviors, agents, patients, times, places and causes; and a rule construction unit, configured to construct the environmental rule base, which stores the equivalence relations between concepts and the probabilities that they hold.
The classification module further includes: a word segmentation unit, configured to preprocess the text to be analyzed sentence by sentence, perform word segmentation and part-of-speech tagging on the Chinese text, and merge and correct special word sequences; a disambiguation unit, configured to map entities to concepts using the hierarchical concept space in the environmental ontology base, based on the word sequence produced by the word segmentation unit, and at the same time perform concept disambiguation on ambiguous words; a conversion unit, configured to perform information extraction on the disambiguated word sequence produced by the disambiguation unit according to the basic sentence patterns of Chinese, converting each sentence into a structured representation; and a classification unit, configured to obtain, based on the structured representation produced by the conversion unit and in combination with the environmental protection knowledge base, a deep semantic representation of the current sentence and use it for classification; if the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, processing returns to the word segmentation unit; otherwise, the next text is analyzed.
The early-warning module further includes: a parsing unit, configured to load the set of identified environmental protection event texts and perform structured parsing on it using information extraction techniques, without considering time and place information at this stage, to obtain the structured set describing the topic of each text; a recognition unit, configured to identify and extract the time and place information of each text in combination with the time and place words in the environmental event base, and to obtain the time vector and place vector describing each text; a filtering unit, configured to project the structured set onto the environmental protection knowledge base and filter out structured features unrelated to environmental protection events, to obtain the candidate structured feature set of each text; a selection unit, configured to select an effective feature subset by computing the information carried by the structured features across different texts; an obtaining unit, configured to construct all structured features of the observed texts and, by computing the similarity between structured features, obtain the feature vector describing the topic of each text; a topic clustering unit, configured to perform topic clustering based on the feature vectors produced by the obtaining unit to obtain the set of topic categories; a construction unit, configured to construct all time and place features of the observed texts in combination with the environmental event base, perform time and place reasoning respectively, and construct a time feature vector and a place feature vector for each text; a time-place clustering unit, configured to perform time-place clustering based on the feature vectors produced by the construction unit to obtain the set of time-place categories; a fusion unit, configured to fuse the set of topic categories with the set of time-place categories to obtain the final set of environmental protection event categories; and an early-warning unit, configured to rank the early-warning level according to the number of texts in each cluster and to issue a real-time early warning for environmental protection events that exceed the given threshold.
In one embodiment, the classification unit further includes: an extraction subunit, configured to perform knowledge generalization, feature extraction and feature value calculation on the text in combination with the environmental protection knowledge base, based on the result of the information extraction performed by the conversion unit and in view of the characteristics of environmental protection event texts; and a classification subunit, configured to train a binary classification model on a labeled training set based on the deep semantic feature representation obtained by the extraction subunit, perform real-time classification with the model, and output the final recognition result.
A computer device, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the above analysis method.
A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the above analysis method.
With the above public opinion analysis method, system, computer device and storage medium, public opinion information in the environmental protection field is obtained for an enterprise from the internet and four bases are constructed: the environmental ontology base, which stores the hierarchical organization of environmental concepts, with equivalence relations and possible relation constraints between concepts; the environmental fact base, which stores the structured sets of environmental facts obtained through semantic disambiguation and unique entity identification; the environmental event base, which contains related vocabulary consisting of objects, behaviors, agents, patients, times, places and causes; and the environmental rule base, which stores the equivalence relations between concepts and the probabilities that they hold. In combination with the environmental protection knowledge base, the text to be analyzed is preprocessed sentence by sentence: the Chinese text is segmented and tagged with parts of speech, and special word sequences are merged and corrected. Based on the resulting word sequence, entities are mapped to concepts using the hierarchical concept space in the environmental ontology base, and ambiguous words are disambiguated at the same time. Information extraction is then performed on the disambiguated word sequence according to the basic sentence patterns of Chinese, converting each sentence into a structured representation, from which a deep semantic representation of the current sentence is obtained in combination with the environmental protection knowledge base and used for classification. If the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, processing returns to the previous step; otherwise, the next text is analyzed. Based on the constructed environmental protection knowledge base, the structured data of the identified text is clustered, and whether to issue a real-time early warning is decided according to whether the number of texts in each cluster exceeds a given threshold. This improves efficiency, reduces labor cost and shortens the time cycle.
Description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are provided only to illustrate the preferred embodiments and are not to be considered a limitation of the present invention.
Fig. 1 is a flow chart of the public opinion analysis method in one embodiment;
Fig. 2 is a flow chart of constructing the environmental protection knowledge base in one embodiment;
Fig. 3 is a flow chart of performing local structure extraction and classification on text in one embodiment;
Fig. 4 is a flow chart of clustering the structured data of text and issuing real-time early warnings in one embodiment;
Fig. 5 is a flow chart of obtaining the deep semantic representation of the current sentence and classifying it in one embodiment;
Fig. 6 is a structural block diagram of the public opinion analysis system based on pollutant discharge data in one embodiment;
Fig. 7 is a structural block diagram of the acquisition module in one embodiment;
Fig. 8 is a structural block diagram of the classification module in one embodiment;
Fig. 9 is a structural block diagram of the early-warning module in one embodiment;
Fig. 10 is a structural block diagram of the classification unit in one embodiment.
Specific embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
As a preferred embodiment, as shown in Fig. 1, a public opinion analysis method includes: step S1, obtaining public opinion information in the environmental protection field for an enterprise from the internet and constructing an environmental protection knowledge base, which includes an environmental ontology base, an environmental fact base, an environmental event base and an environmental rule base;
Enterprise environmental public opinion events are currently handled mostly by manual research, which is inefficient and costly and cannot quickly uncover the course of an event or the various correlations involved. To address this, the present technical solution collects public opinion information and, after performing text analysis on it, associates it with the enterprise's pollutant discharge monitoring points to verify whether the public opinion issue has in fact occurred at the enterprise. The method can not only quickly provide solutions for the environmental public opinion events of certain enterprises, but can also provide reference data that helps in analyzing the essence of an event. The knowledge base is built specifically for early warning of environmental protection events such as enterprise pollutant discharge, and comprises the environmental ontology base, the environmental fact base, the environmental event base and the environmental rule base. The equivalence relations between concepts in the environmental ontology base rely on Baidu Baike, Wikipedia, Hudong Baike and various published synonym tables. The construction of the environmental fact base relies on a corpus related to pollutant discharge and pollution accidents obtained from the internet, and makes full use of various information extraction techniques, including Chinese word segmentation, part-of-speech tagging, dependency analysis and special sentence pattern recognition. The environmental event base contains vocabulary related to the environmental protection field, consisting of objects, behaviors, agents, patients, times, places and causes. The environmental rule base stores the equivalence relations between concepts and the probabilities that they hold. This background knowledge is acquired automatically from the corpus by combining machine learning with pattern matching, so the knowledge can be maintained and updated automatically.
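As a rough illustration of how the four bases might sit together in memory, the following sketch models them as plain Python containers. The patent does not specify any storage layout, so the class name, field names and sample entries are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory layout for the four bases described above."""
    # Ontology base: concept -> parent concept (None marks a root concept).
    ontology: dict = field(default_factory=dict)
    # Fact base: (subject, relation, object) triples with unique concept ids.
    facts: set = field(default_factory=set)
    # Event base: role name (object, behavior, agent, ...) -> set of words.
    events: dict = field(default_factory=dict)
    # Rule base: (concept_a, concept_b) -> probability the equivalence holds.
    rules: dict = field(default_factory=dict)

    def ancestors(self, concept):
        """Walk up the hierarchy; assumes the ontology contains no cycles."""
        chain = []
        parent = self.ontology.get(concept)
        while parent is not None:
            chain.append(parent)
            parent = self.ontology.get(parent)
        return chain

    def equivalence_probability(self, a, b):
        """Probability that two concepts are equivalent (symmetric lookup)."""
        if a == b:
            return 1.0
        return self.rules.get((a, b), self.rules.get((b, a), 0.0))


# Invented sample content.
kb = KnowledgeBase(
    ontology={"pollution": None,
              "water pollution": "pollution",
              "sewage discharge": "water pollution"},
    events={"behavior": {"discharge", "leak"}, "place": {"river", "lake"}},
    rules={("sewage", "wastewater"): 0.9},
)
print(kb.ancestors("sewage discharge"))  # ['water pollution', 'pollution']
```

A production system would more likely back these structures with a database or a graph store; the dictionaries here only show which relations each base has to answer.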
Step S2, performing local structure extraction and classification on text in combination with the environmental protection knowledge base, so as to identify, from a large volume of text, the text related to environmental protection events that carry potential risk;
The text to be analyzed is preprocessed sentence by sentence: the Chinese text is segmented and tagged with parts of speech, and special word sequences are merged and corrected. Based on the resulting word sequence, entities are mapped to concepts using the hierarchical concept space in the environmental ontology base, and ambiguous words are disambiguated at the same time. Based on the disambiguated word sequence, information extraction is performed according to the basic sentence patterns of Chinese, converting each sentence into a structured representation. Based on this structured representation, a deep semantic representation of the current sentence is obtained in combination with the environmental protection knowledge base and used for classification. If the classification result is unrelated to environmental protection events and the last sentence of the text has not yet been scanned, processing returns to the sentence-by-sentence preprocessing step for the next sentence; otherwise, the next text is analyzed.
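The control flow of step S2 (scan a text sentence by sentence, stop as soon as one sentence is classified as related, otherwise move on) can be sketched as follows. Real Chinese word segmentation, part-of-speech tagging and the deep semantic classifier are replaced here by whitespace tokenization and a set-membership test, which are simplifying assumptions; all function names and sample data are hypothetical.

```python
import re


def split_sentences(text):
    """Split on common Chinese and Western sentence terminators."""
    return [s.strip() for s in re.split(r"[。！？.!?]", text) if s.strip()]


def map_concepts(tokens, concept_of):
    """Map each token to its concept id; unknown tokens map to themselves."""
    return [concept_of.get(token, token) for token in tokens]


def sentence_is_relevant(concepts, risk_concepts):
    """Placeholder for the classifier: any risk concept marks the sentence."""
    return any(concept in risk_concepts for concept in concepts)


def text_is_relevant(text, concept_of, risk_concepts):
    """Return True at the first relevant sentence, False if none is found."""
    for sentence in split_sentences(text):
        # Whitespace tokenization stands in for Chinese word segmentation.
        tokens = sentence.lower().split()
        concepts = map_concepts(tokens, concept_of)
        if sentence_is_relevant(concepts, risk_concepts):
            return True
    return False


# Invented synonym mapping and risk concept set.
concept_of = {"sewage": "pollutant_discharge", "effluent": "pollutant_discharge"}
risk_concepts = {"pollutant_discharge"}
text = "The plant opened today. Residents reported effluent in the river."
print(text_is_relevant(text, concept_of, risk_concepts))  # True
```

Returning early on the first relevant sentence mirrors the rule that the remaining sentences need not be scanned once a text has been identified.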
Step S3, based on the environmental protection knowledge base constructed in step S1, clustering the structured data of the text identified in step S2, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
The set of identified environmental protection event texts is loaded and parsed into structured form using information extraction techniques, without considering time and place information at this stage, to obtain the structured set describing the topic of each text. In combination with the time and place words in the environmental event base, the time and place information of each text is identified and extracted, yielding the time vector and place vector describing each text. The structured set is projected onto the environmental protection knowledge base and structured features unrelated to environmental protection events are filtered out, giving the candidate structured feature set of each text. An effective feature subset is selected by computing the information carried by the structured features across different texts. All structured features of the observed texts are then constructed and, by computing the similarity between structured features, the feature vector describing the topic of each text is obtained; topic clustering on these feature vectors yields the set of topic categories. In combination with the environmental event base, all time and place features of the observed texts are constructed, time and place reasoning is performed respectively, and a time feature vector and a place feature vector are built for each text; time-place clustering on these vectors yields the set of time-place categories. The set of topic categories is fused with the set of time-place categories to obtain the final set of environmental protection event categories. The early-warning level is ranked according to the number of texts in each cluster, and a real-time early warning is issued for environmental protection events that exceed the given threshold.
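The final fusion, ranking and threshold check can be sketched in a few lines. The patent does not say how the topic categories and the time-place categories are fused; the sketch below assumes the simplest reading, namely that two texts belong to the same event only if they share both a topic label and a time-place label. The function name, labels and threshold are hypothetical.

```python
from collections import Counter


def rank_and_alert(topic_labels, time_place_labels, threshold):
    """Fuse per-text labels into events, rank by size and pick alerts.

    Assumption: an event is a (topic, time-place) pair. Returns the events
    ranked by text count and the events whose count exceeds the threshold.
    """
    events = Counter(zip(topic_labels, time_place_labels))
    ranked = events.most_common()
    alerts = [event for event, count in ranked if count > threshold]
    return ranked, alerts


# One label per text, in the same order (invented data).
topics = ["spill", "spill", "spill", "smog"]
time_places = ["plant A, May", "plant A, May", "plant A, May", "plant B, June"]
ranked, alerts = rank_and_alert(topics, time_places, threshold=2)
print(alerts)  # [('spill', 'plant A, May')]
```

The ranked list corresponds to the ordering of early-warning levels, and the alert list to the events that exceed the given threshold.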
As shown in Fig. 2, in one embodiment, step S1 includes:
Step S101, constructing the environmental ontology base, which stores the hierarchical organization of environmental concepts, with equivalence relations and possible relation constraints between concepts;
On the one hand, the concept hierarchy is fused horizontally and vertically using known hypernym-hyponym relations, open categories, polysemous words and synonym information. On the other hand, instances with attribute information are used with an inductive decision tree model to automatically identify the hierarchical concepts of entities. Together these form the hierarchical organization of domain concepts and the instance-to-concept mapping. The equivalence relations between concepts in the environmental ontology base rely on Baidu Baike, Wikipedia, Hudong Baike and various published synonym tables.
Step S102, constructing the environmental fact base, which stores the structured sets of environmental facts obtained through semantic disambiguation and unique entity identification;
The construction of the environmental fact base relies on a corpus related to pollutant discharge and pollution accidents obtained from the internet, and makes full use of various information extraction techniques, including Chinese word segmentation, part-of-speech tagging, dependency analysis and special sentence pattern recognition. After a large number of structured tuples have been obtained, each tuple is mapped to the hierarchical concepts in the ontology base. If a tuple corresponds to more than one concept, semantic disambiguation is performed according to the relations and other information of the tuple, yielding a large structured set of facts with unique concept identifiers.
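The disambiguation described here, choosing one concept for a mention that maps to several, can be sketched as a context-overlap vote. The patent does not describe the scoring used, so the overlap count, the function name and the sample concept ids below are assumptions for illustration only.

```python
def disambiguate(mention, context_words, candidates):
    """Pick the candidate concept whose related words best overlap the context.

    `candidates` maps a mention to {concept_id: set of related words}.
    Returns None when the mention has no candidate concepts. Ties are broken
    by the alphabetical order of the concept ids, for determinism.
    """
    options = candidates.get(mention, {})
    if not options:
        return None
    context = set(context_words)
    return max(sorted(options), key=lambda cid: len(options[cid] & context))


# Invented candidate concepts for an ambiguous word.
candidates = {
    "discharge": {
        "pollutant_discharge": {"sewage", "river", "factory"},
        "electrical_discharge": {"voltage", "battery"},
    }
}
print(disambiguate("discharge", ["factory", "river", "night"], candidates))
# pollutant_discharge
```

Once a unique concept id is chosen, the tuple can be stored in the fact base under that id.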
Step S103: construct the environmental protection event base, which contains related vocabulary made up of objects, behaviors, agents, patients, times, places, and causes;
The environmental protection event base contains vocabulary related to the environmental protection field, made up of objects, behaviors, agents, patients, times, places, and causes. The base stores all kinds of time words together with their numeric codes; the purpose of the coding is to determine the exact time by recognizing a time word and taking the publication time of the text as the reference. The base may also include a time table that records the time zones of the countries of the world. For example, Tokyo time differs from Beijing time by one time zone; morning and afternoon do not denote the same time, whereas midnight and early morning may well denote the same time. Such knowledge has to be supplied by the knowledge base. To compute the similarity between times, the knowledge base has to tell the computer common-sense facts such as which periods "early morning" and "midnight" cover and how many hours there are in a day.
Place-ending words help to recognize place words that the segmentation algorithm fails to recognize and to determine the upper and lower levels of a place. Places are generally described in order from large to small, for example Fengxian District, Shanghai; this phenomenon is called the level constraint. In the present invention, place words that cannot be recognized correctly because of segmentation errors are therefore recognized according to the level constraint, a process that involves merging several words. For example, for the multi-level place "Hunan Town, Changle City, Fujian Province", the segmentation result is "Fujian Province/ns, Changle City/ns, Hu (lake)/n, nan (south)/a, Town/n", so the segmentation algorithm fails to recognize "Hunan Town". According to the level constraint, the place ends with "Town", and a town should be stated after a city, so "Hu/n, nan/a, Town/n" must be a single place. The segmentation result is therefore updated to "Fujian Province/ns, Changle City/ns, Hunan Town/ns".
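The level-constraint correction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the list of place-ending characters, the function names, and the Chinese strings (a reconstruction of the translated example) are all assumptions, and the merge rule is deliberately naive, since it merges every token between the last recognized place and the place-ending word.

```python
# Hypothetical place-ending characters, ordered from the largest level to the smallest.
PLACE_ENDINGS = ["省", "市", "区", "县", "镇", "村"]

def level(word):
    """Return the level index of a word's place ending, or None if it has none."""
    if word and word[-1] in PLACE_ENDINGS:
        return PLACE_ENDINGS.index(word[-1])
    return None

def merge_by_level_constraint(tokens):
    """tokens: list of (word, pos_tag). Merge fragments of an unrecognized place."""
    out = []
    for word, tag in tokens:
        out.append((word, tag))
        if tag == "ns" or level(word) is None:
            continue
        # Look back for the nearest token already tagged as a place ("ns").
        i = len(out) - 2
        while i >= 0 and out[i][1] != "ns":
            i -= 1
        if i < 0:
            continue
        anchor_level = level(out[i][0])
        # Level constraint: a smaller place must follow a larger one.
        if anchor_level is not None and anchor_level < level(word):
            merged = "".join(w for w, _ in out[i + 1:])
            out[i + 1:] = [(merged, "ns")]
    return out

tokens = [("福建省", "ns"), ("长乐市", "ns"), ("湖", "n"), ("南", "a"), ("镇", "n")]
print(merge_by_level_constraint(tokens))
```

A production version would presumably bound the number of merged tokens and check the candidate against the place library before accepting it.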
Step S104: construct the environmental protection rule base, which stores the equivalence relations between concepts and the probabilities that they hold.
The environmental protection rule base stores the equivalence relations between concepts and the probabilities that they hold. This background knowledge is acquired automatically from corpora by combining machine learning with pattern matching, so that the knowledge can be maintained and updated automatically. Based on the fact base and the ontology library described above, probabilistic graphical models and first-order logic, for example Markov logic networks, are used to learn uncertain rules automatically and to obtain logical expressions of the form "weight plus rule". High-quality logical expressions that meet practical requirements are then selected, for example 0.80 sewage<s: pollutant discharge> <=> pollution<s: pollutant discharge>, and 0.90 toxic<s: pollutant discharge> & pollution<o: environment> <=> pollutant discharge<s: pollutant discharge>. Here s indicates that the concept "pollutant discharge" serves as the subject in the tuple, o indicates that the entity serves as the object in the tuple, & denotes logical AND, and <=> denotes the equivalence relation.
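One way to hold such weighted rules in memory and filter them by weight is sketched below. The class names, the `select_rules` and `equivalents` helpers, and the 0.85 cut-off are hypothetical; the two rules are the ones quoted above, and the sketch does not attempt the Markov logic network learning itself.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    predicate: str
    role: str      # "s" = subject, "o" = object
    concept: str

@dataclass(frozen=True)
class WeightedRule:
    weight: float
    left: tuple    # atoms joined by logical AND
    right: tuple

PD = "pollutant discharge"
RULES = [
    WeightedRule(0.80, (Atom("sewage", "s", PD),), (Atom("pollution", "s", PD),)),
    WeightedRule(0.90,
                 (Atom("toxic", "s", PD), Atom("pollution", "o", "environment")),
                 (Atom(PD, "s", PD),)),
]

def select_rules(rules, min_weight):
    """Keep only rules whose weight reaches the (assumed) quality threshold."""
    return [r for r in rules if r.weight >= min_weight]

def equivalents(atoms, rules):
    """Right-hand sides of rules whose left-hand atoms all occur in `atoms`.
    Only the left-to-right direction of the equivalence is applied here."""
    present = set(atoms)
    return [r.right for r in rules if set(r.left) <= present]

print(select_rules(RULES, 0.85))
print(equivalents([Atom("sewage", "s", PD)], RULES))
```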
As shown in Fig. 3, in one embodiment, step S2 includes:
Step S201: preprocess the text to be analyzed sentence by sentence, perform Chinese word segmentation and part-of-speech tagging, and merge and correct special word sequences;
The text to be analyzed is preprocessed sentence by sentence: Chinese word segmentation and part-of-speech tagging are performed, and special word sequences are merged and corrected. For the example sentence, the result of segmentation and part-of-speech tagging is, for instance, <today/t, morning/t, half past ten/t, ,/w, sewage/n, in/p, Hunan Town/ns, nearby/f, spread/v, ,/w, pollute/v, area/n, ,/wn, exposure/n>.
Step S202: based on the word sequence obtained in step S201, map entities to concepts in the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguate the concepts of ambiguous words;
Training data are prepared first, and a multi-class classification model is then learned using the naive Bayes principle. The class labels correspond to hierarchical concepts, and the feature vector consists of the neighboring unambiguous words within a given window and the concepts they belong to. Using the specific context in which an ambiguous entity occurs, the trained classification model identifies its concept automatically. For example, for the word "pollution" with the candidate concepts pollutant discharge, poverty alleviation, and decoration, the disambiguation result is pollution: pollutant discharge.
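The naive Bayes disambiguation step can be sketched as below. This is a minimal multinomial naive Bayes with add-one smoothing written from scratch; the class name, the toy training contexts, and the choice of smoothing are assumptions, and a real system would use window features drawn from the ontology library rather than the invented words shown here.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesDisambiguator:
    """Multinomial naive Bayes over context words, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        # samples: iterable of (context_words, concept_label)
        for context, concept in samples:
            self.class_counts[concept] += 1
            for word in context:
                self.feature_counts[concept][word] += 1
                self.vocab.add(word)

    def predict(self, context):
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for concept, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            denom = sum(self.feature_counts[concept].values()) + len(self.vocab)
            for word in context:
                score += math.log((self.feature_counts[concept][word] + 1) / denom)
            if score > best_score:
                best, best_score = concept, score
        return best

# Hypothetical training contexts for the ambiguous word "pollution".
train = [
    (["sewage", "river"], "pollutant discharge"),
    (["sewage", "pipe"], "pollutant discharge"),
    (["village", "subsidy"], "poverty alleviation"),
    (["paint", "house"], "decoration"),
]
model = NaiveBayesDisambiguator()
model.fit(train)
print(model.predict(["sewage", "nearby"]))
```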
Step S203: based on the disambiguated word sequence obtained in step S202, perform information extraction on the disambiguated word sequence according to the basic sentence patterns of Chinese, converting the text sentence into a structured representation;
Information extraction is performed on the disambiguated word sequence according to the basic sentence patterns of Chinese, and the text sentence is converted into a structured representation, for example the tuples: pollutant discharge (s: Changning, p: Hunan Town, t: ten o'clock this morning) and pollution (s: environment, o: hidden pipe + underground). Here p denotes the place component, t denotes the time information, and "+" denotes a coordinate relation, i.e. "hidden pipe" and "underground" each serve as an object of the predicate "pollution".
Step S204: based on the structured representation obtained in step S203, obtain the deep semantic representation of the current sentence in combination with the environmental protection knowledge base and use it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, return to step S201; otherwise analyze the next text.
Based on the structured representation obtained, the deep semantic representation of the current sentence is obtained in combination with the environmental protection knowledge base and used for classification. If the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, the process returns to step S201; otherwise the next text is analyzed. Specifically, according to the result of information extraction and the characteristics of environmental protection event texts, knowledge generalization, feature extraction, and feature value calculation are performed on the text in combination with the environmental protection knowledge base. Based on the deep semantic feature representation obtained, a binary classification model is trained on a labeled training set, real-time classification is performed with this model, and the recognition result is finally output.
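The sentence-by-sentence control flow of step S204 can be sketched as a short loop. The function name and the keyword-based stand-in classifier are hypothetical; in the described method the per-sentence decision would come from the trained binary classification model.

```python
def text_is_event_related(sentences, classify_sentence):
    """Scan sentences in order and stop at the first one classified as related.

    Returns True as soon as a sentence is related (the scan then moves on to
    the next text), and False once the last sentence has been scanned
    without finding a related sentence.
    """
    for sentence in sentences:
        if classify_sentence(sentence):
            return True
    return False

# Hypothetical stand-in for the trained binary classifier.
def toy_classifier(sentence):
    return "sewage" in sentence

texts = {
    "a": ["The weather was fine.", "Then sewage spread near the town."],
    "b": ["A new shop opened.", "Prices were low."],
}
flagged = [name for name, sents in texts.items()
           if text_is_event_related(sents, toy_classifier)]
print(flagged)
```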
As shown in Fig. 4, in one embodiment, step S3 includes:
Step S301: load the set of identified environmental protection event texts, perform structured parsing on them using information extraction techniques without considering time and place information at this stage, and obtain the structured tuple set describing the topic of each text;
The set of identified environmental protection event texts is loaded and parsed structurally using information extraction techniques. Time and place information is not considered at this stage, and the structured tuple set describing the topic of each text is obtained. Structured parsing comprises word segmentation and structured extraction of the text; the features finally parsed are stored in a unified data structure.
Step S302: in combination with the time and place words in the environmental protection event base, identify and extract the time and place information of each text, and obtain the time vector and the place vector describing each text;
Time extraction covers the publication time of a text and the time at which the described event occurred. Based on the tags produced by the word segmentation algorithm and on the time library, time words are extracted from each text, with complex time words recognized by pattern matching. Based on the time library, each recognized time word is numerically decoded to determine the time interval it refers to, its time granularity, its upper and lower time bounds, and similar information. For words that the segmentation algorithm tags as places, the place library is queried to identify the superordinate and subordinate places and the place level. When a new place word is encountered that the segmentation algorithm cannot recognize, the boundary of the place word is identified by matching place marker words; if the hypernym-hyponym relations of the places are known, the correctness of the recognized place word can be confirmed from the large-to-small order in which places are stated. Place words are then classified by level: if several places are extracted from one text, they are grouped correctly according to the hypernym-hyponym relations between them, and places related in this way are treated as the same place.
Step S303: project the structured tuple set onto the environmental protection knowledge base, filter out the structured features unrelated to environmental protection events, and obtain the candidate structured feature set of each text;
The structured tuple set is projected onto the environmental protection knowledge base and the structured features unrelated to environmental protection events are filtered out, yielding the candidate structured feature set of each text. For each text, the unrelated structured features are filtered out according to the domain event base.
Step S304: select the effective feature subset by calculating the information of the structured features across different texts;
The effective feature subset is selected by calculating the information of the structured features across different texts. This greatly reduces the feature dimension and the computational complexity without affecting the early-warning effect.
Step S305: construct all structured features of the observed texts and, by calculating the similarity between structured features, obtain the feature vector describing the topic of each text;
All structured features of the observed texts are constructed and, by calculating the similarity between structured features, the feature vector describing the topic of each text is obtained. The structured feature set is initialized as empty and the candidate structured features of the current text are input. When the feature set is empty, a structured feature is put into it and the feature vector is set to 0 at the corresponding position; otherwise each structured feature is compared one by one with the elements of the feature set, and the most similar feature and its similarity are retained.
Step S306: based on the feature vectors obtained in step S305, perform topic clustering and obtain the set of topic categories;
Based on the feature vectors obtained, topic clustering is performed and the set of topic categories is obtained. Texts may be gathered into two classes that have to be distinguished by time and place matching. In addition, when class one and class two are compared and none of their structured features match, the similarity between the two classes is low and the clustering process does not gather them into one category. The time and place reasoning described below effectively addresses this problem.
Step S307: in combination with the environmental protection event base, construct all time and place features of the observed texts, perform time reasoning and place reasoning separately, and construct a time feature vector and a place feature vector for each text;
In combination with the environmental protection event base, all time and place features of the observed texts are constructed, time reasoning and place reasoning are performed separately, and a time feature vector and a place feature vector are constructed for each text. The current time feature set and place feature set are initialized as empty. For each text, time features and place features are constructed from its time and place information; the number of features depends on the number of distinct times and places. Time similarity reasoning compares, within a certain time window, whether two times are identical, whether one is contained in the interval of the other, whether they intersect, or whether they have no intersection. When the two times differ by no more than a certain threshold or intersect, the match is considered successful and the feature vector is set to 1 at the corresponding position; otherwise the feature is added to the current time feature set, and the feature vector is set to 1 at that position and 0 at the remaining positions. A time may be a point or a period, and there are also fuzzy expressions such as "recently"; people often find it hard to state a time precisely. The time comparison here is therefore interval-based. Place similarity matching queries the ontology library and the place library to determine whether two places are identical, equivalent, or in a parent-child containment relation, or whether one of these relations holds after a place marker word is added to or removed from the end of the place word. If so, the match is considered successful and the feature vector is set to 1 at the corresponding position; otherwise the feature is added to the current place feature set, and the feature vector is set to 1 at that position and 0 at the remaining positions.
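The interval-based time matching and the incremental construction of time feature vectors can be sketched as follows. Intervals are given here as (start, end) pairs in hours, which is an assumed encoding; the function names and the one-hour tolerance are likewise hypothetical.

```python
def times_match(a, b, max_gap_hours=0.0):
    """a, b: (start, end) intervals in hours.

    The gap is zero or negative when the intervals intersect, which also
    covers identical and contained intervals; otherwise it is the distance
    between them, which is compared with the tolerance.
    """
    gap = max(a[0], b[0]) - min(a[1], b[1])
    return gap <= max_gap_hours

def build_time_vectors(texts_intervals, max_gap_hours=1.0):
    """Grow the time feature set text by text and return one 0/1 vector per text."""
    features = []   # the current time feature set
    hits = []       # per text: indices of the features it matched or created
    for intervals in texts_intervals:
        matched = set()
        for interval in intervals:
            for idx, feature in enumerate(features):
                if times_match(interval, feature, max_gap_hours):
                    matched.add(idx)
                    break
            else:
                # No existing feature matched: add a new one.
                features.append(interval)
                matched.add(len(features) - 1)
        hits.append(matched)
    return [[1 if i in m else 0 for i in range(len(features))] for m in hits]

texts = [[(10.0, 11.0)], [(10.5, 10.5)], [(20.0, 21.0)]]
print(build_time_vectors(texts))
```

The same pattern would apply to place features, with `times_match` replaced by a lookup of identity, equivalence, or containment in the place library.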
Step S308: based on the feature vectors obtained in step S307, perform time-place clustering and obtain the set of time-place categories;
Based on the feature vectors obtained, time-place clustering is performed and the set of time-place categories is obtained; the texts are finally gathered into time-place clusters according to their time and place features.
Step S309: fuse the set of topic categories with the set of time-place categories, and obtain the final set of environmental protection event categories;
The set of topic categories is fused with the set of time-place categories to obtain the final set of environmental protection event categories. Each category is first split so that the texts in each resulting cluster belong to the same cluster in both category sets. On this basis, the category sets obtained in the previous step are merged so that the texts of each merged cluster still belong to the same cluster and the similarity between those texts, computed on phrase features, is greater than a given threshold.
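One reading of the splitting step is that two texts stay together only if they share both a topic cluster and a time-place cluster, i.e. the two partitions are intersected. The sketch below illustrates that reading only; the function name and label encoding are hypothetical, and the subsequent similarity-based merge is omitted.

```python
from collections import defaultdict

def intersect_clusterings(topic_labels, time_place_labels):
    """Group text ids that share both the topic label and the time-place label."""
    groups = defaultdict(list)
    for text_id, topic in topic_labels.items():
        groups[(topic, time_place_labels[text_id])].append(text_id)
    return sorted(sorted(members) for members in groups.values())

topic = {"t1": 0, "t2": 0, "t3": 0, "t4": 1}
time_place = {"t1": "A", "t2": "A", "t3": "B", "t4": "B"}
print(intersect_clusterings(topic, time_place))
```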
Step S310: rank the early-warning level according to the number of texts contained in each cluster, and issue a real-time early warning for environmental protection events that exceed a given threshold.
The early-warning level is ranked according to the number of texts contained in each cluster, and a timely early warning is issued for environmental protection events that exceed a given threshold.
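The ranking and thresholding in step S310 amount to sorting clusters by size and keeping those above a cut-off. A minimal sketch, with a hypothetical function name, cluster names, and threshold value:

```python
def rank_alerts(clusters, min_texts):
    """clusters: dict of cluster name -> list of text ids.

    Returns (name, size) pairs, largest first, for clusters whose size
    exceeds the threshold.
    """
    ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
    return [(name, len(texts)) for name, texts in ranked if len(texts) > min_texts]

clusters = {
    "event_a": ["t1", "t2", "t3", "t4"],
    "event_b": ["t5"],
    "event_c": ["t6", "t7", "t8"],
}
print(rank_alerts(clusters, min_texts=2))
```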
As shown in Fig. 5, in one embodiment, step S204 includes:
Step S20401: according to the result of the information extraction in step S203 and the characteristics of environmental protection event texts, perform knowledge generalization, feature extraction, and feature value calculation on the text in combination with the environmental protection knowledge base;
For entities, the set of equivalent entities of the entity to be analyzed is obtained from the environmental protection ontology library, and the elements of this set replace the entity one by one in the subsequent calculation. For "hidden pipe" in the example sentence, the equivalence set obtained from the ontology library is seepage pit, seepage well, and canal entry, so each of these can replace "hidden pipe" in turn in the subsequent calculation. Relation generalization obtains the set of equivalent relations of the relation to be analyzed from the environmental protection rule base, and the elements of this set replace the relation one by one in the subsequent computation. For the relation "pollutant discharge" in the example sentence, the equivalence set obtained from the rule base is discharge, diversion, and confluence, so each of these can replace "pollutant discharge" in turn. In view of the characteristics of environmental protection events, the invention mainly extracts the following kinds of features: the predicates and the predicate components of the tuples obtained by information extraction. In environmental protection event texts, the predicate verb of a tuple is generally highly representative, for example "pollutant discharge" and "undercurrent", both of which are strongly associated with environmental pollution.
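The generalization step can be sketched as expanding one extracted (relation, entity) pair into every combination of equivalent relations and equivalent entities. The two lookup tables below stand in for the ontology library and the rule base and are filled only with the example values from the text; keeping the original pair in the output is an assumption of this sketch.

```python
from itertools import product

# Stand-ins for lookups in the ontology library and the rule base.
ENTITY_EQUIV = {"hidden pipe": ["seepage pit", "seepage well", "canal entry"]}
RELATION_EQUIV = {"pollutant discharge": ["discharge", "diversion", "confluence"]}

def generalize(relation, entity):
    """Return all (relation, entity) variants, the original pair first."""
    relations = [relation] + RELATION_EQUIV.get(relation, [])
    entities = [entity] + ENTITY_EQUIV.get(entity, [])
    return list(product(relations, entities))

variants = generalize("pollutant discharge", "hidden pipe")
print(len(variants))
print(variants[:4])
```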
Step S20402: according to the deep semantic feature representation obtained in step S20401, train a binary classification model on a labeled training set, perform real-time classification with the classification model, and finally output the recognition result.
According to the deep semantic feature representation obtained, a binary classification model is trained on a labeled training set, real-time classification is performed with the classification model, and the recognition result is finally output. The classification label indicates whether or not the text relates to an environmental protection event. During real-time classification, whether the target text is related to an environmental protection event is judged by whether the value computed by the classification model exceeds a given threshold. The binary classification model here may be any supervised classification model from machine learning; any environmental protection event recognition method implemented on the above mechanism shall fall within the scope of the invention.
As shown in Fig. 6, in one embodiment, a public opinion analysis system based on pollutant discharge data is provided. The system includes: an acquisition module, configured to obtain, via the Internet, public opinion information on an enterprise in the environmental protection field and to construct an environmental-protection-oriented knowledge base comprising an ontology library, a fact base, an event base, and a rule base; a classification module, configured to perform local structured extraction and classification on texts in combination with the environmental protection knowledge base, so as to identify, from massive texts, the texts related to environmental protection events with potential risk; and an early-warning module, configured to cluster, based on the environmental protection knowledge base, the structured data of the risk-bearing texts identified by the classification module, and to decide whether to issue a real-time early warning according to whether the number of texts contained in each cluster exceeds a given threshold.
As shown in Fig. 7, in one embodiment, the acquisition module further includes: an ontology construction unit, configured to construct the environmental protection ontology library, which stores the hierarchical organization of environmental protection concepts together with the equivalence relations and possible relation constraints between concepts; a fact construction unit, configured to construct the environmental protection fact base, which stores the structured tuple set obtained through semantic disambiguation and unique entity identification; an event construction unit, configured to construct the environmental protection event base, which contains related vocabulary made up of objects, behaviors, agents, patients, times, places, and causes; and a rule construction unit, configured to construct the environmental protection rule base, which stores the equivalence relations between concepts and the probabilities that they hold.
As shown in Fig. 8, the classification module further includes: a segmentation unit, configured to preprocess the text to be analyzed sentence by sentence, perform Chinese word segmentation and part-of-speech tagging, and merge and correct special word sequences; a disambiguation unit, configured to map entities to concepts in the hierarchical concept space of the environmental protection ontology library based on the word sequence produced by the segmentation unit, and at the same time to disambiguate the concepts of ambiguous words; a conversion unit, configured to perform information extraction on the disambiguated word sequence produced by the disambiguation unit according to the basic sentence patterns of Chinese, converting the text sentence into a structured representation; and a classification unit, configured to obtain the deep semantic representation of the current sentence from the structured representation produced by the conversion unit in combination with the environmental protection knowledge base and to use it for classification, returning to the segmentation unit for further processing if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, and otherwise analyzing the next text.
As shown in Fig. 9, the early-warning module further includes: a parsing unit, configured to load the set of identified environmental protection event texts and perform structured parsing on them using information extraction techniques, without considering time and place information at this stage, to obtain the structured tuple set describing the topic of each text; a recognition unit, configured to identify and extract the time and place information of each text in combination with the time and place words in the environmental protection event base, and to obtain the time vector and the place vector describing each text; a filtering unit, configured to project the structured tuple set onto the environmental protection knowledge base and filter out the structured features unrelated to environmental protection events, to obtain the candidate structured feature set of each text; a selection unit, configured to select the effective feature subset by calculating the information of the structured features across different texts; an obtaining unit, configured to construct all structured features of the observed texts and, by calculating the similarity between structured features, to obtain the feature vector describing the topic of each text; a topic clustering unit, configured to perform topic clustering based on the feature vectors produced by the obtaining unit and to obtain the set of topic categories; a construction unit, configured to construct all time and place features of the observed texts in combination with the environmental protection event base, to perform time reasoning and place reasoning separately, and to construct a time feature vector and a place feature vector for each text; a time-place clustering unit, configured to perform time-place clustering based on the feature vectors produced by the construction unit and to obtain the set of time-place categories; a fusion unit, configured to fuse the set of topic categories with the set of time-place categories and to obtain the final set of environmental protection event categories; and an early-warning unit, configured to rank the early-warning level according to the number of texts contained in each cluster and to issue a real-time early warning for environmental protection events that exceed a given threshold.
As shown in Fig. 10, in one embodiment, the classification unit further includes: an extraction subunit, configured to perform knowledge generalization, feature extraction, and feature value calculation on the text in combination with the environmental protection knowledge base, according to the result of the information extraction performed by the conversion unit and the characteristics of environmental protection event texts; and a classification subunit, configured to train a binary classification model on a labeled training set according to the deep semantic feature representation produced by the extraction subunit, to perform real-time classification with the classification model, and finally to output the recognition result.
In one embodiment, a computer device is proposed. The computer device includes a memory and a processor; computer-readable instructions are stored in the memory and, when executed by the processor, cause the processor to carry out the following steps: step S1: obtain, via the Internet, public opinion information on an enterprise in the environmental protection field, and construct an environmental-protection-oriented knowledge base comprising an ontology library, a fact base, an event base, and a rule base; step S2: in combination with the environmental protection knowledge base, perform local structured extraction and classification on texts, so as to identify, from massive texts, the texts related to environmental protection events with potential risk; step S3: based on the environmental protection knowledge base constructed in step S1, cluster the structured data of the texts identified in step S2, and decide whether to issue a real-time early warning according to whether the number of texts contained in each cluster exceeds a given threshold.
In one embodiment, step S1 includes:
Step S101: construct the environmental protection ontology library, which stores the hierarchical organization of environmental protection concepts together with the equivalence relations and possible relation constraints between concepts;
Step S102: construct the environmental protection fact base, which stores the structured tuple set obtained through semantic disambiguation and unique entity identification;
Step S103: construct the environmental protection event base, which contains related vocabulary made up of objects, behaviors, agents, patients, times, places, and causes;
Step S104: construct the environmental protection rule base, which stores the equivalence relations between concepts and the probabilities that they hold.
In one embodiment, step S2 includes:
Step S201: preprocess the text to be analyzed sentence by sentence, perform Chinese word segmentation and part-of-speech tagging, and merge and correct special word sequences;
Step S202: based on the word sequence obtained in step S201, map entities to concepts in the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguate the concepts of ambiguous words;
Step S203: based on the disambiguated word sequence obtained in step S202, perform information extraction on the disambiguated word sequence according to the basic sentence patterns of Chinese, converting the text sentence into a structured representation;
Step S204: based on the structured representation obtained in step S203, obtain the deep semantic representation of the current sentence in combination with the environmental protection knowledge base and use it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, return to step S201; otherwise analyze the next text.
In one embodiment, step S3 includes:
Step S301: load the set of identified environmental protection event texts, perform structured parsing on them using information extraction techniques without considering time and place information at this stage, and obtain the structured tuple set describing the topic of each text;
Step S302: in combination with the time and place words in the environmental protection event base, identify and extract the time and place information of each text, and obtain the time vector and the place vector describing each text;
Step S303: project the structured tuple set onto the environmental protection knowledge base, filter out the structured features unrelated to environmental protection events, and obtain the candidate structured feature set of each text;
Step S304: select the effective feature subset by calculating the information of the structured features across different texts;
Step S305: construct all structured features of the observed texts and, by calculating the similarity between structured features, obtain the feature vector describing the topic of each text;
Step S306: based on the feature vectors obtained in step S305, perform topic clustering and obtain the set of topic categories;
Step S307: in combination with the environmental protection event base, construct all time and place features of the observed texts, perform time reasoning and place reasoning separately, and construct a time feature vector and a place feature vector for each text;
Step S308: based on the feature vectors obtained in step S307, perform time-place clustering and obtain the set of time-place categories;
Step S309: fuse the set of topic categories with the set of time-place categories, and obtain the final set of environmental protection event categories;
Step S310: rank the early-warning level according to the number of texts contained in each cluster, and issue a real-time early warning for environmental protection events that exceed a given threshold.
In one embodiment, step S204 includes:
Step S20401: according to the result of the information extraction in step S203 and the characteristics of environmental protection event texts, perform knowledge generalization, feature extraction, and feature value calculation on the text in combination with the environmental protection knowledge base;
Step S20402: according to the deep semantic feature representation obtained in step S20401, train a binary classification model on a labeled training set, perform real-time classification with the classification model, and finally output the recognition result.
In one embodiment, a storage medium storing computer-readable instructions is proposed. When executed by one or more processors, the computer-readable instructions cause the one or more processors to carry out the following steps: step S1: obtain, via the Internet, public opinion information on an enterprise in the environmental protection field, and construct an environmental-protection-oriented knowledge base comprising an ontology library, a fact base, an event base, and a rule base; step S2: in combination with the environmental protection knowledge base, perform local structured extraction and classification on texts, so as to identify, from massive texts, the texts related to environmental protection events with potential risk; step S3: based on the environmental protection knowledge base constructed in step S1, cluster the structured data of the texts identified in step S2, and decide whether to issue a real-time early warning according to whether the number of texts contained in each cluster exceeds a given threshold.
In one embodiment, step S1 includes:
Step S101: build the environmental protection ontology library, which stores the hierarchical organization of environmental protection concepts together with the equivalence relations and possible relation constraints between concepts;
Step S102: build the environmental protection fact base, which stores structured sets obtained through semantic disambiguation and unique entity identification;
Step S103: build the environmental protection event base, which contains the related vocabulary, made up of words for object, behavior, agent, patient, time, place, and cause;
Step S104: build the environmental protection rule base, which stores equivalence relations between concepts and the probability that each relation holds.
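One way to picture the four stores of steps S101 to S104 is as a small in-memory data structure. The sketch below is illustrative only: the class names, field names, and the `min_prob` cut-off are assumptions made for this example and do not come from the patent, which does not specify a storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """S101 (sketch): concept hierarchy stored as child -> parent links."""
    parent: dict = field(default_factory=dict)

    def ancestors(self, concept):
        # Walk up the hierarchy and collect every broader concept.
        # Assumes the hierarchy contains no cycles.
        chain = []
        while concept in self.parent:
            concept = self.parent[concept]
            chain.append(concept)
        return chain


@dataclass
class KnowledgeBase:
    """Hypothetical container for the four stores of steps S101 to S104."""
    ontology: Ontology = field(default_factory=Ontology)
    # S102: disambiguated (subject, relation, object) facts.
    facts: set = field(default_factory=set)
    # S103: event vocabulary grouped by role (object, behavior, agent, ...).
    event_words: dict = field(default_factory=dict)
    # S104: (concept_a, concept_b) -> probability that the equivalence holds.
    rules: dict = field(default_factory=dict)

    def equivalent(self, a, b, min_prob=0.5):
        # Two concepts count as equivalent if a rule links them, in either
        # direction, with probability at or above the assumed cut-off.
        if a == b:
            return True
        prob = max(self.rules.get((a, b), 0.0), self.rules.get((b, a), 0.0))
        return prob >= min_prob
```

For example, after adding the links `"wastewater discharge" -> "water pollution" -> "pollution"`, calling `ancestors("wastewater discharge")` returns the two broader concepts in order, which is the kind of lookup the concept mapping in step S202 would rely on.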
In one embodiment, step S2 includes:
Step S201: preprocess the text to be analyzed sentence by sentence, performing Chinese word segmentation and part-of-speech tagging, and merging and correcting special word sequences;
Step S202: based on the word sequence obtained in step S201, map entities to concepts in the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguate ambiguous words;
Step S203: based on the disambiguated word sequence obtained in step S202, perform information extraction on it according to basic Chinese sentence patterns, converting the text sentence into a structured representation;
Step S204: based on the structured representation obtained in step S203 and in combination with the environmental protection knowledge base, obtain a deep semantic representation of the current sentence and use it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, return to step S201 to process the next sentence; otherwise, analyze the next text.
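The control flow of steps S201 to S204 is a sentence-by-sentence scan that stops as soon as one sentence is classified as event-related. The following sketch shows only that loop. The sentence splitter, the keyword list, and the function names are stand-ins invented for this example; real segmentation, concept mapping, and classification would replace `toy_is_event_sentence`.

```python
import re
from typing import Callable, List

# Hypothetical keyword list standing in for the real classifier of step S204.
RISK_WORDS = {"discharge", "pollution", "leak"}


def split_sentences(text: str) -> List[str]:
    # Split on common Chinese and Latin sentence terminators.
    return [s.strip() for s in re.split(r"[。！？.!?]+", text) if s.strip()]


def toy_is_event_sentence(sentence: str) -> bool:
    # Stand-in for S201 to S204 on a single sentence.
    return any(word in sentence.lower() for word in RISK_WORDS)


def scan_text(text: str, is_event_sentence: Callable[[str], bool]) -> bool:
    # Unrelated and not the last sentence: continue with the next sentence.
    # Related, or last sentence reached: stop and move on to the next text.
    for sentence in split_sentences(text):
        if is_event_sentence(sentence):
            return True
    return False


def identify_event_texts(texts, is_event_sentence=toy_is_event_sentence):
    # Keep only the texts in which some sentence was classified as related.
    return [t for t in texts if scan_text(t, is_event_sentence)]
```

Passing the classifier in as a callable keeps the scanning logic independent of whichever model is actually trained for step S204.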
In one embodiment, step S3 includes:
Step S301: load the set of identified environmental protection event texts and parse them into structured form using information extraction techniques, ignoring time and place information at this stage, to obtain a structured set describing the topic of each text;
Step S302: using the time and place words in the environmental protection event base, identify and extract the time and place information of each text, obtaining a time vector and a place vector describing each text;
Step S303: project the structured set onto the environmental protection knowledge base and filter out structured features unrelated to environmental protection events, obtaining a candidate structured feature set for each text;
Step S304: select an effective feature subset by calculating the information that each structured feature carries across different texts;
Step S305: construct all structured features of the observed texts and, by calculating the similarity between structured features, obtain a feature vector describing the topic of each text;
Step S306: based on the feature vectors obtained in step S305, perform topic clustering to obtain a set of topic categories;
Step S307: construct all time and place features of the observed texts, perform time reasoning and place reasoning separately in combination with the environmental protection event base, and build a time feature vector and a place feature vector for each text;
Step S308: based on the feature vectors obtained in step S307, perform time-and-place clustering to obtain a set of time-and-place categories;
Step S309: fuse the topic category set with the time-and-place category set to obtain the final set of environmental protection event categories;
Step S310: rank the clusters by early-warning level according to the number of texts each cluster contains, and issue a real-time early warning for environmental protection events whose text count exceeds the given threshold.
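The clustering, fusion, and warning stages (steps S306 and S308 to S310) can be sketched as follows. The patent does not name a clustering algorithm or a fusion rule, so this example makes several assumptions: each text is represented as a set of features, similarity is Jaccard similarity, clustering is a single greedy pass, and fusion pairs each text's topic label with its time-and-place label. All function names are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    # Similarity between two feature sets (assumed measure).
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_cluster(feature_sets, min_sim=0.5):
    # Assign each text to the first cluster whose seed is similar enough,
    # otherwise open a new cluster. Returns one label per text.
    seeds, labels = [], []
    for feats in feature_sets:
        for idx, seed in enumerate(seeds):
            if jaccard(feats, seed) >= min_sim:
                labels.append(idx)
                break
        else:
            seeds.append(feats)
            labels.append(len(seeds) - 1)
    return labels


def fuse_and_warn(topic_labels, time_place_labels, threshold):
    # S309 (assumed rule): a fused category is a (topic, time-place) pair.
    if len(topic_labels) != len(time_place_labels):
        raise ValueError("one label of each kind is required per text")
    counts = Counter(zip(topic_labels, time_place_labels))
    # S310: rank by text count and keep categories above the threshold.
    return [(cat, n) for cat, n in counts.most_common() if n > threshold]
```

With three texts whose topic labels are `[0, 0, 1]` and whose time-and-place labels are also `[0, 0, 1]`, a threshold of 1 leaves only the fused category `(0, 0)`, which contains two texts, as a warning candidate.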
In one embodiment, step S204 includes:
Step S20401: based on the result of the information extraction in step S203, and in view of the characteristics of environmental protection event texts, perform knowledge generalization, feature extraction, and feature-value calculation on the text in combination with the environmental protection knowledge base;
Step S20402: using the deep semantic feature representation obtained in step S20401, train a binary classification model on a labeled training set, classify texts in real time with the trained model, and output the final recognition result.
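Step S20402 calls for a binary classification model trained on a labeled set, but the patent does not say which model is used. As a minimal sketch, the example below trains a perceptron over sparse feature dictionaries. The perceptron, the feature names, and the function names are assumptions chosen for illustration, not the model described by the inventors.

```python
from collections import defaultdict


def train_perceptron(samples, epochs=10):
    """Train on (feature_dict, label) pairs, with label 1 = event-related."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for feats, label in samples:
            score = bias + sum(weights[f] * v for f, v in feats.items())
            pred = 1 if score > 0 else 0
            err = label - pred
            if err:
                # Move the weights toward the correct label on a mistake.
                for f, v in feats.items():
                    weights[f] += err * v
                bias += err
    return dict(weights), bias


def predict(model, feats):
    # Features never seen in training contribute nothing to the score.
    weights, bias = model
    score = bias + sum(weights.get(f, 0.0) * v for f, v in feats.items())
    return 1 if score > 0 else 0
```

In practice the feature dictionaries would hold the feature values computed in step S20401, and any binary classifier that accepts such features could take the perceptron's place.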
Those of ordinary skill in the art will understand that all or part of the steps in the methods of the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, or the like.
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; however, as long as a combination involves no contradiction, it should be considered within the scope of this specification.
The embodiments above express only some exemplary embodiments of the invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A public opinion analysis method, characterized by comprising:
Step S1: obtaining, via the Internet, public opinion information about an enterprise in the environmental protection field, and building an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base, and an environmental protection rule base;
Step S2: in combination with the environmental protection knowledge base, performing local structure extraction and classification on texts, so as to identify, from a large volume of texts, those related to environmental protection events with potential risks;
Step S3: based on the environmental protection knowledge base, clustering the structured data of the identified texts related to environmental protection events with potential risks, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
2. The public opinion analysis method according to claim 1, characterized in that step S1 includes:
Step S101: building the environmental protection ontology library, which stores the hierarchical organization of environmental protection concepts together with the equivalence relations and possible relation constraints between concepts;
Step S102: building the environmental protection fact base, which stores structured sets obtained through semantic disambiguation and unique entity identification;
Step S103: building the environmental protection event base, which contains the related vocabulary, made up of words for object, behavior, agent, patient, time, place, and cause;
Step S104: building the environmental protection rule base, which stores equivalence relations between concepts and the probability that each relation holds.
3. The public opinion analysis method according to claim 1, characterized in that step S2 includes:
Step S201: preprocessing the text to be analyzed sentence by sentence, performing Chinese word segmentation and part-of-speech tagging, and merging and correcting special word sequences;
Step S202: based on the word sequence obtained in step S201, mapping entities to concepts in the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguating ambiguous words;
Step S203: based on the disambiguated word sequence obtained in step S202, performing information extraction on it according to basic Chinese sentence patterns, converting the text sentence into a structured representation;
Step S204: based on the structured representation obtained in step S203 and in combination with the environmental protection knowledge base, obtaining a deep semantic representation of the current sentence and using it for classification; if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, returning to step S201 to process the next sentence; otherwise, analyzing the next text.
4. The public opinion analysis method according to claim 1, characterized in that step S3 includes:
Step S301: loading the set of identified environmental protection event texts and parsing them into structured form using information extraction techniques, ignoring time and place information at this stage, to obtain a structured set describing the topic of each text;
Step S302: using the time and place words in the environmental protection event base, identifying and extracting the time and place information of each text, obtaining a time vector and a place vector describing each text;
Step S303: projecting the structured set onto the environmental protection knowledge base and filtering out structured features unrelated to environmental protection events, obtaining a candidate structured feature set for each text;
Step S304: selecting an effective feature subset by calculating the information that each structured feature carries across different texts;
Step S305: constructing all structured features of the observed texts and, by calculating the similarity between structured features, obtaining a feature vector describing the topic of each text;
Step S306: based on the feature vectors obtained in step S305, performing topic clustering to obtain a set of topic categories;
Step S307: constructing all time and place features of the observed texts, performing time reasoning and place reasoning separately in combination with the environmental protection event base, and building a time feature vector and a place feature vector for each text;
Step S308: based on the feature vectors obtained in step S307, performing time-and-place clustering to obtain a set of time-and-place categories;
Step S309: fusing the topic category set with the time-and-place category set to obtain the final set of environmental protection event categories;
Step S310: ranking the clusters by early-warning level according to the number of texts each cluster contains, and issuing a real-time early warning for environmental protection events whose text count exceeds the given threshold.
5. The public opinion analysis method according to claim 2, characterized in that step S204 includes:
Step S20401: based on the result of the information extraction in step S203, and in view of the characteristics of environmental protection event texts, performing knowledge generalization, feature extraction, and feature-value calculation on the text in combination with the environmental protection knowledge base;
Step S20402: using the deep semantic feature representation obtained in step S20401, training a binary classification model on a labeled training set, classifying texts in real time with the trained model, and outputting the final recognition result.
6. A public opinion analysis system based on pollutant discharge data, characterized in that the system includes: an acquisition module for obtaining, via the Internet, public opinion information about an enterprise in the environmental protection field and building an environmental protection knowledge base comprising an environmental protection ontology library, an environmental protection fact base, an environmental protection event base, and an environmental protection rule base; a classification module for performing, in combination with the environmental protection knowledge base, local structure extraction and classification on texts, so as to identify, from a large volume of texts, those related to environmental protection events with potential risks; and a warning module for clustering, based on the environmental protection knowledge base built by the acquisition module, the structured data of the texts identified by the classification module, and deciding whether to issue a real-time early warning according to whether the number of texts in each cluster exceeds a given threshold.
7. The public opinion analysis system based on pollutant discharge data according to claim 6, characterized in that the acquisition module further includes: an ontology construction unit for building the environmental protection ontology library, which stores the hierarchical organization of environmental protection concepts together with the equivalence relations and possible relation constraints between concepts; a fact construction unit for building the environmental protection fact base, which stores structured sets obtained through semantic disambiguation and unique entity identification; an event construction unit for building the environmental protection event base, which contains the related vocabulary, made up of words for object, behavior, agent, patient, time, place, and cause; and a rule construction unit for building the environmental protection rule base, which stores equivalence relations between concepts and the probability that each relation holds;
the classification module further includes: a word segmentation unit for preprocessing the text to be analyzed sentence by sentence, performing Chinese word segmentation and part-of-speech tagging, and merging and correcting special word sequences; a disambiguation unit for mapping entities, based on the word sequence produced by the word segmentation unit, to concepts in the hierarchical concept space of the environmental protection ontology library, and at the same time disambiguating ambiguous words; a conversion unit for performing information extraction on the disambiguated word sequence produced by the disambiguation unit according to basic Chinese sentence patterns, converting the text sentence into a structured representation; and a classification unit for obtaining, based on the structured representation produced by the conversion unit and in combination with the environmental protection knowledge base, a deep semantic representation of the current sentence and using it for classification, wherein if the classification result is unrelated to an environmental protection event and the last sentence of the text has not yet been scanned, processing returns to the word segmentation unit, and otherwise the next text is analyzed;
the warning module further includes: a parsing unit for loading the set of identified environmental protection event texts and parsing them into structured form using information extraction techniques, ignoring time and place information at this stage, to obtain a structured set describing the topic of each text; a recognition unit for identifying and extracting the time and place information of each text using the time and place words in the environmental protection event base, obtaining a time vector and a place vector describing each text; a filtering unit for projecting the structured set onto the environmental protection knowledge base and filtering out structured features unrelated to environmental protection events, obtaining a candidate structured feature set for each text; a selection unit for selecting an effective feature subset by calculating the information that each structured feature carries across different texts; an obtaining unit for constructing all structured features of the observed texts and, by calculating the similarity between structured features, obtaining a feature vector describing the topic of each text; a topic clustering unit for performing topic clustering based on the feature vectors produced by the obtaining unit to obtain a set of topic categories; a construction unit for constructing all time and place features of the observed texts, performing time reasoning and place reasoning separately in combination with the environmental protection event base, and building a time feature vector and a place feature vector for each text; a time-and-place clustering unit for performing time-and-place clustering based on the feature vectors produced by the construction unit to obtain a set of time-and-place categories; a fusion unit for fusing the topic category set with the time-and-place category set to obtain the final set of environmental protection event categories; and an early-warning unit for ranking the clusters by early-warning level according to the number of texts each cluster contains, and issuing a real-time early warning for environmental protection events whose text count exceeds the given threshold.
8. The compositing system based on artificial intelligence according to claim 7, characterized in that the classification unit further includes: an extraction subunit for performing, based on the result of the information extraction carried out by the conversion unit and in view of the characteristics of environmental protection event texts, knowledge generalization, feature extraction, and feature-value calculation on the text in combination with the environmental protection knowledge base; and a classification subunit for training a binary classification model on a labeled training set using the deep semantic feature representation produced by the extraction subunit, classifying texts in real time with the trained model, and outputting the final recognition result.
9. A computer device, characterized by comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method according to any one of claims 1 to 5.
10. A storage medium storing computer-readable instructions, characterized in that the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the steps of the method according to any one of claims 1 to 5.
CN201811020177.2A 2018-09-03 2018-09-03 The analysis of public opinion method, system, equipment and storage medium Withdrawn CN109408804A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811020177.2A CN109408804A (en) 2018-09-03 2018-09-03 The analysis of public opinion method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811020177.2A CN109408804A (en) 2018-09-03 2018-09-03 The analysis of public opinion method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN109408804A true CN109408804A (en) 2019-03-01

Family

ID=65463909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811020177.2A Withdrawn CN109408804A (en) 2018-09-03 2018-09-03 The analysis of public opinion method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109408804A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110347793A (en) * 2019-06-28 2019-10-18 北京牡丹电子集团有限责任公司宁安智慧工程中心 A kind of semantic analysis method and device of Chinese
CN110377696A (en) * 2019-06-19 2019-10-25 新华智云科技有限公司 A kind of commodity future news the analysis of public opinion method and system
CN110825839A (en) * 2019-11-07 2020-02-21 成都国腾实业集团有限公司 Incidence relation analysis method for targets in text information
CN110851598A (en) * 2019-10-30 2020-02-28 深圳价值在线信息科技股份有限公司 Text classification method and device, terminal equipment and storage medium
CN111581982A (en) * 2020-05-06 2020-08-25 首都师范大学 Ontology-based prediction method for public opinion early warning grade of medical dispute case
CN111914087A (en) * 2020-07-30 2020-11-10 广州城市信息研究所有限公司 Public opinion analysis method
CN111984765A (en) * 2019-05-21 2020-11-24 南京大学 Knowledge base question-answering process relation detection method and device
CN112100374A (en) * 2020-08-28 2020-12-18 清华大学 Text clustering method and device, electronic equipment and storage medium
CN112749269A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Entity public opinion calculation method and system
CN112766506A (en) * 2021-01-19 2021-05-07 澜途集思生态科技集团有限公司 Knowledge base construction method based on architecture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100312769A1 (en) * 2009-06-09 2010-12-09 Bailey Edward J Methods, apparatus and software for analyzing the content of micro-blog messages
CN103699663A (en) * 2013-12-27 2014-04-02 中国科学院自动化研究所 Hot event mining method based on large-scale knowledge base
CN104091054A (en) * 2014-06-26 2014-10-08 中国科学院自动化研究所 Mass disturbance warning method and system applied to short texts

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984765A (en) * 2019-05-21 2020-11-24 南京大学 Knowledge base question-answering process relation detection method and device
CN111984765B (en) * 2019-05-21 2023-10-24 南京大学 Knowledge base question-answering process relation detection method and device
CN110377696A (en) * 2019-06-19 2019-10-25 新华智云科技有限公司 A kind of commodity future news the analysis of public opinion method and system
CN110347793A (en) * 2019-06-28 2019-10-18 北京牡丹电子集团有限责任公司宁安智慧工程中心 A kind of semantic analysis method and device of Chinese
CN110851598A (en) * 2019-10-30 2020-02-28 深圳价值在线信息科技股份有限公司 Text classification method and device, terminal equipment and storage medium
CN110851598B (en) * 2019-10-30 2023-04-07 深圳价值在线信息科技股份有限公司 Text classification method and device, terminal equipment and storage medium
CN112749269A (en) * 2019-10-31 2021-05-04 北京国双科技有限公司 Entity public opinion calculation method and system
CN110825839B (en) * 2019-11-07 2023-07-21 成都国腾实业集团有限公司 Association relation analysis method for targets in text information
CN110825839A (en) * 2019-11-07 2020-02-21 成都国腾实业集团有限公司 Incidence relation analysis method for targets in text information
CN111581982A (en) * 2020-05-06 2020-08-25 首都师范大学 Ontology-based prediction method for public opinion early warning grade of medical dispute case
CN111581982B (en) * 2020-05-06 2023-02-17 首都师范大学 Ontology-based prediction method for public opinion early warning grade of medical dispute case
CN111914087A (en) * 2020-07-30 2020-11-10 广州城市信息研究所有限公司 Public opinion analysis method
CN111914087B (en) * 2020-07-30 2023-09-19 广州城市信息研究所有限公司 Public opinion analysis method
CN112100374A (en) * 2020-08-28 2020-12-18 清华大学 Text clustering method and device, electronic equipment and storage medium
CN112766506A (en) * 2021-01-19 2021-05-07 澜途集思生态科技集团有限公司 Knowledge base construction method based on architecture

Similar Documents

Publication Publication Date Title
CN109408804A (en) The analysis of public opinion method, system, equipment and storage medium
CN104091054B (en) Towards the Mass disturbance method for early warning and system of short text
CN110968699B (en) Logic map construction and early warning method and device based on fact recommendation
CN112199608B (en) Social media rumor detection method based on network information propagation graph modeling
CN110888943B (en) Method and system for assisted generation of court judge document based on micro-template
CN110781315B (en) Food safety knowledge graph and construction method of related intelligent question-answering system
CN106294619A (en) Public sentiment intelligent supervision method
CN103970733B (en) A kind of Chinese new word identification method based on graph structure
Zhou et al. Learning household task knowledge from WikiHow descriptions
CN113204967B (en) Resume named entity identification method and system
CN108959395A (en) A kind of level towards multi-source heterogeneous big data about subtracts combined cleaning method
CN111427775A (en) Method level defect positioning method based on Bert model
CN112394973A (en) Multi-language code plagiarism detection method based on pseudo-twin network
CN103678499A (en) Data mining method based on multi-source heterogeneous patent data semantic integration
CN115757775A (en) Text implication-based triggerless text event detection method and system
Guo et al. Text quality analysis of emergency response plans
Li et al. Neural factoid geospatial question answering
CN117743601B (en) Natural resource knowledge graph completion method, device, equipment and medium
Sheeren et al. A data‐mining approach for assessing consistency between multiple representations in spatial databases
CN114066077B (en) Environmental sanitation risk prediction method based on emergency event space warning sign analysis
Thanos et al. Combined deep learning and traditional NLP approaches for fire burst detection based on twitter posts
CN115391523A (en) Wind power plant multi-source heterogeneous data processing method and device
CN111797213A (en) Method for mining financial risk clues from unstructured network information
CN111291198A (en) Economic situation index analysis method and system based on big data and computer readable medium
Delavallade et al. Monitoring event flows and modelling scenarios for crisis prediction: Application to ethnic conflicts forecasting

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190301