Embodiment
FIG. 1 is a schematic diagram of an application scenario of the network public opinion information processing method according to an embodiment of this specification. In the course of its operation, an organization may encounter emergencies related to its business, such as a capital chain rupture, a run on funds, financial product losses, or the departure of senior executives, and such emergencies generate a large volume of public opinion on the Internet. The network public opinion information processing apparatus 101 in the server 100 monitors the public opinion about the organization on the Internet.
In a first aspect, an embodiment of this specification provides a network public opinion information processing method which, referring to FIG. 2, includes steps S201 to S204.
S201: Collect web documents from the Internet in real time.
A web crawler is a program or script that automatically fetches web pages. It downloads web documents from the World Wide Web for a search engine and is an important component of a search engine. In one optional implementation of this embodiment, a web crawler is used to crawl the web documents of mainstream online media sites in real time. The mainstream online media sites include, but are not limited to, Weibo, WeChat, Zhihu, Toutiao, and Tieba, and the web documents include, but are not limited to, news web documents, forum web documents, and blog web documents.
S202: Determine, according to the web documents collected within a preset period, initial public opinion information associated with a preset organization.
The preset period may be divided as follows. Because the time attribute is a continuous value, time can be discretized, that is, divided into segments. For example, when segmenting by natural day, the preset period may be one day or several days; when segmenting by calendar month, the preset period may be one month or several months. The preset period may be set as circumstances require, and the embodiments of this specification place no limitation on it. For example, in one application scenario, the initial public opinion information associated with the preset organization is determined from the web documents collected within three days. The timeliness of monitoring can be controlled by adjusting the length of the preset period. The preset organization is an organization that the user is interested in, and there may be one or more of them.
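The selection of documents by preset period described above can be sketched as follows. This is a minimal illustration rather than the specification's implementation: the function name `documents_in_period`, the `collected_at` field, and the sample documents are all assumptions introduced here.

```python
from datetime import datetime, timedelta

def documents_in_period(documents, end, days=3):
    """Return the documents collected within the last `days` days before `end`.

    `documents` is a list of dicts with a hypothetical 'collected_at' datetime field.
    """
    start = end - timedelta(days=days)
    return [d for d in documents if start <= d["collected_at"] <= end]

# Hypothetical sample data: three documents collected on different days.
docs = [
    {"url": "a", "collected_at": datetime(2024, 1, 1, 9, 0)},
    {"url": "b", "collected_at": datetime(2024, 1, 3, 9, 0)},
    {"url": "c", "collected_at": datetime(2024, 1, 4, 9, 0)},
]
recent = documents_in_period(docs, end=datetime(2024, 1, 4, 12, 0), days=3)
```

Shortening `days` makes monitoring more timely at the cost of a smaller document pool, which mirrors the trade-off the text attributes to the length of the preset period.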
An embodiment of this specification provides a specific implementation of step S202 which, referring to FIG. 3, includes step S301 and step S302.
S301: Determine whether the web document contains an element of an identity identifier set, where the elements of the identity identifier set are identity identifiers of the preset organization.
Within the preset period, the web crawler may crawl a large number of web documents, only some of which are associated with the preset organization. In this embodiment, the web documents associated with the preset organization are determined by means of the organization's identity identifiers. Specifically, each web document collected within the preset period is searched for elements of the identity identifier set, that is, for identity identifiers of the preset organization. The identity identifier of the preset organization may be one or any combination of the organization's full name, its aliases, and its product names. For example, taking Industrial and Commercial Bank of China as the preset organization, its full name is "Industrial and Commercial Bank of China Limited"; its aliases are "Industrial and Commercial Bank of China", "Industrial and Commercial Bank", the Chinese short form "Gonghang", and "ICBC"; and its product names are "Gongyin Lingtong Kuaixian", "Caifu Wenli 189 Days", and "Rong e Jie". The identity identifier set may then be {Industrial and Commercial Bank of China Limited, Industrial and Commercial Bank of China, Industrial and Commercial Bank, Gonghang, ICBC, Gongyin Lingtong Kuaixian, Caifu Wenli 189 Days, Rong e Jie}.
The elements of the identity identifier set are determined by the organizations the user is interested in. Further, before the web documents on the Internet are collected in real time, the identity identifiers of the preset organization entered by the user may be received and saved into a pre-established identity identifier database, thereby obtaining the identity identifier set.
S302: If the web document contains an element of the identity identifier set, determine that the web document is the initial public opinion information.
If the web document contains an element of the identity identifier set, the web document is associated with the preset organization. The web document is therefore determined to be initial public opinion information associated with the preset organization, and it is assigned to the corresponding preset organization.
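Steps S301 and S302 amount to a substring search against each organization's identifier set. The sketch below assumes plain substring matching and a dict keyed by organization name; the names `IDENTITY_SETS`, `match_organizations`, and `initial_public_opinion`, as well as the sample documents, are hypothetical.

```python
# Hypothetical identity identifier sets, one per preset organization.
IDENTITY_SETS = {
    "ICBC": {
        "Industrial and Commercial Bank of China Limited",
        "Industrial and Commercial Bank of China",
        "ICBC",
    },
}

def match_organizations(text, identity_sets):
    """S301: return the preset organizations whose identifiers appear in the text."""
    return sorted(
        org for org, identifiers in identity_sets.items()
        if any(identifier in text for identifier in identifiers)
    )

def initial_public_opinion(documents, identity_sets):
    """S302: keep matching documents and record the organizations they belong to."""
    result = []
    for doc in documents:
        orgs = match_organizations(doc["text"], identity_sets)
        if orgs:
            result.append({**doc, "organizations": orgs})
    return result

documents = [
    {"url": "u1", "text": "ICBC executive steps down amid restructuring"},
    {"url": "u2", "text": "Weather forecast for the weekend"},
]
initial = initial_public_opinion(documents, IDENTITY_SETS)
```

Because the number of preset organizations may be more than one, the sketch records a list of matched organizations per document rather than a single value.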
Because the web documents collected by a web crawler have their own structure and are written in HTML, they contain a large amount of noise data, such as advertisements, navigation information, pictures, copyright notices, and links. This complicates determining whether a web document contains an element of the identity identifier set. Accordingly, an embodiment of this specification provides another specific implementation of step S202 which, referring to FIG. 4, includes steps S401 to S403.
S401: Filter the web document to obtain a valid document.
Filtering the web document means removing its noise data, for example advertisements, navigation information, pictures, copyright notices, and links. In this embodiment, the web document may be filtered by means of body-text extraction, including but not limited to body-text extraction based on document structure, based on summaries, based on links, or based on adjacent web pages. After the web document is filtered, the resulting valid document may include data such as the URL, title, time, author, source, body text, comments, view count, and reply count.
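As an illustration of structure-based body-text extraction, the following sketch uses Python's standard `html.parser` module to drop the content of elements that usually carry noise. The choice of skipped tags and the names `TextExtractor` and `extract_text` are assumptions made here; the specification does not prescribe a particular extraction algorithm.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text while skipping elements that usually hold noise."""

    # Assumed noise containers; a real system would tune this list per site.
    SKIPPED = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def extract_text(html):
    """Return the body text of an HTML document with noise elements removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = (
    "<html><body><nav>Home | About</nav>"
    "<p>ICBC announces results.</p>"
    "<script>var x = 1;</script>"
    "<footer>Copyright notice</footer></body></html>"
)
body_text = extract_text(page)
```

The identifier search of step S402 can then run on `body_text` instead of the raw HTML, so that an identifier appearing only in a navigation bar or advertisement does not produce a match.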
S402: Determine whether the valid document contains an element of the identity identifier set, where the elements of the identity identifier set are identity identifiers of the preset organization.
S403: If the valid document contains an element of the identity identifier set, determine that the valid document is the initial public opinion information.
For the specific implementation of steps S402 and S403, refer to the description of steps S301 and S302; it is not repeated here.
S203: Filter the initial public opinion information to obtain valid public opinion information.
Although some initial public opinion information contains an identity identifier of the preset organization, it is not valid public opinion information: it may be online promotional content to which the identifier was maliciously added in order to raise its own relevance. A unified filtering mechanism can therefore be applied to all initial public opinion information. Accordingly, an embodiment of this specification provides a specific implementation of step S203 which, referring to FIG. 5, includes steps S501 to S503.
S501: Determine a global exclusion word set, where the elements of the global exclusion word set are keywords of online promotional information.
The content of online promotional information varies over time, so the elements of the global exclusion word set may be preset and then continuously updated during subsequent operation of the system. The elements of the global exclusion word set include, but are not limited to, terms such as "wholesale price" and "search notice".
S502: Determine whether the initial public opinion information contains an element of the global exclusion word set.
Each piece of initial public opinion information in the public opinion database is searched for elements of the global exclusion word set.
S503: If the initial public opinion information contains no element of the global exclusion word set, determine that the initial public opinion information is the valid public opinion information.
If the initial public opinion information contains no element of the global exclusion word set, it is not online promotional information, and it is therefore determined to be valid public opinion information.
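Steps S501 to S503 can be sketched as a simple keyword filter. The sketch assumes substring matching on plain text; `GLOBAL_EXCLUSION_WORDS`, `is_promotional`, `filter_global`, and the sample items are hypothetical names and data, with the two exclusion words taken from the examples in the text.

```python
# S501: a hypothetical global exclusion word set, updatable while the system runs.
GLOBAL_EXCLUSION_WORDS = {"wholesale price", "search notice"}

def is_promotional(text, exclusion_words):
    """S502: true if the text contains any global exclusion word."""
    return any(word in text for word in exclusion_words)

def filter_global(initial_items, exclusion_words):
    """S503: keep only the items that contain no global exclusion word."""
    return [item for item in initial_items if not is_promotional(item, exclusion_words)]

initial_items = [
    "ICBC wealth product posts a loss this quarter",
    "ICBC gift cards at wholesale price, contact us now",
]
valid_items = filter_global(initial_items, GLOBAL_EXCLUSION_WORDS)
```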
Executing steps S501 to S503 filters out most common invalid public opinion information, but some invalid information may still slip through. For example, public opinion related to Taobao leans toward shopping, e-commerce, and operations, whereas public opinion related to Ant Financial leans toward finance, insurance, funds, and even pyramid selling, so a specific exclusion word set needs to be set separately for each preset organization. Accordingly, an embodiment of this specification provides another specific implementation of step S203 which, referring to FIG. 6, includes steps S601 to S603.
S601: Determine a local exclusion word set corresponding to each preset organization, where the elements of the local exclusion word set are keywords related to the business of that preset organization.
The business of a preset organization changes over time, so the elements of the local exclusion word set may be preset and then continuously updated during subsequent operation of the system. Preset organizations and local exclusion word sets correspond one to one; that is, one local exclusion word set is determined for each preset organization.
S602: Determine whether the initial public opinion information contains an element of the local exclusion word set corresponding to the preset organization associated with that initial public opinion information.
For each piece of initial public opinion information in the public opinion database, the preset organization associated with it is determined first, then the local exclusion word set corresponding to that preset organization is determined, and finally the initial public opinion information is searched for elements of that local exclusion word set.
S603: If the initial public opinion information contains no element of the local exclusion word set, determine that the initial public opinion information is the valid public opinion information.
If the initial public opinion information contains no element of the local exclusion word set, it is not invalid public opinion information, and it is therefore determined to be valid public opinion information.
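The per-organization lookup in steps S601 to S603 can be sketched as follows. The organization names, the exclusion words, and the `organization` and `text` fields are all placeholders invented for illustration; the specification does not list concrete local exclusion words.

```python
# S601: hypothetical local exclusion word sets, one per preset organization.
LOCAL_EXCLUSION_WORDS = {
    "Org A": {"coupon", "flash sale"},
    "Org B": {"pyramid scheme"},
}

def filter_local(initial_items, local_words_by_org):
    """S602/S603: keep items that contain none of their organization's local words."""
    valid = []
    for item in initial_items:
        # Look up the set that belongs to the organization associated with the item.
        words = local_words_by_org.get(item["organization"], set())
        if not any(word in item["text"] for word in words):
            valid.append(item)
    return valid

items = [
    {"organization": "Org A", "text": "Org A flash sale starts tonight"},
    {"organization": "Org A", "text": "Org A faces a run on funds"},
    {"organization": "Org B", "text": "Org B flash sale rumor spreads"},
]
valid = filter_local(items, LOCAL_EXCLUSION_WORDS)
```

Note that "flash sale" excludes an item for Org A but not for Org B, which is the reason for keeping one set per organization.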
Executing steps S501 to S503 and executing steps S601 to S603 each filter out a portion of the invalid public opinion information. To make public opinion monitoring more accurate, the two can be combined. Accordingly, an embodiment of this specification provides yet another specific implementation of step S203 which, referring to FIG. 7, includes steps S701 to S704.
S701: Determine a global exclusion word set and a local exclusion word set corresponding to each preset organization, where the elements of the global exclusion word set are keywords of online promotional information and the elements of the local exclusion word set are keywords related to the business of the preset organization.
S702: Determine whether the initial public opinion information contains an element of the global exclusion word set.
S703: If the initial public opinion information contains no element of the global exclusion word set, determine whether it contains an element of the local exclusion word set corresponding to the preset organization associated with that initial public opinion information.
S704: If the initial public opinion information contains no element of the local exclusion word set, determine that the initial public opinion information is the valid public opinion information.
For the specific implementation of steps S701 to S704, refer to the description of steps S501 to S503 and steps S601 to S603; it is not repeated here.
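The combined filter of steps S701 to S704 checks the global set first and consults the local set only when the global check passes. A minimal sketch, assuming substring matching and hypothetical word sets and field names:

```python
def is_valid(item, global_words, local_words_by_org):
    """Return True when the item passes both exclusion checks (S702 to S704)."""
    text = item["text"]
    # S702: any global exclusion word marks the item as invalid immediately.
    if any(word in text for word in global_words):
        return False
    # S703: otherwise check the set of the organization associated with the item.
    local_words = local_words_by_org.get(item["organization"], set())
    # S704: the item is valid only if no local exclusion word is present.
    return not any(word in text for word in local_words)

# S701: hypothetical exclusion word sets.
global_words = {"wholesale price"}
local_words_by_org = {"Org A": {"coupon"}}

items = [
    {"organization": "Org A", "text": "Org A capital chain rupture reported"},
    {"organization": "Org A", "text": "Org A coupon giveaway"},
    {"organization": "Org A", "text": "Org A goods at wholesale price"},
]
valid = [item for item in items if is_valid(item, global_words, local_words_by_org)]
```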
S204: Classify the valid public opinion information.
The purpose of classifying the valid public opinion information is to place valid public opinion information that shares the same features into the same class. Traditional text classification methods suffer from text representations that are high-dimensional and highly sparse and from weak feature expressiveness, and neural networks are not good at handling such data. In addition, feature engineering must be done manually, at high cost. Accordingly, an embodiment of this specification provides a specific implementation of step S204 which, referring to FIG. 8, includes step S801 and step S802.
S801: Perform document vectorization on the valid public opinion information to obtain the word vector corresponding to the valid public opinion information.
Performing document vectorization on the valid public opinion information expresses each word in it as an n-dimensional, dense, continuous real-valued vector; by contrast, in a one-hot encoding vector space only one dimension is 1 and all the others are 0. Various document vectorization algorithms may be used, such as the TF (term frequency) algorithm or the TF-IDF (term frequency-inverse document frequency) algorithm, and each piece of valid public opinion information corresponds to one word vector. Through this vectorized representation, the valid public opinion information is converted from a high-dimensional, highly sparse form that neural networks handle poorly into continuous, dense data similar to images and speech, namely the word vector.
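TF-IDF, one of the algorithms named above, can be sketched in a few lines of standard-library Python. The sketch assumes whitespace tokenization, which suits the English sample text only; Chinese text would need a word segmenter first. The function name `tfidf_vectors` and the unsmoothed formula tf × log(N / df) are choices made here for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return (vocabulary, vectors) with one TF-IDF vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter(token for tokens in tokenized for token in set(tokens))
    idf = {token: math.log(n_docs / df[token]) for token in vocabulary}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            [counts[t] / total * idf[t] if total else 0.0 for t in vocabulary]
        )
    return vocabulary, vectors

vocabulary, vectors = tfidf_vectors(["ceo resigns", "ceo fund run"])
```

A term that appears in every document, such as "ceo" in this sample, receives a weight of zero, so the vector emphasizes the words that distinguish one piece of public opinion information from another.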
S802: Use a pre-established machine learning model to analyze the word vectors corresponding to all valid public opinion information and determine the public opinion category to which each piece of valid public opinion information belongs.
After the word vectors corresponding to all valid public opinion information are obtained, all word vectors are fed into the machine learning model, which analyzes them and places valid public opinion information that shares the same features into the same class. The machine learning model may be a CNN model, an RNN model, or the like. Network structures such as CNN and RNN models can learn feature representations automatically, which removes the need for laborious manual feature engineering and solves the problem end to end. Further, in the early stage of training the machine learning model, the sample size is relatively small and the classification accuracy is comparatively low, so new samples may be supplied continuously while the system runs to improve the model's classification accuracy.
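The text names CNN and RNN models for this step. The sketch below deliberately substitutes a much simpler nearest-centroid classifier, purely to show the shape of the step: train on labeled vectors, then assign each new vector to a category. It is not the model the specification describes, and the class name, vectors, and labels are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class NearestCentroidClassifier:
    """Stand-in model: one mean vector per category, prediction by similarity."""

    def fit(self, vectors, labels):
        grouped = {}
        for vector, label in zip(vectors, labels):
            grouped.setdefault(label, []).append(vector)
        # The centroid is the per-dimension mean of a category's vectors.
        self.centroids = {
            label: [sum(column) / len(members) for column in zip(*members)]
            for label, members in grouped.items()
        }
        return self

    def predict(self, vector):
        return max(self.centroids, key=lambda label: cosine(vector, self.centroids[label]))

model = NearestCentroidClassifier().fit(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    ["executive departure", "executive departure", "run on funds"],
)
predicted = model.predict([0.8, 0.2])
```

Calling `fit` again with an enlarged sample set corresponds to the continuous supply of new samples that the text mentions for improving accuracy over time.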
In step S802, the public opinion categories need not be defined; that is, all valid public opinion information is merely grouped, with valid public opinion information that shares the same features placed in the same class, without regard to what the category is. Of course, the public opinion categories may also be defined in advance; for example, they may be "capital chain rupture", "run on funds", "financial product losses", "executive departure", and so on. When the categories are defined in advance, a piece of valid public opinion information that satisfies certain conditions of a category can be considered to belong to that category, and machine learning is no longer needed for it. Accordingly, an embodiment of this specification provides another specific implementation of step S204 which, referring to FIG. 9, includes steps S901 to S904.
S901: Set public opinion categories and the feature condition corresponding to each public opinion category.
A public opinion category may be a summary of an event related to the credit risk of the preset organization. As noted above, the categories may be "capital chain rupture", "run on funds", "financial product losses", "executive departure", and so on. The feature condition is a feature that the category possesses. Taking the category "executive departure" as an example, the feature condition may be set as containing keywords such as "CEO resignation" or "CTO resignation", and it can be configured according to actual needs.
S902: Determine whether the valid public opinion information satisfies the feature condition.
In one specific embodiment, whether the valid public opinion information satisfies the feature condition can be determined by searching it for the keywords set in the feature condition. For example, it can be determined whether the valid public opinion information contains any keyword such as "CEO resignation" or "CTO resignation".
S903: If the valid public opinion information satisfies the feature condition, assign it to the public opinion category corresponding to that feature condition; otherwise, perform document vectorization on it to obtain the corresponding word vector.
Again taking the category "executive departure" as an example, if the valid public opinion information contains any keyword such as "CEO resignation" or "CTO resignation", it is assigned to the "executive departure" category; otherwise, document vectorization is performed on it.
S904: Use a pre-established machine learning model to analyze the word vectors corresponding to all valid public opinion information that does not satisfy the feature condition, and determine the public opinion category to which each such piece of valid public opinion information belongs.
For step S904, refer to the description of step S802; it is not repeated here.
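The rule-first flow of steps S901 to S904 can be sketched as a keyword lookup with a fallback. In this sketch `model_classify` is a hypothetical callable that stands in for document vectorization followed by the machine learning model, and the categories and keywords are sample values.

```python
# S901: hypothetical categories and the keywords that form their feature conditions.
FEATURE_CONDITIONS = {
    "executive departure": {"CEO resignation", "CTO resignation"},
    "run on funds": {"run on funds"},
}

def rule_category(text, feature_conditions):
    """S902: return the first category whose keywords appear in the text, else None."""
    for category, keywords in feature_conditions.items():
        if any(keyword in text for keyword in keywords):
            return category
    return None

def classify_all(texts, feature_conditions, model_classify):
    """S903/S904: use the rule when it matches, otherwise fall back to the model."""
    categories = {}
    for text in texts:
        category = rule_category(text, feature_conditions)
        if category is None:
            category = model_classify(text)
        categories[text] = category
    return categories

texts = ["CEO resignation confirmed by the board", "Quarterly product returns fall"]
# The lambda is a placeholder for the vectorize-then-predict path of S904.
result = classify_all(texts, FEATURE_CONDITIONS, lambda text: "financial product losses")
```

Only the texts that no rule matches reach the model, which is what lets the predefined categories avoid the cost of machine learning for the clear cases.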
Further, after the valid public opinion information is classified, the valid public opinion information belonging to the same event may also be normalized. After an emergency occurs at the preset organization, a large amount of valid public opinion information corresponds to that emergency; that is, each public opinion category corresponds to a large amount of valid public opinion information. In that case, the user would need a great deal of time to review a category. Therefore, all valid public opinion information about the same event may be merged through normalization.
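The text does not say how items about the same event are recognized. One possible approach, shown here only as an assumption, is to group titles whose word overlap (Jaccard similarity) exceeds a threshold. The function names, the threshold of 0.5, and the sample titles are all hypothetical.

```python
def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def merge_same_event(titles, threshold=0.5):
    """Greedily group titles that resemble the first title of an existing group."""
    groups = []
    for title in titles:
        for group in groups:
            if jaccard(title, group[0]) >= threshold:
                group.append(title)
                break
        else:
            # No sufficiently similar group exists, so this title starts a new event.
            groups.append([title])
    return groups

titles = [
    "Bank CEO resigns suddenly",
    "Bank CEO resigns",
    "Fund run hits lender",
]
events = merge_same_event(titles)
```

The user can then review one representative per group instead of every item in a category.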
Further, after the initial public opinion information associated with the preset organization is determined from the web documents collected within the preset period, natural language processing may also be used to perform content analysis on the initial public opinion information, such as sentiment analysis, keyword relevance analysis, and sensitivity dimension analysis. Content analysis yields a sentiment score for each piece of public opinion information, which gives the user a reference for determining whether that information is positive or negative public opinion.
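As a rough illustration of a sentiment score, the sketch below counts words from two small lexicons. This is an assumption about one simple way to score sentiment, not the natural language processing method of the specification; the lexicons and function names are invented for the example.

```python
# Hypothetical sentiment lexicons; a real system would use a far larger resource.
POSITIVE_WORDS = {"profit", "growth", "award"}
NEGATIVE_WORDS = {"loss", "default", "fraud"}

def sentiment_score(text):
    """Return the count of positive words minus the count of negative words."""
    tokens = text.lower().split()
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    return positive - negative

def polarity(text):
    """Map the score to positive, negative, or neutral public opinion."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = polarity("Bank reports loss and fraud")
```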
In the embodiments of this specification, web documents on the Internet are collected in real time, and the initial public opinion information associated with the preset organization can be determined from the web documents collected within the preset period, which provides the user with a basis for financial analysis and prediction concerning the preset organization. In addition, filtering the initial public opinion information excludes invalid public opinion information and thereby improves the accuracy of public opinion monitoring for the preset organization. Further, classifying the valid public opinion information obtained after filtering makes it possible to quickly obtain the public opinion information that affects the credit risk of the preset organization, which helps to quickly identify credit risks at the preset organization.
In a second aspect, based on the same inventive concept, an embodiment of this specification provides a network public opinion information processing apparatus which, referring to FIG. 10, includes:
a collection module 1001, configured to collect web documents from the Internet in real time;
a determination module 1002, configured to determine, according to the web documents collected within a preset period, initial public opinion information associated with a preset organization;
a filtering module 1003, configured to filter the initial public opinion information to obtain valid public opinion information; and
a classification module 1004, configured to classify the valid public opinion information.
In an optional implementation, the determination module includes:
a first judging unit, configured to determine whether the web document contains an element of an identity identifier set, where the elements of the identity identifier set are identity identifiers of the preset organization; and
a first determination unit, configured to determine that the web document is the initial public opinion information when the web document contains an element of the identity identifier set.
In an optional implementation, the determination module includes:
a filtering unit, configured to filter the web document to obtain a valid document;
a second judging unit, configured to determine whether the valid document contains an element of an identity identifier set, where the elements of the identity identifier set are identity identifiers of the preset organization; and
a second determination unit, configured to determine that the valid document is the initial public opinion information when the valid document contains an element of the identity identifier set.
In an optional implementation, the apparatus further includes:
a receiving module 1005, configured to receive the identity identifiers of the preset organization entered by the user, so as to obtain the identity identifier set.
In an optional implementation, the filtering module includes:
a third determination unit, configured to determine a global exclusion word set, where the elements of the global exclusion word set are keywords of online promotional information;
a third judging unit, configured to determine whether the initial public opinion information contains an element of the global exclusion word set; and
a fourth determination unit, configured to determine that the initial public opinion information is the valid public opinion information when it contains no element of the global exclusion word set.
In an optional implementation, the filtering module includes:
a fifth determination unit, configured to determine a local exclusion word set corresponding to each preset organization, where the elements of the local exclusion word set are keywords related to the business of the preset organization;
a fourth judging unit, configured to determine whether the initial public opinion information contains an element of the local exclusion word set corresponding to the preset organization associated with that initial public opinion information; and
a sixth determination unit, configured to determine that the initial public opinion information is the valid public opinion information when it contains no element of the local exclusion word set.
In an optional implementation, the filtering module includes:
a seventh determination unit, configured to determine a global exclusion word set and a local exclusion word set corresponding to each preset organization, where the elements of the global exclusion word set are keywords of online promotional information and the elements of the local exclusion word set are keywords related to the business of the preset organization;
a fifth judging unit, configured to determine whether the initial public opinion information contains an element of the global exclusion word set;
a sixth judging unit, configured to determine, when the initial public opinion information contains no element of the global exclusion word set, whether it contains an element of the local exclusion word set corresponding to the preset organization associated with that initial public opinion information; and
an eighth determination unit, configured to determine that the initial public opinion information is the valid public opinion information when it contains no element of the local exclusion word set.
In an optional implementation, the classification module includes:
a first document vectorization unit, configured to perform document vectorization on the valid public opinion information to obtain the word vector corresponding to the valid public opinion information; and
a ninth determination unit, configured to use a pre-established machine learning model to analyze the word vectors corresponding to all valid public opinion information and determine the public opinion category to which each piece of valid public opinion information belongs.
In an optional implementation, the classification module includes:
a setting unit, configured to set public opinion categories and the feature condition corresponding to each public opinion category;
a seventh judging unit, configured to determine whether the valid public opinion information satisfies the feature condition;
a classification unit, configured to assign the valid public opinion information to the public opinion category corresponding to the feature condition when the valid public opinion information satisfies the feature condition;
a second document vectorization unit, configured to perform document vectorization on the valid public opinion information when it does not satisfy the feature condition, so as to obtain the corresponding word vector; and
a tenth determination unit, configured to use a pre-established machine learning model to analyze the word vectors corresponding to all valid public opinion information that does not satisfy the feature condition, and determine the public opinion category to which each such piece of valid public opinion information belongs.
In an optional implementation, the apparatus further includes:
a normalization module 1006, configured to normalize the valid public opinion information belonging to the same event.
In an optional implementation, the apparatus further includes:
a content analysis module 1007, configured to perform content analysis on the initial public opinion information using natural language processing.
In a third aspect, based on the same inventive concept as the network public opinion information processing method in the foregoing embodiments, the present invention also provides a server, as shown in FIG. 11. The server includes a memory 1104, a processor 1102, and a computer program stored in the memory 1104 and executable on the processor 1102. When executing the program, the processor 1102 implements the steps of any of the network public opinion information processing methods described above.
FIG. 11 shows a bus architecture, represented by bus 1100. Bus 1100 may include any number of interconnected buses and bridges, and it links together various circuits including one or more processors, represented by processor 1102, and memory, represented by memory 1104. Bus 1100 may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits, all of which are well known in the art and are therefore not described further here. A bus interface 1106 provides an interface between bus 1100 and the receiver 1101 and transmitter 1103. The receiver 1101 and transmitter 1103 may be the same element, namely a transceiver, which provides a unit for communicating with various other apparatuses over a transmission medium. The processor 1102 is responsible for managing bus 1100 and for general processing, and the memory 1104 may be used to store data used by the processor 1102 when performing operations.
In a fourth aspect, based on the same inventive concept as the network public opinion information processing method in the foregoing embodiments, the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the steps of any of the network public opinion information processing methods described above.
This specification is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes instruction means, and the instruction means implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of this specification have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of this specification.
Obviously, those skilled in the art can make various modifications and variations to this specification without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of this specification and their technical equivalents, this specification is also intended to include them.