CN109213929A - Network public sentiment information processing method, device and server - Google Patents

Network public sentiment information processing method, device and server

Info

Publication number
CN109213929A
Authority
CN
China
Prior art keywords
feelings information
public feelings
effective
words
initial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810832965.5A
Other languages
Chinese (zh)
Inventor
蒋士正
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201810832965.5A
Publication of CN109213929A
Legal status: Pending

Abstract

An embodiment of this specification provides a network public sentiment information processing method. By collecting web documents from the internet in real time, the method can determine, from the web documents collected within a preset period, initial public sentiment information associated with a preset organization, giving the user a basis for financial analysis and prediction concerning the preset organization.

Description

Network public sentiment information processing method, device and server
Technical field
The present invention relates to the field of internet technology, and in particular to a network public sentiment information processing method, device, and server.
Background
Network public opinion refers to the social and political attitudes, beliefs, and values that the public forms and holds toward public issues and social administrators, within a certain social space and through the network, around the occurrence, development, and change of intermediary social events. Network public opinion forms quickly and has a great influence on society. With the rapid worldwide development of the internet, online media has been recognized as the "fourth medium" after newspapers, radio, and television, and the network has become one of the main carriers reflecting public opinion.
In the course of operating, an organization may encounter business-related emergencies, such as a broken capital chain, a run on funds, losses on financial products, or the departure of senior executives, and these emergencies generate a large volume of network public opinion. Monitoring network public opinion about an organization around the clock helps to quickly discover credit risks facing the organization.
Summary of the invention
Embodiments of this specification provide a network public sentiment information processing method, device, and server.
In a first aspect, an embodiment of this specification provides a network public sentiment information processing method, comprising: collecting web documents from the internet in real time; determining, from the web documents collected within a preset period, initial public sentiment information associated with a preset organization; filtering the initial public sentiment information to obtain effective public sentiment information; and classifying the effective public sentiment information.
In a second aspect, an embodiment of this specification provides a network public sentiment information processing device, comprising: an acquisition module, configured to collect web documents from the internet in real time; a determining module, configured to determine, from the web documents collected within a preset period, initial public sentiment information associated with a preset organization; a filtering module, configured to filter the initial public sentiment information to obtain effective public sentiment information; and a classification module, configured to classify the effective public sentiment information.
In a third aspect, an embodiment of this specification provides a server, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the program, implements the steps of any of the network public sentiment information processing methods described above.
In a fourth aspect, an embodiment of this specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the network public sentiment information processing methods described above.
The beneficial effects of the embodiments of this specification are as follows:
In the embodiments of this specification, by collecting web documents from the internet in real time, initial public sentiment information associated with a preset organization can be determined from the web documents collected within a preset period, giving the user a basis for financial analysis and prediction concerning the preset organization. Filtering the initial public sentiment information excludes invalid public sentiment information, which improves the accuracy of network public opinion monitoring for the preset organization. Further, classifying the effective public sentiment information obtained after filtering makes it possible to quickly obtain the public sentiment information that affects the credit risk of the preset organization, which helps to quickly discover credit risks facing the preset organization.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the embodiments of the present invention and form part of this application; they do not limit the embodiments of the present invention. In the drawings:
Fig. 1 is a schematic diagram of an application scenario of network public sentiment information processing according to an embodiment of this specification;
Fig. 2 is a flowchart of the network public sentiment information processing method of the first aspect of an embodiment of this specification;
Fig. 3 is a flowchart of determining initial public sentiment information associated with a preset organization according to one embodiment of this specification;
Fig. 4 is a flowchart of determining initial public sentiment information associated with a preset organization according to another embodiment of this specification;
Fig. 5 is a flowchart of filtering initial public sentiment information according to one embodiment of this specification;
Fig. 6 is a flowchart of filtering initial public sentiment information according to another embodiment of this specification;
Fig. 7 is a flowchart of filtering initial public sentiment information according to yet another embodiment of this specification;
Fig. 8 is a flowchart of classifying effective public sentiment information according to one embodiment of this specification;
Fig. 9 is a flowchart of classifying effective public sentiment information according to another embodiment of this specification;
Fig. 10 is a schematic structural diagram of the network public sentiment information processing device of the second aspect of an embodiment of this specification;
Fig. 11 is a schematic structural diagram of the network public sentiment information processing server of the third aspect of an embodiment of this specification.
Detailed description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments and the accompanying drawings. The exemplary embodiments of the present invention and their descriptions serve only to explain the present invention and do not limit it.
Embodiment
Please refer to Fig. 1, a schematic diagram of an application scenario of the network public sentiment information processing method of an embodiment of this specification. In the course of operating, an organization may encounter business-related emergencies, such as a broken capital chain, a run on funds, losses on financial products, or the departure of senior executives, and these emergencies generate a large volume of network public opinion on the internet. The network public sentiment information processing device 101 in the server 100 monitors network public opinion about the organization on the internet.
In a first aspect, an embodiment of this specification provides a network public sentiment information processing method which, referring to Fig. 2, includes steps S201~S204.
S201, collect web documents from the internet in real time.
A web crawler is a program or script that automatically fetches web pages; it downloads web documents from the World Wide Web for a search engine and is an important component of a search engine. In one optional approach in this embodiment, a web crawler crawls the web documents of mainstream online media sites in real time. The mainstream online media sites include but are not limited to Weibo, WeChat, Zhihu, Toutiao, and Tieba, and the web documents include but are not limited to news web documents, forum web documents, and blog web documents.
S202, determine, from the web documents collected within a preset period, initial public sentiment information associated with a preset organization.
The preset period can be defined as follows: because time is a continuous attribute, it can be discretized, that is, divided into segments. For example, when segmenting by calendar day, the preset period can be one day or several days; when segmenting by calendar month, the preset period can be one month or several months. The preset period depends on the specific situation, and the embodiments of this specification do not limit it. For example, in one application scenario, initial public sentiment information associated with the preset organization is determined from the web documents collected within three days. Adjusting the length of the preset period controls the timeliness of monitoring. The preset organization is an organization the user is interested in, and there may be one or more of them.
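As a minimal sketch of the preset period described above, the following Python snippet selects the web documents collected within the last few days. The record layout (a dict with `url` and `crawled_at` fields) and the function name `documents_in_window` are assumptions made for illustration; the specification does not prescribe a data structure.

```python
from datetime import datetime, timedelta


def documents_in_window(documents, window_end, window_days=3):
    """Return the documents crawled in the window_days-long period ending at window_end."""
    window_start = window_end - timedelta(days=window_days)
    # Keep documents with window_start < crawled_at <= window_end.
    return [doc for doc in documents if window_start < doc["crawled_at"] <= window_end]


# Hypothetical crawl records; URLs and timestamps are made up.
docs = [
    {"url": "https://example.com/a", "crawled_at": datetime(2018, 7, 20, 9, 0)},
    {"url": "https://example.com/b", "crawled_at": datetime(2018, 7, 24, 18, 30)},
    {"url": "https://example.com/c", "crawled_at": datetime(2018, 7, 25, 8, 15)},
]
recent = documents_in_window(docs, window_end=datetime(2018, 7, 26), window_days=3)
```

Shortening `window_days` makes monitoring more timely at the cost of fewer documents per batch.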
An embodiment of this specification provides one specific implementation of step S202 which, referring to Fig. 3, includes steps S301 and S302.
S301, judge whether the web document contains an element of an identity set, where the elements of the identity set are identifiers of the preset organization.
Within the preset period, the web crawler may crawl a large number of web documents, only some of which are associated with the preset organization. In this embodiment, the web documents associated with the preset organization are determined through the identifiers of the preset organization. Specifically, each web document collected within the preset period is searched for elements of the identity set, that is, for identifiers of the preset organization. An identifier of the preset organization can be any one or any combination of the organization's full name, its aliases, and its product names. For example, taking the Industrial and Commercial Bank of China as the preset organization, its full name is "Industrial and Commercial Bank of China Limited"; its aliases are "Industrial and Commercial Bank of China", "Industrial and Commercial Bank", the corresponding Chinese abbreviation, or "ICBC"; and its product names are "Gongyin Lingtong Kuaixian", "Caifu Wenli 189 Days", or "Rong e Jie". The identity set can then be {Industrial and Commercial Bank of China Limited, Industrial and Commercial Bank of China, Industrial and Commercial Bank, the Chinese abbreviation, ICBC, Gongyin Lingtong Kuaixian, Caifu Wenli 189 Days, Rong e Jie}.
The elements of the identity set are determined by the organizations the user is interested in. Further, before web documents are collected from the internet in real time, the identifiers of the preset organization entered by the user can be received and saved to a pre-established identity database, thereby obtaining the identity set.
S302, if the web document contains an element of the identity set, determine that the web document is initial public sentiment information.
If the web document contains an element of the identity set, the web document is associated with the preset organization; it is therefore determined to be initial public sentiment information associated with the preset organization and is assigned to the corresponding preset organization.
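The matching in steps S301 and S302 can be sketched as a plain substring search over each document. This is only one possible reading of "contains an element of the identity set"; the function name `match_organizations`, the organization key `icbc`, and the English identifier strings are hypothetical stand-ins for whatever the user actually registers.

```python
# Hypothetical identity sets, keyed by an internal organization id.
IDENTITY_SETS = {
    "icbc": {
        "Industrial and Commercial Bank of China Limited",
        "Industrial and Commercial Bank of China",
        "ICBC",
    },
}


def match_organizations(text, identity_sets):
    """Return the ids of all organizations with at least one identifier found in text."""
    return {
        org_id
        for org_id, identifiers in identity_sets.items()
        if any(identifier in text for identifier in identifiers)
    }


matched = match_organizations("ICBC launches a new wealth product", IDENTITY_SETS)
unmatched = match_organizations("Weather forecast for the weekend", IDENTITY_SETS)
```

A document with a non-empty result would be treated as initial public sentiment information and assigned to each matched organization.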
Because the web documents collected by the web crawler have their own structure and are written in HTML, they contain a large amount of noise data, such as advertisements, navigation information, pictures, copyright notices, and links. This complicates judging whether a web document contains an element of the identity set. Accordingly, an embodiment of this specification provides another specific implementation of step S202 which, referring to Fig. 4, includes steps S401~S403.
S401, filter the web document to obtain an effective document.
Filtering the web document removes its noise data, such as advertisements, navigation information, pictures, copyright notices, and links. In this embodiment, the web document can be filtered by body-text extraction, including but not limited to extraction based on document structure, extraction based on summaries, extraction based on links, or extraction based on adjacent web pages. After the web document is filtered, the resulting effective document may include data such as the URL, title, time, author, source, body text, comments, read count, and reply count.
S402, judge whether the effective document contains an element of the identity set, where the elements of the identity set are identifiers of the preset organization.
S403, if the effective document contains an element of the identity set, determine that the effective document is initial public sentiment information.
For the specific implementation of steps S402 and S403, refer to the description of steps S301 and S302; it is not repeated here.
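A very rough illustration of structure-based text extraction (step S401) is shown below, using only Python's standard `html.parser`. It drops the content of a few tags that commonly hold noise; the tag list and the class name `TextExtractor` are assumptions, and a production extractor would be considerably more elaborate.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text while skipping tags that usually carry noise."""

    # Assumed noise containers; real pages need a richer rule set.
    SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the text of an HTML page with the assumed noise tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = (
    "<html><head><style>p{}</style></head><body>"
    "<nav>Home | About</nav><p>ICBC reports results.</p>"
    "<footer>Copyright</footer></body></html>"
)
body_text = extract_text(page)
```

The identity-set check of steps S402 and S403 would then run on `body_text` rather than on the raw HTML.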
S203, filter the initial public sentiment information to obtain effective public sentiment information.
Although some initial public sentiment information contains an identifier of the preset organization, it is not effective public sentiment information: it may be online promotional content that maliciously inserts the identifier of the preset organization to boost its own relevance. A unified filtering mechanism can therefore be applied to all initial public sentiment information. Accordingly, an embodiment of this specification provides one specific implementation of step S203 which, referring to Fig. 5, includes steps S501~S503.
S501, determine a global exclusion word set, whose elements are keywords of online promotional information.
The content of online promotional information changes over time, so the elements of the global exclusion word set can be set in advance and continually updated during subsequent operation of the system. The elements of the global exclusion word set include but are not limited to terms such as "wholesale price" and "search notice".
S502, judge whether the initial public sentiment information contains an element of the global exclusion word set.
For each item of initial public sentiment information in the public sentiment database, search the item for elements of the global exclusion word set.
S503, if the initial public sentiment information does not contain any element of the global exclusion word set, determine that the initial public sentiment information is effective public sentiment information.
If the initial public sentiment information does not contain any element of the global exclusion word set, it is not online promotional information, and it is therefore determined to be effective public sentiment information.
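Steps S501~S503 amount to a keyword blacklist applied to every item. A minimal sketch follows, assuming plain substring matching; the function name `passes_global_filter` is hypothetical, and the two exclusion words are simply the examples quoted above.

```python
# Example words taken from the text; a real set would be maintained over time.
GLOBAL_EXCLUSION_WORDS = {"wholesale price", "search notice"}


def passes_global_filter(text, exclusion_words=GLOBAL_EXCLUSION_WORDS):
    """Return True when text contains none of the global exclusion words."""
    return not any(word in text for word in exclusion_words)


items = [
    "ICBC wholesale price list, contact us today",
    "ICBC announces executive change",
]
effective = [text for text in items if passes_global_filter(text)]
```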
Performing steps S501~S503 filters out most common invalid public sentiment information, but some invalid public sentiment information may still remain. For example, network public opinion related to Taobao tends toward shopping, e-commerce, operations, and the like, whereas network public opinion related to Ant Financial tends toward finance, insurance, funds, and even pyramid selling; specific exclusion word sets therefore need to be set separately for different preset organizations. Accordingly, an embodiment of this specification provides another specific implementation of step S203 which, referring to Fig. 6, includes steps S601~S603.
S601, determine a local exclusion word set corresponding to each preset organization, where the elements of the local exclusion word set are keywords related to the business of that preset organization.
The business of a preset organization changes over time, so the elements of the local exclusion word set can be set in advance and continually updated during subsequent operation of the system. The preset organizations and the local exclusion word sets are in one-to-one correspondence; that is, one local exclusion word set is determined for each preset organization.
S602, judge whether the initial public sentiment information contains an element of the local exclusion word set corresponding to the preset organization associated with the initial public sentiment information.
For each item of initial public sentiment information in the public sentiment database, first determine the preset organization associated with the item, then determine the local exclusion word set corresponding to that preset organization, and finally search the item for elements of that local exclusion word set.
S603, if the initial public sentiment information does not contain any element of the local exclusion word set, determine that the initial public sentiment information is effective public sentiment information.
If the initial public sentiment information does not contain any element of the local exclusion word set, it is not invalid public sentiment information, and it is therefore determined to be effective public sentiment information.
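The per-organization lookup in steps S601~S603 can be sketched as a dictionary from organization to its local exclusion word set. The organization ids, the exclusion words, and the function name `passes_local_filter` below are all hypothetical; the specification does not list concrete local exclusion words.

```python
# Hypothetical one-to-one mapping from organization id to local exclusion words.
LOCAL_EXCLUSION_WORDS = {
    "org_a": {"coupon"},
    "org_b": {"recruitment"},
}


def passes_local_filter(text, organization, local_words=LOCAL_EXCLUSION_WORDS):
    """Return True when text contains none of the organization's local exclusion words."""
    # An organization without a configured set excludes nothing.
    words = local_words.get(organization, set())
    return not any(word in text for word in words)


kept = passes_local_filter("org_a quarterly results", "org_a")
dropped = passes_local_filter("org_a coupon giveaway", "org_a")
```

Because the lookup is keyed by organization, the word "coupon" only removes items associated with `org_a` and has no effect on items associated with `org_b`.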
Performing steps S501~S503 and performing steps S601~S603 each filter out part of the invalid public sentiment information. To make public sentiment monitoring more accurate, the two can be combined. Accordingly, an embodiment of this specification provides yet another specific implementation of step S203 which, referring to Fig. 7, includes steps S701~S704.
S701, determine a global exclusion word set and a local exclusion word set corresponding to each preset organization, where the elements of the global exclusion word set are keywords of online promotional information and the elements of the local exclusion word set are keywords related to the business of the preset organization.
S702, judge whether the initial public sentiment information contains an element of the global exclusion word set.
S703, if the initial public sentiment information does not contain any element of the global exclusion word set, judge whether the initial public sentiment information contains an element of the local exclusion word set corresponding to the preset organization associated with the initial public sentiment information.
S704, if the initial public sentiment information does not contain any element of the local exclusion word set, determine that the initial public sentiment information is effective public sentiment information.
For the specific implementation of steps S701~S704, refer to the description of steps S501~S503 and steps S601~S603; it is not repeated here.
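The two-stage check of steps S701~S704 can be sketched as follows: the global set is tested first, and the local set is consulted only for items that survive it. The item layout (a dict with `organization` and `text`), the function name `filter_initial_items`, and the sample words are assumptions for illustration.

```python
def filter_initial_items(items, global_words, local_words):
    """Return the items that contain neither global nor organization-specific exclusion words."""
    effective = []
    for item in items:
        text = item["text"]
        # S702: reject anything that hits the global exclusion word set.
        if any(word in text for word in global_words):
            continue
        # S703: only then test the local set of the associated organization.
        org_words = local_words.get(item["organization"], set())
        if any(word in text for word in org_words):
            continue
        # S704: the item passed both checks.
        effective.append(item)
    return effective


# Hypothetical configuration and items.
global_words = {"wholesale price"}
local_words = {"org_a": {"coupon"}}
items = [
    {"organization": "org_a", "text": "org_a wholesale price list"},
    {"organization": "org_a", "text": "org_a coupon giveaway"},
    {"organization": "org_a", "text": "org_a CEO resigns"},
]
effective_items = filter_initial_items(items, global_words, local_words)
```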
S204, classify the effective public sentiment information.
The purpose of classifying the effective public sentiment information is to place effective public sentiment information with the same features into the same class. Traditional text classification methods suffer from text representations that are high-dimensional and highly sparse and that have weak feature expression, and neural networks are not good at handling such data. In addition, feature engineering must be done manually, which is costly. Accordingly, an embodiment of this specification provides one specific implementation of step S204 which, referring to Fig. 8, includes steps S801 and S802.
S801, perform document vectorization on the effective public sentiment information to obtain the word vector corresponding to the effective public sentiment information.
Document vectorization of effective public sentiment information expresses each word in it as an N-dimensional, dense, continuous real-valued vector; by contrast, a one-hot encoding (One-Hot Encoding) vector has only one dimension equal to 1 and all others equal to 0. Various document vectorization algorithms can be used, such as the TF (term frequency) algorithm or the TF-IDF (term frequency-inverse document frequency) algorithm, with each item of effective public sentiment information corresponding to one word vector. Through vectorization, the representation of the effective public sentiment information changes from a high-dimensional, highly sparse form that is difficult for neural networks to handle into continuous, dense data similar to images and speech, namely the word vector.
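As one concrete instance of the vectorization algorithms named above, the sketch below computes TF-IDF vectors in pure Python, using TF = count / document length and IDF = log(N / document frequency). This is one common variant among several; the function name `tfidf_vectors` and the pre-tokenized toy documents are assumptions, and for a large vocabulary most entries of such vectors are zero.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return (vocabulary, vectors), one real-valued TF-IDF vector per document."""
    vocabulary = sorted({token for doc in tokenized_docs for token in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents in which each token appears.
    doc_freq = Counter(token for doc in tokenized_docs for token in set(doc))
    idf = {token: math.log(n_docs / doc_freq[token]) for token in vocabulary}
    vectors = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append(
            [counts[token] / total * idf[token] if total else 0.0 for token in vocabulary]
        )
    return vocabulary, vectors


# Toy, already-tokenized documents (tokenization itself is out of scope here).
docs = [["ceo", "resigns"], ["fund", "run"], ["ceo", "fund"]]
vocabulary, vectors = tfidf_vectors(docs)
```

Each row of `vectors` has one entry per vocabulary word, so all documents share the same dimensionality and can be fed to a downstream model.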
S802, analyze the word vectors corresponding to all effective public sentiment information using a pre-established machine learning model, and determine the public sentiment category to which each item of effective public sentiment information belongs.
After the word vectors corresponding to all effective public sentiment information are obtained, they are fed into the machine learning model, which analyzes them and places effective public sentiment information with the same features into the same class. The machine learning model can be a CNN model, an RNN model, or the like. Network structures such as CNN and RNN models can learn feature representations automatically, which eliminates laborious manual feature engineering and solves the problem end to end. Further, in the early stage of training, because the sample size is small, classification accuracy is relatively low, so new samples can be supplied continually while the system runs to improve the model's classification accuracy.
In step S802, the public sentiment categories need not be defined; that is, the effective public sentiment information is simply grouped so that items with the same features fall into the same class, without regard to what the categories are. Alternatively, the public sentiment categories can be defined in advance, for example "broken capital chain", "run on funds", "financial product loss", or "senior executive departure". When the public sentiment categories are defined in advance, effective public sentiment information that meets certain conditions of a category can be considered to belong to that category, and machine learning is no longer needed for it. Accordingly, an embodiment of this specification provides another specific implementation of step S204 which, referring to Fig. 9, includes steps S901~S904.
S901, set the public sentiment categories and the feature condition corresponding to each public sentiment category.
A public sentiment category can be a summary of events related to the credit risk of the preset organization; as noted above, the categories can be "broken capital chain", "run on funds", "financial product loss", "senior executive departure", and so on. A feature condition describes the features that a category has. Taking the category "senior executive departure" as an example, the feature condition can be set as containing keywords such as "CEO resignation" or "CTO resignation", and can be configured according to actual needs.
S902, judge whether the effective public sentiment information satisfies the feature condition.
In one specific embodiment, whether the effective public sentiment information satisfies the feature condition can be judged by searching it for the keywords set in the feature condition. For example, it can be judged whether the effective public sentiment information contains any of the keywords "CEO resignation", "CTO resignation", and so on.
S903, if the effective public sentiment information satisfies the feature condition, assign the effective public sentiment information to the public sentiment category corresponding to the feature condition; otherwise, perform document vectorization on the effective public sentiment information to obtain its corresponding word vector.
Again taking the category "senior executive departure" as an example, if the effective public sentiment information contains any of the keywords "CEO resignation", "CTO resignation", and so on, it is placed in the "senior executive departure" category; otherwise, document vectorization is performed on it.
S904, analyze, using a pre-established machine learning model, the word vectors corresponding to all effective public sentiment information that does not satisfy the feature condition, and determine the public sentiment category to which each such item belongs.
For step S904, refer to the description of step S802; it is not repeated here.
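The rule-first flow of steps S901~S904 can be sketched as a keyword lookup with a fallback. The fallback here is an arbitrary callable standing in for the vectorization and machine learning model of steps S903 and S904; it is not an implementation of a CNN or RNN. The function name `classify`, the keyword table, and the stub classifier are hypothetical.

```python
# Hypothetical feature conditions: category -> keywords that trigger it.
CATEGORY_KEYWORDS = {
    "senior executive departure": {"CEO resignation", "CTO resignation"},
    "run on funds": {"run on funds"},
}


def classify(text, category_keywords, fallback_classifier):
    """Assign a category by keyword rule if possible, otherwise defer to the fallback."""
    for category, keywords in category_keywords.items():
        # S902/S903: a single matching keyword satisfies the feature condition.
        if any(keyword in text for keyword in keywords):
            return category
    # S903/S904: no rule matched, so hand the text to the model-based path.
    return fallback_classifier(text)


def stub_model(text):
    """Placeholder for the vectorize-then-predict path; always returns one label."""
    return "uncategorized"


rule_hit = classify("Bank confirms CEO resignation", CATEGORY_KEYWORDS, stub_model)
model_hit = classify("Quarterly profit rises", CATEGORY_KEYWORDS, stub_model)
```

Keeping the fallback as a parameter lets the rule table and the model evolve independently, which matches the text's point that items satisfying a feature condition skip the machine learning step.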
Further, after the effective public sentiment information is classified, the effective public sentiment information belonging to the same event can also be normalized. After an emergency occurs at the preset organization, a large amount of effective public sentiment information corresponds to that emergency; that is, each public sentiment category corresponds to a large amount of effective public sentiment information. In this case, a user reviewing a public sentiment category needs a great deal of time. Therefore, all effective public sentiment information for the same event can be merged through normalization.
Further, after the initial public sentiment information associated with the preset organization is determined from the web documents collected within the preset period, natural language processing can also be used to perform content analysis on the initial public sentiment information, such as sentiment analysis, keyword relevance analysis, and sensitive dimension analysis. Content analysis of the initial public sentiment information yields a sentiment score for each item of public sentiment information, which gives the user a reference for determining whether the item is positive or negative.
In the embodiments of this specification, by collecting web documents from the internet in real time, initial public sentiment information associated with a preset organization can be determined from the web documents collected within a preset period, giving the user a basis for financial analysis and prediction concerning the preset organization. Filtering the initial public sentiment information excludes invalid public sentiment information, which improves the accuracy of network public opinion monitoring for the preset organization. Further, classifying the effective public sentiment information obtained after filtering makes it possible to quickly obtain the public sentiment information that affects the credit risk of the preset organization, which helps to quickly discover credit risks facing the preset organization.
Second aspect, based on the same inventive concept, this specification embodiment provide a kind of network public sentiment information processing dress It sets, referring to FIG. 10, including:
Acquisition module 1001, for acquiring the web document in internet in real time;
Determining module 1002, for according to the web document acquired in preset period of time, determination to be related to default organization The initial public feelings information of connection;
Filtering module 1003 obtains effective public feelings information for being filtered to the initial public feelings information;
Classifying module 1004 sorts out effective public feelings information.
In a kind of optional implementation, the determining module includes:
First judging unit, for judging whether the web document includes element in identity set, the body Element in part logo collection is the identity of the default organization;
First determination unit, for determining institute when the web document includes the element in the identity set Stating web document is the initial public feelings information.
In an optional implementation, the determining module includes:
A filter unit, for filtering the web document to obtain an effective document;
A second judging unit, for judging whether the effective document contains an element of an identity set, the elements of the identity set being identities of the default organization;
A second determination unit, for determining that the effective document is the initial public feelings information when the effective document contains an element of the identity set.
In an optional implementation, the device further includes:
A receiving module 1005, for receiving the identity of the default organization input by a user, so as to obtain the identity set.
In an optional implementation, the filtering module includes:
A third determination unit, for determining a global exclusion word set, the elements of the global exclusion word set being keywords of network promotion information;
A third judging unit, for judging whether the initial public feelings information contains an element of the global exclusion word set;
A fourth determination unit, for determining that the initial public feelings information is the effective public feelings information when the initial public feelings information contains no element of the global exclusion word set.
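The global exclusion check can be sketched as follows. The keyword list is invented for illustration, and case-insensitive substring matching is an assumption; the source only states that the set holds keywords of network promotion information.

```python
from typing import Set

# Hypothetical promotion keywords; the actual set is not given in the source.
GLOBAL_EXCLUSION_WORDS: Set[str] = {"discount", "coupon", "promo code"}


def passes_global_filter(text: str, exclusion_words: Set[str] = GLOBAL_EXCLUSION_WORDS) -> bool:
    """Return True when the text contains no global exclusion word."""
    lowered = text.lower()
    return not any(word in lowered for word in exclusion_words)


kept = passes_global_filter("ExBank sued over loan losses")
dropped = passes_global_filter("ExBank COUPON event this weekend")
```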
In an optional implementation, the filtering module includes:
A fifth determination unit, for determining a local exclusion word set corresponding to each default organization, the elements of the local exclusion word set being keywords related to the business of the default organization;
A fourth judging unit, for judging whether the initial public feelings information contains an element of the local exclusion word set corresponding to the default organization associated with the initial public feelings information;
A sixth determination unit, for determining that the initial public feelings information is the effective public feelings information when the initial public feelings information contains no element of the local exclusion word set.
In an optional implementation, the filtering module includes:
A seventh determination unit, for determining a global exclusion word set and a local exclusion word set corresponding to each default organization, the elements of the global exclusion word set being keywords of network promotion information, and the elements of the local exclusion word set being keywords related to the business of the default organization;
A fifth judging unit, for judging whether the initial public feelings information contains an element of the global exclusion word set;
A sixth judging unit, for judging, when the initial public feelings information contains no element of the global exclusion word set, whether the initial public feelings information contains an element of the local exclusion word set corresponding to the default organization associated with the initial public feelings information;
An eighth determination unit, for determining that the initial public feelings information is the effective public feelings information when the initial public feelings information contains no element of the local exclusion word set.
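The ordering of the two checks in this combined implementation, global first and then the organization-specific local set, can be sketched as below. The function name, the dictionary keyed by organization, and all keywords are assumptions made for illustration.

```python
from typing import Dict, Set


def is_effective(
    text: str,
    organization: str,
    global_words: Set[str],
    local_words_by_org: Dict[str, Set[str]],
) -> bool:
    """Apply the global exclusion check, then the organization's local one."""
    # Stage 1: any global exclusion word rejects the document outright.
    if any(word in text for word in global_words):
        return False
    # Stage 2: only the local set of the associated organization is consulted.
    local_words = local_words_by_org.get(organization, set())
    return not any(word in text for word in local_words)


# Hypothetical word sets.
global_words = {"coupon"}
local_words_by_org = {"ExBank": {"opening hours"}}

a = is_effective("ExBank sued over loan losses", "ExBank", global_words, local_words_by_org)
b = is_effective("ExBank coupon event", "ExBank", global_words, local_words_by_org)
c = is_effective("ExBank branch opening hours change", "ExBank", global_words, local_words_by_org)
```

Running the global check first means the per-organization lookup is skipped for documents that are already rejected, which matches the order of the fifth and sixth judging units.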
In an optional implementation, the classifying module includes:
A first document vectorization unit, for performing document vectorization on the effective public feelings information to obtain the word vector corresponding to the effective public feelings information;
A ninth determination unit, for analyzing the word vectors corresponding to all the effective public feelings information using a pre-established machine learning model, and determining the public sentiment category to which all the effective public feelings information belongs.
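The vectorize-then-classify flow can be illustrated with a deliberately small stand-in. The passage above names neither the vectorization method nor the machine learning model, so the bag-of-words counts and the nearest-centroid classifier below are substitutes chosen only to keep the sketch self-contained; the vocabulary, categories, and training texts are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def vectorize(text: str, vocabulary: List[str]) -> List[float]:
    """Bag-of-words count vector over a fixed vocabulary (assumed method)."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(samples: Dict[str, List[str]], vocabulary: List[str]) -> Dict[str, List[float]]:
    """'Pre-establish' the model: one mean vector per category."""
    centroids = {}
    for category, texts in samples.items():
        vectors = [vectorize(t, vocabulary) for t in texts]
        centroids[category] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return centroids


def predict(text: str, centroids: Dict[str, List[float]], vocabulary: List[str]) -> str:
    """Assign the category whose centroid is most similar to the text vector."""
    vec = vectorize(text, vocabulary)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


vocabulary = ["lawsuit", "court", "default", "overdue", "loan"]
samples = {
    "legal": ["lawsuit filed in court", "court hears lawsuit"],
    "credit": ["loan default reported", "overdue loan default"],
}
centroids = train_centroids(samples, vocabulary)
category = predict("bank loan overdue again", centroids, vocabulary)
```

In practice the vectors would more likely come from a trained embedding and the model from a standard classifier; the sketch only shows where each of the two units sits in the flow.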
In an optional implementation, the classifying module includes:
A setting unit, for setting public sentiment categories and the characteristic condition corresponding to each public sentiment category;
A seventh judging unit, for judging whether the effective public feelings information meets the characteristic condition;
A classifying unit, for assigning the effective public feelings information to the public sentiment category corresponding to the characteristic condition when the effective public feelings information meets the characteristic condition;
A second document vectorization unit, for performing document vectorization on the effective public feelings information when the effective public feelings information does not meet the characteristic condition, so as to obtain the word vector corresponding to the effective public feelings information;
A tenth determination unit, for analyzing, using a pre-established machine learning model, the word vectors corresponding to all the effective public feelings information that does not meet the characteristic condition, and determining the public sentiment category to which all the effective public feelings information that does not meet the characteristic condition belongs.
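The rule-first, model-second arrangement can be sketched as below. The characteristic conditions are modelled as simple predicates on the text, and `model_predict` stands in for the vectorization unit plus the pre-established model; both are assumptions, as are the category names.

```python
from typing import Callable, Dict, Optional

FeatureCondition = Callable[[str], bool]


def classify_by_rules(text: str, conditions: Dict[str, FeatureCondition]) -> Optional[str]:
    """Return the first category whose characteristic condition the text meets."""
    for category, condition in conditions.items():
        if condition(text):
            return category
    return None


def classify(
    text: str,
    conditions: Dict[str, FeatureCondition],
    model_predict: Callable[[str], str],
) -> str:
    """Use the rule result when one applies; otherwise defer to the model."""
    category = classify_by_rules(text, conditions)
    if category is not None:
        return category
    return model_predict(text)


# Hypothetical categories and conditions.
conditions = {
    "legal": lambda t: "lawsuit" in t or "court" in t,
    "regulatory": lambda t: "fined" in t,
}
by_rule = classify("ExBank fined by regulator", conditions, model_predict=lambda t: "other")
by_model = classify("ExBank shares fall", conditions, model_predict=lambda t: "other")
```

Only the documents that no rule matches reach the model, so the cheaper check handles the clear-cut cases first.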
In an optional implementation, the device further includes:
A normalization module 1006, for normalizing the effective public feelings information that belongs to the same event.
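This passage does not say how same-event information is recognized or what normalization produces. Purely as an illustration, the sketch below assumes that items with the same cleaned-up title describe the same event and groups them under one key; `event_key` and `normalize_events` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List


def event_key(title: str) -> str:
    """Assumed event key: lowercased title without punctuation or extra spaces."""
    return " ".join(re.sub(r"[^\w\s]", "", title.lower()).split())


def normalize_events(titles: Iterable[str]) -> Dict[str, List[str]]:
    """Group titles that share an event key, preserving input order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for title in titles:
        groups[event_key(title)].append(title)
    return dict(groups)


groups = normalize_events(["ExBank sued!", "exbank  sued", "ExBank fined"])
```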
In an optional implementation, the device further includes:
A content analysis module 1007, for performing content analysis on the initial public feelings information using natural language processing.
In a third aspect, based on the same inventive concept as the network public sentiment information processing method of the foregoing embodiments, the present invention further provides a server, as shown in FIG. 11. The server includes a memory 1104, a processor 1102, and a computer program stored in the memory 1104 and executable on the processor 1102; when executing the program, the processor 1102 implements the steps of any of the network public sentiment information processing methods described above.
In FIG. 11, a bus architecture is represented by bus 1100. Bus 1100 may include any number of interconnected buses and bridges, and links together various circuits, including one or more processors represented by processor 1102 and memory represented by memory 1104. Bus 1100 may also link together various other circuits such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are therefore not described further here. Bus interface 1106 provides an interface between bus 1100 and the receiver 1101 and transmitter 1103. The receiver 1101 and the transmitter 1103 may be the same element, namely a transceiver, which provides a unit for communicating with various other devices over a transmission medium. Processor 1102 is responsible for managing bus 1100 and for general processing, and memory 1104 may be used to store the data that processor 1102 uses when performing operations.
In a fourth aspect, based on the same inventive concept as the network public sentiment information processing method of the foregoing embodiments, the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps of any of the network public sentiment information processing methods described above.
This specification is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this specification. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Although preferred embodiments of this specification have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of this specification.
Obviously, those skilled in the art can make various modifications and variations to this specification without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of this specification and their technical equivalents, this specification is also intended to include them.

Claims (26)

1. A network public sentiment information processing method, comprising:
Acquiring web documents on the Internet in real time;
Determining, according to the web documents acquired within a preset period, initial public feelings information associated with a default organization;
Filtering the initial public feelings information to obtain effective public feelings information;
Classifying the effective public feelings information.
2. The method according to claim 1, wherein determining, according to the web documents acquired within a preset period, the initial public feelings information associated with the default organization comprises:
Judging whether the web document contains an element of an identity set, the elements of the identity set being identities of the default organization;
If the web document contains an element of the identity set, determining that the web document is the initial public feelings information.
3. The method according to claim 1, wherein determining, according to the web documents acquired within a preset period, the initial public feelings information associated with the default organization comprises:
Filtering the web document to obtain an effective document;
Judging whether the effective document contains an element of an identity set, the elements of the identity set being identities of the default organization;
If the effective document contains an element of the identity set, determining that the effective document is the initial public feelings information.
4. The method according to claim 2 or 3, wherein the identity of the default organization is one of, or any combination of, the full name of the default organization, an alias of the default organization, and a product name of the default organization.
5. The method according to claim 2 or 3, further comprising, before acquiring the web documents on the Internet in real time:
Receiving the identity of the default organization input by a user, so as to obtain the identity set.
6. The method according to claim 1, wherein filtering the initial public feelings information to obtain effective public feelings information comprises:
Determining a global exclusion word set, the elements of the global exclusion word set being keywords of network promotion information;
Judging whether the initial public feelings information contains an element of the global exclusion word set;
If the initial public feelings information contains no element of the global exclusion word set, determining that the initial public feelings information is the effective public feelings information.
7. The method according to claim 1, wherein filtering the initial public feelings information to obtain effective public feelings information comprises:
Determining a local exclusion word set corresponding to each default organization, the elements of the local exclusion word set being keywords related to the business of the default organization;
Judging whether the initial public feelings information contains an element of the local exclusion word set corresponding to the default organization associated with the initial public feelings information;
If the initial public feelings information contains no element of the local exclusion word set, determining that the initial public feelings information is the effective public feelings information.
8. The method according to claim 1, wherein filtering the initial public feelings information to obtain effective public feelings information comprises:
Determining a global exclusion word set and a local exclusion word set corresponding to each default organization, the elements of the global exclusion word set being keywords of network promotion information, and the elements of the local exclusion word set being keywords related to the business of the default organization;
Judging whether the initial public feelings information contains an element of the global exclusion word set;
If the initial public feelings information contains no element of the global exclusion word set, judging whether the initial public feelings information contains an element of the local exclusion word set corresponding to the default organization associated with the initial public feelings information;
If the initial public feelings information contains no element of the local exclusion word set, determining that the initial public feelings information is the effective public feelings information.
9. The method according to claim 1, wherein classifying the effective public feelings information comprises:
Performing document vectorization on the effective public feelings information to obtain the word vector corresponding to the effective public feelings information;
Analyzing the word vectors corresponding to all the effective public feelings information using a pre-established machine learning model, and determining the public sentiment category to which all the effective public feelings information belongs.
10. The method according to claim 1, wherein classifying the effective public feelings information comprises:
Setting public sentiment categories and the characteristic condition corresponding to each public sentiment category;
Judging whether the effective public feelings information meets the characteristic condition;
If the effective public feelings information meets the characteristic condition, assigning the effective public feelings information to the public sentiment category corresponding to the characteristic condition; otherwise, performing document vectorization on the effective public feelings information to obtain the word vector corresponding to the effective public feelings information; and
Analyzing, using a pre-established machine learning model, the word vectors corresponding to all the effective public feelings information that does not meet the characteristic condition, and determining the public sentiment category to which all the effective public feelings information that does not meet the characteristic condition belongs.
11. The method according to claim 1, further comprising, after classifying the effective public feelings information:
Normalizing the effective public feelings information that belongs to the same event.
12. The method according to claim 1, further comprising, after determining, according to the web documents acquired within a preset period, the initial public feelings information associated with the default organization:
Performing content analysis on the initial public feelings information using natural language processing.
13. A network public sentiment information processing device, comprising:
An acquisition module, for acquiring web documents on the Internet in real time;
A determining module, for determining, according to the web documents acquired within a preset period, initial public feelings information associated with a default organization;
A filtering module, for filtering the initial public feelings information to obtain effective public feelings information;
A classifying module, for classifying the effective public feelings information.
14. The device according to claim 13, wherein the determining module comprises:
A first judging unit, for judging whether the web document contains an element of an identity set, the elements of the identity set being identities of the default organization;
A first determination unit, for determining that the web document is the initial public feelings information when the web document contains an element of the identity set.
15. The device according to claim 13, wherein the determining module comprises:
A filter unit, for filtering the web document to obtain an effective document;
A second judging unit, for judging whether the effective document contains an element of an identity set, the elements of the identity set being identities of the default organization;
A second determination unit, for determining that the effective document is the initial public feelings information when the effective document contains an element of the identity set.
16. The device according to claim 14 or 15, wherein the identity of the default organization is one of, or any combination of, the full name of the default organization, an alias of the default organization, and a product name of the default organization.
17. The device according to claim 14 or 15, further comprising:
A receiving module, for receiving the identity of the default organization input by a user, so as to obtain the identity set.
18. The device according to claim 13, wherein the filtering module comprises:
A third determination unit, for determining a global exclusion word set, the elements of the global exclusion word set being keywords of network promotion information;
A third judging unit, for judging whether the initial public feelings information contains an element of the global exclusion word set;
A fourth determination unit, for determining that the initial public feelings information is the effective public feelings information when the initial public feelings information contains no element of the global exclusion word set.
19. The device according to claim 13, wherein the filtering module comprises:
A fifth determination unit, for determining a local exclusion word set corresponding to each default organization, the elements of the local exclusion word set being keywords related to the business of the default organization;
A fourth judging unit, for judging whether the initial public feelings information contains an element of the local exclusion word set corresponding to the default organization associated with the initial public feelings information;
A sixth determination unit, for determining that the initial public feelings information is the effective public feelings information when the initial public feelings information contains no element of the local exclusion word set.
20. The device according to claim 13, wherein the filtering module comprises:
A seventh determination unit, for determining a global exclusion word set and a local exclusion word set corresponding to each default organization, the elements of the global exclusion word set being keywords of network promotion information, and the elements of the local exclusion word set being keywords related to the business of the default organization;
A fifth judging unit, for judging whether the initial public feelings information contains an element of the global exclusion word set;
A sixth judging unit, for judging, when the initial public feelings information contains no element of the global exclusion word set, whether the initial public feelings information contains an element of the local exclusion word set corresponding to the default organization associated with the initial public feelings information;
An eighth determination unit, for determining that the initial public feelings information is the effective public feelings information when the initial public feelings information contains no element of the local exclusion word set.
21. The device according to claim 13, wherein the classifying module comprises:
A first document vectorization unit, for performing document vectorization on the effective public feelings information to obtain the word vector corresponding to the effective public feelings information;
A ninth determination unit, for analyzing the word vectors corresponding to all the effective public feelings information using a pre-established machine learning model, and determining the public sentiment category to which all the effective public feelings information belongs.
22. The device according to claim 13, wherein the classifying module comprises:
A setting unit, for setting public sentiment categories and the characteristic condition corresponding to each public sentiment category;
A seventh judging unit, for judging whether the effective public feelings information meets the characteristic condition;
A classifying unit, for assigning the effective public feelings information to the public sentiment category corresponding to the characteristic condition when the effective public feelings information meets the characteristic condition;
A second document vectorization unit, for performing document vectorization on the effective public feelings information when the effective public feelings information does not meet the characteristic condition, so as to obtain the word vector corresponding to the effective public feelings information;
A tenth determination unit, for analyzing, using a pre-established machine learning model, the word vectors corresponding to all the effective public feelings information that does not meet the characteristic condition, and determining the public sentiment category to which all the effective public feelings information that does not meet the characteristic condition belongs.
23. The device according to claim 13, further comprising:
A normalization module, for normalizing the effective public feelings information that belongs to the same event.
24. The device according to claim 13, further comprising:
A content analysis module, for performing content analysis on the initial public feelings information using natural language processing.
25. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-12 when executing the program.
26. A computer-readable storage medium on which a computer program is stored, wherein the program implements the steps of the method of any one of claims 1-12 when executed by a processor.
CN201810832965.5A 2018-07-26 2018-07-26 Network public sentiment information processing method, device and server Pending CN109213929A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810832965.5A CN109213929A (en) 2018-07-26 2018-07-26 Network public sentiment information processing method, device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810832965.5A CN109213929A (en) 2018-07-26 2018-07-26 Network public sentiment information processing method, device and server

Publications (1)

Publication Number Publication Date
CN109213929A true CN109213929A (en) 2019-01-15

Family

ID=64990702

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810832965.5A Pending CN109213929A (en) 2018-07-26 2018-07-26 Network public sentiment information processing method, device and server

Country Status (1)

Country Link
CN (1) CN109213929A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175733A (en) * 2019-04-01 2019-08-27 阿里巴巴集团控股有限公司 A kind of public opinion information processing method and server

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101414300A (en) * 2008-11-28 2009-04-22 电子科技大学 Method for sorting and processing internet public feelings information
US20100312769A1 (en) * 2009-06-09 2010-12-09 Bailey Edward J Methods, apparatus and software for analyzing the content of micro-blog messages
CN104462096A (en) * 2013-09-13 2015-03-25 北大方正集团有限公司 Public opinion monitoring and analysis method and device
CN104902292A (en) * 2015-05-20 2015-09-09 无锡天脉聚源传媒科技有限公司 Television report-based public opinion analysis method and system
CN105117484A (en) * 2015-09-17 2015-12-02 广州银讯信息科技有限公司 Internet public opinion monitoring method and system
CN106920147A (en) * 2017-02-28 2017-07-04 华中科技大学 A kind of commodity intelligent recommendation method that word-based vector data drives
CN108009219A (en) * 2017-11-21 2018-05-08 国家计算机网络与信息安全管理中心 A kind of method for finding internet finance public sentiment regulatory target



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200925

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Applicant after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Applicant before: Alibaba Group Holding Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190115
