Summary of the invention
In view of the problems described above, the present invention provides a public sentiment data storage
method and a server, so that public sentiment data can be stored reliably.
The invention provides a public sentiment data storage method, including:
obtaining public sentiment data to be stored, assigning a data identifier to the public sentiment
data to be stored, and determining, according to preset topic expressions, the topic identifier
corresponding to the public sentiment data to be stored;
parsing the public sentiment data to be stored to obtain its display field and sort field, and
storing the data identifier, the topic identifier, the display field and the sort field in
association in a cache of a server to obtain pending cached data; wherein the display field
includes the creation time, the creator and the data content of the public sentiment data to be
stored, and the sort field includes the forwarding count and/or the comment count of the public
sentiment data to be stored;
obtaining the pending cached data from the cache, and determining, according to a preset
correspondence between special topics and topics, whether a special-topic identifier exists that
corresponds to the topic identifier of the pending cached data;
if no such special-topic identifier exists, storing the topic identifier, the data identifier and
the sort field of the pending cached data in association in a recent database of the server, the
recent database being used to store the pending cached data for a first life duration;
storing extended pending cached data in a historical database of the server, the historical
database being used to store the extended pending cached data for a second life duration, the
second life duration being longer than the first life duration, and the extended pending cached
data including the pending cached data and the fields of the public sentiment data to be stored
other than the display field and the sort field;
storing, in a preset first topic storage format, the topic identifier, the data identifier and the
creation time of the pending cached data in a real-time database of the server; and storing, in a
preset second topic storage format, the topic identifier and the display field of the pending
cached data in the real-time database, the real-time database being used to store the pending
cached data for a third life duration, the third life duration being shorter than the first life
duration.
The invention provides a server, including:
an acquisition module, configured to obtain public sentiment data to be stored, assign a data
identifier to the public sentiment data to be stored, and determine, according to preset topic
expressions, the topic identifier corresponding to the public sentiment data to be stored;
a cache processing module, configured to parse the public sentiment data to be stored to obtain
its display field and sort field, and to store the data identifier, the topic identifier, the
display field and the sort field in association in a cache of the server to obtain pending cached
data; wherein the display field includes the creation time, the creator and the data content of
the public sentiment data to be stored, and the sort field includes the forwarding count and/or
the comment count of the public sentiment data to be stored;
a determining module, configured to obtain the pending cached data from the cache and to
determine, according to a preset correspondence between special topics and topics, whether a
special-topic identifier exists that corresponds to the topic identifier of the pending cached
data;
a recent storage processing module, configured to, if the determining module determines that no
such special-topic identifier exists, store the topic identifier, the data identifier and the sort
field of the pending cached data in association in a recent database of the server, the recent
database being used to store the pending cached data for a first life duration;
a historical storage processing module, configured to store extended pending cached data in a
historical database of the server, the historical database being used to store the extended
pending cached data for a second life duration, the second life duration being longer than the
first life duration, and the extended pending cached data including the pending cached data and
the fields of the public sentiment data to be stored other than the display field and the sort
field;
a real-time storage processing module, configured to store, in a preset first topic storage
format, the topic identifier, the data identifier and the creation time of the pending cached data
in a real-time database of the server; and to store, in a preset second topic storage format, the
topic identifier and the display field of the pending cached data in the real-time database, the
real-time database being used to store the pending cached data for a third life duration, the
third life duration being shorter than the first life duration.
With the public sentiment data storage method and server provided by the present invention, the
public sentiment data is parsed to obtain the display field needed for presentation to users and
the sort field needed for analyzing the public sentiment data. After topic detection has been
performed on the public sentiment data to be stored, only the topic identifier, data identifier,
display field and sort field of the public sentiment data are first stored in the cache of the
server. The topic identifier, data identifier and sort field of the cached public sentiment data
are then stored in the recent database, all information of the public sentiment data is stored in
the historical database, and the display field and sort field of the public sentiment data are
stored in the real-time database, so that the different information of the public sentiment data
is stored in turn in the recent database, the historical database and the real-time database.
Because each database has a different storage duration limit, the public sentiment data is stored
by category. Moreover, the massive public sentiment data obtained is cached first and only then
stored in the recent database, the historical database and the real-time database. This ensures
reliable data storage while storing the public sentiment data separately as real-time, recent and
historical data according to different needs, so that the public sentiment data held in the
different databases can be accessed quickly for analysis and application as those needs require.
Detailed description of the embodiments
Fig. 1 is a flowchart of an embodiment of the public sentiment data storage method of the present
invention. The method may be performed by a server used for storing, analyzing and managing public
sentiment data. As shown in Fig. 1, the method specifically includes:
Step 101: obtain public sentiment data to be stored, assign a data identifier to the public
sentiment data to be stored, and determine, according to preset topic expressions, the topic
identifier corresponding to the public sentiment data to be stored.
In this embodiment, the public sentiment data to be stored is data generated when members of the
public use their own user terminal devices to comment on, forward, or otherwise act on public
sentiment on the Internet. The server may obtain the public sentiment data by means such as
existing crawler tools. To facilitate storage processing, the server assigns a unique data
identifier to each item of public sentiment data. The data identifier may, for example, be
obtained by segmenting the public sentiment data into words and applying a hash operation of a
preset algorithm to the resulting words; this is not limited here.
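As an illustration of the identifier assignment described above, the following is a minimal
sketch, not the method prescribed by this embodiment. It assumes whitespace splitting as a
stand-in for a real word segmenter and SHA-256 as the "preset algorithm"; the function names
`segment` and `assign_data_id` are hypothetical. A hash yields an identifier that is unique only
up to collisions and identical for identical word sequences.

```python
import hashlib


def segment(text):
    # Stand-in for a real word segmenter (assumption): split on whitespace.
    return text.split()


def assign_data_id(text):
    """Derive a data identifier by hashing the segmented words in order."""
    digest = hashlib.sha256()
    for word in segment(text):
        digest.update(word.encode("utf-8"))
        # Separator byte so that ["ab", "c"] and ["a", "bc"] hash differently.
        digest.update(b"\x00")
    return digest.hexdigest()
```

Because the identifier depends only on the word sequence, two items with the same segmented
content receive the same identifier, which is what later duplicate checks rely on.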
In this embodiment, multiple topic expressions obtained from experience or from statistics are
stored in the server in advance, and each topic expression uniquely corresponds to one topic
identifier. The server can therefore segment the public sentiment data to be stored into words and
match each topic expression against the words the data contains, which yields the topic expression
corresponding to the public sentiment data to be stored and hence its topic identifier. The match
may be a complete match, in which the data contains all the words of a topic expression, or a
partial match, judged for example by the proportion of the words of a topic expression that also
appear in the data.
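The complete and partial matching described above can be sketched as follows. This is an
illustration under stated assumptions: each topic expression is modelled simply as a list of
words, the partial-match criterion is the overlap ratio with a configurable threshold, and the
name `match_topic` and the parameter `min_ratio` are hypothetical.

```python
def match_topic(words, topic_expressions, min_ratio=1.0):
    """Return the identifier of the best-matching topic expression, or None.

    words: the segmented words of one public sentiment data item.
    topic_expressions: dict mapping topic identifier -> list of expression words.
    min_ratio: 1.0 requires a complete match; lower values allow partial matches.
    """
    word_set = set(words)
    best_id, best_ratio = None, 0.0
    for topic_id, expr_words in topic_expressions.items():
        expr_set = set(expr_words)
        if not expr_set:
            continue
        # Share of the expression's words that also appear in the data.
        ratio = len(word_set & expr_set) / len(expr_set)
        if ratio >= min_ratio and ratio > best_ratio:
            best_id, best_ratio = topic_id, ratio
    return best_id
```

With the default threshold only a complete match is accepted; lowering `min_ratio` admits the
degree-based matching mentioned in the text.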
Step 102: parse the public sentiment data to be stored to obtain its display field and sort field,
and store the data identifier, the topic identifier, the display field and the sort field in
association in the cache of the server to obtain pending cached data; wherein the display field
includes the creation time, the creator and the data content of the public sentiment data to be
stored, and the sort field includes the forwarding count and/or the comment count of the public
sentiment data to be stored.
Step 103: obtain the pending cached data from the cache, and determine, according to the preset
correspondence between special topics and topics, whether a special-topic identifier exists that
corresponds to the topic identifier of the pending cached data. If the special-topic identifier
exists, perform steps 104 to 107; if it does not exist, perform steps 105 to 107.
In this embodiment, an item of public sentiment data may contain a great deal of information:
besides the data content, it also includes the creator, the creation time, the comment count, the
forwarding count, the publication method and so on. Public sentiment data is generally stored so
that real-time data, or data from a period of time, can be counted and analyzed to identify the
hot events or opinion trends that the public is currently following, so that bodies such as
governments can provide reasonable guidance and avoid serious social impact, or so that users such
as ICPs can use it for search engines or message recommendation. For these different
applications, and in order to store massive public sentiment data in a timely, efficient and
reliable manner while keeping the stored data convenient for different later analysis needs, the
server parses the public sentiment data after obtaining it and extracts the display field and the
sort field. The display field mainly includes the creation time, the creator and the data content
of the public sentiment data to be stored; the sort field includes its forwarding count and/or
comment count. The display field is mainly used to show users the public's views, that is, the
content of the public sentiment data, on a given topic or special topic in real time or over a
period of time; the sort field is mainly used for hot-spot analysis.
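The split into a display field and a sort field can be sketched as below. The record layout and
the key names (`created_at`, `creator`, `content`, `forward_count`, `comment_count`) are
assumptions made for illustration; the embodiment does not prescribe a record format.

```python
DISPLAY_KEYS = ("created_at", "creator", "content")   # hypothetical key names
SORT_KEYS = ("forward_count", "comment_count")         # hypothetical key names


def parse_fields(record):
    """Split one raw record into (display field, sort field, other fields)."""
    display = {key: record[key] for key in DISPLAY_KEYS}
    # The sort field holds the forwarding count and/or the comment count,
    # so either key may be absent.
    sort = {key: record[key] for key in SORT_KEYS if key in record}
    other = {key: value for key, value in record.items()
             if key not in DISPLAY_KEYS and key not in SORT_KEYS}
    return display, sort, other
```

The third return value collects the remaining fields, which only the historical database needs.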
After the display field and the sort field of the public sentiment data to be stored have been
obtained by parsing, the data identifier, topic identifier, display field and sort field of the
public sentiment data to be stored are stored in association in the cache of the server to obtain
pending cached data. A very large amount of public sentiment data may need to be analyzed and
stored within the same period, and the process from obtaining an item of public sentiment data to
completing its storage is long. To relieve the pressure on the subsequent storage processing, the
public sentiment data is therefore stored in the cache of the server before that processing
begins. A further advantage is that an item is deleted from the server cache only after its later
storage processing has succeeded; if the later processing fails, no additional operation is
needed, and the public sentiment data still in the cache simply has to be read and processed
again. This both greatly simplifies the handling flow and ensures the integrity of the data.
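The delete-only-after-success behaviour of the cache can be sketched as follows. This is an
in-process illustration under assumptions: the class name `PendingCache` and its methods are
hypothetical, and the `store` callback stands for the whole subsequent storage processing.

```python
from collections import OrderedDict


class PendingCache:
    """Holds pending cached data until its storage processing succeeds."""

    def __init__(self):
        self._entries = OrderedDict()  # data identifier -> cached entry

    def put(self, data_id, entry):
        self._entries[data_id] = entry

    def process_all(self, store):
        """Run store(entry) on every entry; return identifiers that failed."""
        failed = []
        for data_id in list(self._entries):
            try:
                store(self._entries[data_id])
            except Exception:
                # Failure: leave the entry in the cache so it can be retried.
                failed.append(data_id)
                continue
            # Success: only now is the entry removed from the cache.
            del self._entries[data_id]
        return failed

    def __len__(self):
        return len(self._entries)
```

A failed entry needs no recovery step; calling `process_all` again simply reads it from the cache
and retries.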
Afterwards, the subsequent storage processing is performed on the pending cached data in the
cache, that is, on the data identifier, topic identifier, display field and sort field of the
public sentiment data to be stored that are stored in association in the cache of the server.
In the subsequent storage processing, special-topic storage of the pending cached data is
performed first. Specifically, according to the preset correspondence between special topics and
topics stored in the server, it is determined whether a special-topic identifier exists that
corresponds to the topic identifier of the pending cached data. In practice, the relationship
among special topics, topics and public sentiment data is not one-to-one: one topic may include
multiple items of public sentiment data, and one special topic may correspond to multiple
different topics. In this embodiment, the correspondence between special topics and topics,
obtained in advance from statistics, is used to determine whether a special-topic identifier
corresponding to the topic identifier of the current pending cached data exists.
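The lookup against the special-topic-to-topic correspondence can be sketched as below. The
correspondence is modelled, as an assumption, as a dictionary from special-topic identifier to a
list of topic identifiers, and it is inverted once for fast lookup. Returning a list allows for a
topic that appears under more than one special topic, which the text neither states nor rules out;
the function names are hypothetical.

```python
def build_topic_index(special_to_topics):
    """Invert {special-topic id: [topic ids]} into {topic id: [special-topic ids]}."""
    index = {}
    for special_id, topic_ids in special_to_topics.items():
        for topic_id in topic_ids:
            index.setdefault(topic_id, []).append(special_id)
    return index


def find_special_topics(index, topic_id):
    """Return the special-topic identifiers for a topic; empty if none exist."""
    return index.get(topic_id, [])
```

An empty result corresponds to the branch in which step 104 is skipped and processing continues
with step 105.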
Step 104: store, in a preset third special-topic storage format, the special-topic identifier, the
data identifier and the creation time of the pending cached data in the real-time database; and
store, in a preset fourth special-topic storage format, the special-topic identifier and the
display field of the pending cached data in the real-time database.
If a special-topic identifier corresponding to the topic identifier of the pending cached data
exists, the special-topic information of the pending cached data is stored in the real-time
database. It is worth noting that, in this embodiment, three kinds of databases are provided in
the server: a real-time database, a recent database and a historical database. The real-time
database resides in the memory of the server; the recent database is a relational database; the
historical database is a non-relational (NoSQL) database. The real-time database stores pending
cached data for a certain life duration, for example one week counted from the time the pending
cached data is stored in the real-time database; when the week has elapsed, the data item is
deleted automatically.
Specifically, when the special-topic information of the pending cached data is stored in the
real-time database, this embodiment provides two storage formats: the third special-topic storage
format and the fourth special-topic storage format. In the third special-topic storage format, the
special-topic identifier, the data identifier and the creation time of the pending cached data are
stored in the real-time database; the format can be represented as (special-topic
identifier-data identifier, creation time). In the fourth special-topic storage format, the
special-topic identifier and the display field of the pending cached data are stored in the
real-time database; the format can be represented as (special-topic identifier, list(display
field)). Here list denotes a list: the display fields of the pending cached data items belonging
to one special-topic identifier are written into this list in turn. The two storage formats serve
different purposes. The third special-topic storage format is used for duplicate detection and
elimination: to avoid processing the same pending cached data repeatedly, pending cached data that
has already been processed is deleted from the cache. The fourth special-topic storage format is
used to display the status of a special topic in real time, where "real time" means real time
within a certain time period. In addition, the special-topic information of the data is stored
only in the real-time database; it is used to quickly retrieve the data related to a given special
topic for display to the user.
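The two special-topic storage formats can be sketched with plain Python structures, under the
assumption that the in-memory real-time database behaves like a key-value store. The function
names are hypothetical, and joining the two identifiers with a hyphen follows the notation in the
text; a real system would need identifiers that cannot themselves contain the separator.

```python
def special_created_entry(special_id, data_id, created_at):
    """Third format: ("<special-topic id>-<data id>", creation time)."""
    return f"{special_id}-{data_id}", created_at


def append_display(display_lists, special_id, display_field):
    """Fourth format: (special-topic id, list(display field)).

    display_lists maps each special-topic identifier to its list; the display
    field of each new item is appended in arrival order.
    """
    display_lists.setdefault(special_id, []).append(display_field)
    return display_lists[special_id]
```

The composite key supports the duplicate check, while the per-identifier list can be read in one
lookup when a special topic is displayed.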
After the special-topic information of the pending cached data has been stored in the real-time
database successfully, or after it has been determined that no special-topic identifier
corresponding to its topic identifier exists, the following steps are performed, that is, topic
storage of the pending cached data.
Step 105: store the topic identifier, the data identifier and the sort field of the pending cached
data in association in the recent database of the server, the recent database being used to store
the pending cached data for a first life duration.
In this embodiment, the topic information of the pending cached data is stored in sequence: first
in the recent database, then in the historical database, and finally in the real-time database.
First, the topic identifier, the data identifier and the sort field of the current pending cached
data are stored in association in the recent database of the server; the storage format can be
represented as (topic identifier-data identifier, sort field). The recent database stores the
pending cached data for a first life duration, for example one month. The topic information
stored in the recent database is mainly used for analysis. The recent database stores only part of
the pending cached data, namely the analysis field (the sort field), and does not store the
detailed content of the data.
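Since the recent database is relational, the (topic identifier-data identifier, sort field)
record can be sketched with Python's built-in `sqlite3` module. This is only an illustration: the
table name, column names and the explicit `purge_expired` step are assumptions, and the
embodiment does not name a particular database product or expiry mechanism.

```python
import sqlite3


def open_recent_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS recent (
               topic_id      TEXT NOT NULL,
               data_id       TEXT NOT NULL,
               forward_count INTEGER,
               comment_count INTEGER,
               stored_at     REAL NOT NULL,
               PRIMARY KEY (topic_id, data_id)
           )"""
    )
    return conn


def store_recent(conn, topic_id, data_id, sort_field, now):
    # Only the identifiers and the sort field are kept; no detailed content.
    conn.execute(
        "INSERT OR REPLACE INTO recent VALUES (?, ?, ?, ?, ?)",
        (topic_id, data_id, sort_field.get("forward_count"),
         sort_field.get("comment_count"), now),
    )
    conn.commit()


def purge_expired(conn, now, first_life_seconds):
    """Delete rows older than the first life duration; return the number removed."""
    cursor = conn.execute(
        "DELETE FROM recent WHERE stored_at <= ?", (now - first_life_seconds,)
    )
    conn.commit()
    return cursor.rowcount
```

The composite primary key mirrors the "topic identifier-data identifier" pairing, so storing the
same item twice overwrites rather than duplicates the row.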
Step 106: store extended pending cached data in the historical database of the server, the
historical database being used to store the extended pending cached data for a second life
duration, the second life duration being longer than the first life duration, and the extended
pending cached data including the pending cached data and the fields of the public sentiment data
to be stored other than the display field and the sort field.
Second, the sort field and the display field of the public sentiment data to be stored, together
with all or some of its other fields besides the display field and the sort field, are stored in
the historical database of the server. The historical database stores this public sentiment data
for a second life duration that is longer than the first life duration, for example the entire
life cycle of the data.
The data in the recent database and the historical database is used only for analysis, and the
analysis is centered on topics; which special topic a given topic belongs to is of no use in the
analysis. When analysis results are shown to the user, the topics belonging to a special topic can
be obtained directly from the correspondence between special topics and topics.
Step 107: store, in a preset first topic storage format, the topic identifier, the data identifier
and the creation time of the pending cached data in the real-time database of the server; and
store, in a preset second topic storage format, the topic identifier and the display field of the
pending cached data in the real-time database, the real-time database being used to store the
pending cached data for a third life duration, the third life duration being shorter than the
first life duration.
Finally, the topic information of the data is stored in the real-time database. Specifically, two
storage formats are provided for topic processing: the first topic storage format and the second
topic storage format. In the first topic storage format, the topic identifier, the data identifier
and the creation time of the pending cached data are stored in the real-time database of the
server; the format can be represented as (topic identifier-data identifier, creation time). In the
second topic storage format, the topic identifier and the display field of the pending cached data
are stored in the real-time database; the format can be represented as (topic identifier,
list(display field)). Here list denotes a list: the display fields of the pending cached data
items belonging to one topic identifier are written into this list in turn. The real-time database
stores the pending cached data for a third life duration, which is shorter than the first life
duration, for example one week.
The two topic storage formats serve different purposes. The first topic storage format is used for
duplicate detection and elimination: to avoid processing the topic information of the same pending
cached data repeatedly, pending cached data that has already been processed is removed from the
cache. The second topic storage format is used to display the status of a topic in real time,
where "real time" means real time within a certain time period.
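The third life duration of the in-memory real-time database can be illustrated with a small
key-value store whose entries expire. This is a sketch under assumptions: the class name
`ExpiringStore` is hypothetical, expiry is checked lazily on read rather than by a background
task, and the clock is injectable so that the behaviour can be tested without waiting.

```python
import time

ONE_WEEK_SECONDS = 7 * 24 * 3600  # example third life duration from the text


class ExpiringStore:
    """In-memory key-value store whose entries expire after a fixed duration."""

    def __init__(self, life_seconds, clock=time.time):
        self._life = life_seconds
        self._clock = clock
        self._items = {}  # key -> (expiry time, value)

    def set(self, key, value):
        self._items[key] = (self._clock() + self._life, value)

    def get(self, key, default=None):
        item = self._items.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            # The life duration has elapsed: drop the entry.
            del self._items[key]
            return default
        return value
```

A first-format entry would be written as `store.set("T1-d42", created_at)`, and a second-format
list under the bare topic identifier.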
In this embodiment, the public sentiment data is parsed to obtain the display field needed for
presentation to users and the sort field needed for analyzing the public sentiment data. After
topic detection has been performed on the public sentiment data to be stored, only the topic
identifier, data identifier, display field and sort field of the public sentiment data are first
stored in the cache of the server. The topic identifier, data identifier and sort field of the
cached public sentiment data are then stored in the recent database, all information of the public
sentiment data is stored in the historical database, and the display field and sort field of the
public sentiment data are stored in the real-time database, so that the different information of
the public sentiment data is stored in turn in the recent database, the historical database and
the real-time database. Because each database has a different storage duration limit, the public
sentiment data is stored by category. Moreover, the massive public sentiment data obtained is
cached first and only then stored in the recent database, the historical database and the
real-time database. This ensures reliable data storage while storing the public sentiment data
separately as real-time, recent and historical data according to different needs, so that the
public sentiment data held in the different databases can be accessed quickly for analysis and
application as those needs require.
Optionally, after the pending cached data is obtained from the cache in step 103, the method
further includes the following processing step:
determining whether the real-time database contains an entry corresponding to the data identifier
and the topic identifier of the pending cached data; and if so, deleting the pending cached data.
This is the purpose of the topic information stored in the first topic storage format: if a given
topic identifier and data identifier already exist in the real-time database, the data item has
already been processed and need not be processed again.
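The optional duplicate check can be sketched as follows, assuming the first-format keys of the
real-time database are available as a set of "topic identifier-data identifier" strings and the
cache is a dictionary keyed by data identifier. The function name `drop_if_processed` is
hypothetical.

```python
def drop_if_processed(realtime_keys, cache, topic_id, data_id):
    """Delete the cached item if the real-time database already has its entry.

    Returns True when the item was a duplicate and has been removed from the
    cache, False when it still needs to be processed.
    """
    if f"{topic_id}-{data_id}" in realtime_keys:
        cache.pop(data_id, None)
        return True
    return False
```

Running this check before steps 104 to 107 keeps an item that was stored earlier from being
written to the three databases a second time.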
In addition, after the topic information has been stored in the real-time database in step 107,
the corresponding pending cached data is deleted from the cache, and processing moves on to the
next cached data item.
Fig. 2 is a schematic structural diagram of an embodiment of the server of the present invention.
As shown in Fig. 2, the server includes:
an acquisition module 11, configured to obtain public sentiment data to be stored, assign a data
identifier to the public sentiment data to be stored, and determine, according to preset topic
expressions, the topic identifier corresponding to the public sentiment data to be stored;
a cache processing module 12, configured to parse the public sentiment data to be stored to obtain
its display field and sort field, and to store the data identifier, the topic identifier, the
display field and the sort field in association in the cache of the server to obtain pending
cached data; wherein the display field includes the creation time, the creator and the data
content of the public sentiment data to be stored, and the sort field includes the forwarding
count and/or the comment count of the public sentiment data to be stored;
a determining module 13, configured to obtain the pending cached data from the cache and to
determine, according to the preset correspondence between special topics and topics, whether a
special-topic identifier exists that corresponds to the topic identifier of the pending cached
data;
a recent storage processing module 14, configured to, if the determining module 13 determines that
no such special-topic identifier exists, store the topic identifier, the data identifier and the
sort field of the pending cached data in association in the recent database of the server, the
recent database being used to store the pending cached data for a first life duration;
a historical storage processing module 15, configured to store extended pending cached data in the
historical database of the server, the historical database being used to store the extended
pending cached data for a second life duration, the second life duration being longer than the
first life duration, and the extended pending cached data including the pending cached data and
the fields of the public sentiment data to be stored other than the display field and the sort
field;
a real-time storage processing module 16, configured to store, in a preset first topic storage
format, the topic identifier, the data identifier and the creation time of the pending cached data
in the real-time database of the server; and to store, in a preset second topic storage format,
the topic identifier and the display field of the pending cached data in the real-time database,
the real-time database being used to store the pending cached data for a third life duration, the
third life duration being shorter than the first life duration.
Optionally, the determining module 13 is further configured to:
determine whether the real-time database contains an entry corresponding to the data identifier
and the topic identifier of the pending cached data.
The server further includes:
a deletion module 17, configured to delete the pending cached data if the determining module 13
determines that the entry exists.
Further, the real-time storage processing module 16 is further configured to:
if the determining module determines that the special-topic identifier exists, store, in a preset
third special-topic storage format, the special-topic identifier, the data identifier and the
creation time of the pending cached data in the real-time database; and store, in a preset fourth
special-topic storage format, the special-topic identifier and the display field of the pending
cached data in the real-time database.
Further, the deletion module 17 is further configured to:
delete the pending cached data from the cache.
The real-time database resides in the memory of the server; the recent database is a relational
database; the historical database is a non-relational (NoSQL) database.
The device of this embodiment can be used to carry out the technical solution of the method
embodiment shown in Fig. 1; its implementation principle and technical effect are similar and are
not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above
method embodiments may be carried out by hardware under the control of program instructions. The
program may be stored in a computer-readable storage medium and, when executed, performs the steps
of the above method embodiments. The storage medium includes any medium that can store program
code, such as a ROM, a RAM, a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the
technical solutions of the present invention, not to limit them. Although the invention has been
described in detail with reference to the foregoing embodiments, a person of ordinary skill in the
art should understand that the technical solutions described in those embodiments may still be
modified, or some or all of their technical features replaced by equivalents, and that such
modifications or replacements do not cause the essence of the corresponding technical solutions to
depart from the scope of the technical solutions of the embodiments of the present invention.