CN109947937A - A kind of fuel price information comparison system and method based on big data - Google Patents
- Publication number
- CN109947937A, CN201811602285.0A
- Authority
- CN
- China
- Prior art keywords
- data
- oil price
- module
- buffer memory
- page
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of big data and discloses a big-data-based fuel price information comparison system and method. The system is provided with a network data capture module, a log collection module, a user network behavior data collection module, a data analysis module, a data storage module, a result output module, a big data processing module, and a data judgment module; the big data processing module is connected to every other module and coordinates their operation. The invention uses an information terminal module to query and compare oil prices; an information processing module can combine oil price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range; the user can use the information terminal module to send query instructions and receive fuel price information. Oil price comparison is thereby completed effectively, each aspect of a gas station is analyzed accurately, and the user is helped to make the best choice.
Description
Technical field
The invention belongs to the field of big data, and in particular relates to a big-data-based fuel price information comparison system and method.
Background art
At present, the prior art commonly used in the industry is as follows:
Automobiles have become people's main means of transport, and the price of gasoline, the main automotive fuel, is one of the issues car owners care about most; fuel consumption is even a factor when choosing which car to buy. Because petroleum is a non-renewable resource, its price is inevitably high, and oil prices are a constant topic of conversation. Oil prices keep rising, and demand for gasoline keeps growing as the number of automobiles increases. Choosing a gas station that offers good quality at a low price is therefore a major concern for car owners today.
Existing fuel price information comparison systems cannot combine oil price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range. As a result, users cannot use an information terminal module to send query instructions, receive fuel price information, compare oil prices, or obtain a comprehensive and accurate analysis of gas stations, which prevents them from making the best choice. In the prior art, data are judged with conventional algorithms that cannot identify abnormal data quickly, which increases judgment time. In the prior art, memory processes and stores unbalanced datasets with traditional algorithms that cannot reduce the amount of training data or the scale of the dataset, which lengthens model training time and lowers classification efficiency. In the prior art, when collected oil price data are compared to obtain the oil price distribution, traditional algorithms cannot effectively improve clustering precision and stability, which lowers the quality of the resulting oil price distribution.
In summary, the problems of the prior art are:
(1) Existing fuel price information comparison systems cannot combine oil price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range, so users cannot use an information terminal module to send query instructions, receive fuel price information, compare oil prices, or obtain a comprehensive and accurate analysis of gas stations, which prevents them from making the best choice.
(2) In the prior art, data are judged with conventional algorithms that cannot identify abnormal data quickly, which increases judgment time.
(3) In the prior art, memory processes and stores unbalanced datasets with traditional algorithms that cannot reduce the amount of training data or the scale of the dataset, which lengthens model training time and lowers classification efficiency.
(4) In the prior art, when collected oil price data are compared to obtain the oil price distribution, traditional algorithms cannot effectively improve clustering precision and stability, which lowers the quality of the resulting oil price distribution.
Summary of the invention
In view of the problems of the prior art, the present invention provides a big-data-based fuel price information comparison system and method.
The invention is realized as follows. A big-data-based fuel price information comparison method includes:
Step 1: obtain oil price data from the network, logs, and user network behavior;
Step 2: analyze the obtained oil price data and judge whether each price is a true oil price; a price judged not to be true, that is, one that deviates from the normal oil price range, is invalid and is neither processed nor stored or output;
Step 3: compare the collected oil price data to obtain the oil price distribution;
Step 4: store the obtained fuel price information and oil price distribution data, and show the oil price distribution results on a display screen.
Further, the collected oil price data are stored in memory by category. The entire dataset is undersampled while its distribution is preserved, which reduces the training data and the scale of the dataset. Unbalanced data are classified with the RSBoost algorithm, specifically:
Given a training set $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where each sample $x_i \in X^d$ is a d-dimensional feature vector and each class label $y_i \in \{P, N\}$, with P the minority class and N the majority class;
Input: training set $S_t$, base classifier (weak learner) WL, oversampling rate M, undersampling rate N;
Output: classification model $H(x)$;
Step 1: initialize the weight of each sample in the dataset:
$D_1(i) = 1/m$;
Step 2: after SMOTE oversampling of the minority class at oversampling rate M, randomly undersample the entire dataset at undersampling rate N while preserving the data distribution, generating the training dataset $S_t'$ with weight distribution $D_t'$;
Step 3: for t = 1 to T:
(1) train the weak classifier WL on the training dataset $S_t'$ with weight distribution $D_t'$, and compute the weak hypothesis $h_t: X \times Y \to [0, 1]$;
(2) compute the pseudo-loss of $h_t$: [formula not reproduced]
(3) compute the weight update parameter:
$\omega_t = \frac{1}{2}\left(1 + h_t(x_i, y_i) - h_t(x_i, y)\right)$;
(4) update the weight distribution $D_t$: [formula not reproduced]
(5) normalize: [formula not reproduced]
Step 4: obtain the final classification model by a weighted vote of the T weak hypotheses: [formula not reproduced]
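The formulas for sub-steps (2), (4), and (5) and for Step 4 are not reproduced in this text. As a hedged reconstruction only, assuming the scheme follows the AdaBoost.M2 update on which sampling-based boosting variants are commonly built, and restricting to the two classes P and N so that each sample $x_i$ has exactly one incorrect label $y \neq y_i$, the missing expressions would take the following form. The symbols $\varepsilon_t$, $\beta_t$, and $Z_t$ are introduced here for illustration and do not appear in the source; $\omega_t(i)$ is the exponent from sub-step (3), written with its dependence on the sample index made explicit.

```latex
\begin{align}
\varepsilon_t &= \tfrac{1}{2}\sum_{i=1}^{m} D_t(i)\,\bigl(1 - h_t(x_i, y_i) + h_t(x_i, y)\bigr), \\
\beta_t &= \frac{\varepsilon_t}{1 - \varepsilon_t}, \qquad
\omega_t(i) = \tfrac{1}{2}\bigl(1 + h_t(x_i, y_i) - h_t(x_i, y)\bigr), \\
D_{t+1}(i) &= \frac{D_t(i)\,\beta_t^{\,\omega_t(i)}}{Z_t}, \qquad
Z_t = \sum_{j=1}^{m} D_t(j)\,\beta_t^{\,\omega_t(j)}, \\
H(x) &= \arg\max_{y \in \{P, N\}} \sum_{t=1}^{T} \Bigl(\log\frac{1}{\beta_t}\Bigr)\, h_t(x, y).
\end{align}
```

Under this reading, a correctly classified sample has $\omega_t(i)$ close to 1 and its weight is scaled down by $\beta_t < 1$, while a misclassified sample has $\omega_t(i)$ close to 0 and keeps its weight, so later rounds concentrate on the hard samples.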
Further, in judging whether the acquired oil price data are true oil prices, abnormal oil price data are identified with the DBSCAN algorithm, specifically:
Step 1: take a data object P that has not yet been visited or processed and examine its Eps-neighborhood $N_{Eps}(P)$; if the neighborhood contains at least MinPts data objects, create a new cluster C and add P and the data objects in its neighborhood to C;
Step 2: for each data object Q in C that has not yet been processed, examine its Eps-neighborhood $N_{Eps}(Q)$; if the neighborhood contains at least MinPts data objects, add the points in Q's neighborhood to C;
Step 3: repeat Step 2 until every object in C has been processed;
Step 4: repeat Steps 1 to 3 until every data object has been visited and has either been assigned to a cluster or marked as abnormal data.
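The four steps above can be sketched in a few lines. The following is a minimal illustration rather than the patented implementation: it treats each oil price as a one-dimensional point and uses the absolute price difference as the distance, both of which are assumptions. The names `region_query` and `dbscan`, the `NOISE` marker, and the sample values of `eps` and `min_pts` are hypothetical.

```python
from typing import List, Optional

NOISE = -1  # label given to abnormal data


def region_query(prices: List[float], i: int, eps: float) -> List[int]:
    """Indices of all prices within eps of prices[i] (its Eps-neighborhood, including i)."""
    return [j for j, p in enumerate(prices) if abs(p - prices[i]) <= eps]


def dbscan(prices: List[float], eps: float, min_pts: int) -> List[int]:
    """Return one cluster id per price; NOISE marks prices treated as abnormal."""
    labels: List[Optional[int]] = [None] * len(prices)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(prices)):
        if labels[i] is not None:
            continue
        neighbors = region_query(prices, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may still be claimed later as a border point
            continue
        labels[i] = cluster_id           # Step 1: core object found, start a new cluster C
        queue = [j for j in neighbors if j != i]
        while queue:                     # Steps 2 and 3: expand C until fully processed
            q = queue.pop()
            if labels[q] == NOISE:
                labels[q] = cluster_id   # former noise point becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = cluster_id
            q_neighbors = region_query(prices, q, eps)
            if len(q_neighbors) >= min_pts:
                queue.extend(q_neighbors)
        cluster_id += 1                  # Step 4: move on to the next unvisited object
    return [NOISE if label is None else label for label in labels]
```

Called as `dbscan([7.10, 7.12, 7.15, 7.11, 7.13, 9.90, 7.14], eps=0.06, min_pts=3)`, the sketch groups the six prices near 7.1 into one cluster and labels the 9.90 entry as `NOISE`, which corresponds to an oil price that would be rejected as not true.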
Further, in comparing the collected oil price data to obtain the oil price distribution, K-Means is used, specifically:
Step 1: compute the text similarity according to the formula [formula not reproduced] and construct the matrix M;
Step 2: according to the formula
$P = \{a_1, a_2, \ldots, a_n\}$,
where [definition of $a_i$ not reproduced], construct the set P and sort it in ascending order;
Step 3: create the initial center point set I and the deletion set Delete, both initially empty;
Step 4: select the largest element of P, take its corresponding text $d_j$ as a center point, and add it to the initial center point set, i.e. $I = I \cup \{d_j\}$;
Step 5: put every text whose similarity to $d_j$ in matrix M reaches the threshold ($sim(d_i, d_j) > \beta$) into Delete ($Delete = Delete \cup \{a_i\}$) and remove it from P, i.e. $P = P - \{a_i\}$;
Step 6: check whether $P = \varnothing$ and $i < k$; if so, copy the data in Delete back into P, i.e. $P = Delete$;
Step 7: unless the termination condition $i = k$ is met, repeat Steps 3 to 6, finally obtaining k initial cluster center points;
Step 8: compute the text similarity between each cluster's center text and the other texts with the cosine formula, and assign each text to the cluster whose center it is most similar to;
Step 9: compute [formula not reproduced] to obtain the new center of each cluster;
Step 10: repeat Steps 8 and 9 until the termination condition is met.
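The initial-center selection in Steps 1 to 7 can be illustrated with a short sketch. Several points are assumptions, because the corresponding formulas are not reproduced in this text: the similarity is taken to be cosine similarity (the measure named in Step 8), $a_i$ is taken to be the average similarity of text $d_i$ to all other texts, and the loop is read as repeating the selection without emptying I again. The function names and the example threshold are hypothetical.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def initial_centers(vectors: List[List[float]], k: int, beta: float) -> List[int]:
    """Pick up to k initial center indices, most 'typical' texts first (needs at least 2 texts)."""
    n = len(vectors)
    # Step 1: pairwise similarity matrix M.
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    # Step 2 (assumed definition): a_i is the average similarity of text i to the others.
    avg = [(sum(sim[i]) - sim[i][i]) / (n - 1) for i in range(n)]
    candidates = set(range(n))   # set P
    deleted = set()              # set Delete
    centers: List[int] = []      # set I (Step 3)
    while len(centers) < k:      # Step 7: stop once k centers are chosen
        if not candidates:
            if not deleted:
                break            # nothing left to choose from
            candidates, deleted = deleted, set()   # Step 6: P = Delete
        j = max(candidates, key=lambda i: avg[i])  # Step 4: largest a_i
        centers.append(j)
        candidates.discard(j)
        near = {i for i in candidates if sim[i][j] > beta}  # Step 5
        deleted |= near
        candidates -= near
    return centers
```

Removing the texts that are close to a chosen center before picking the next one is what spreads the initial centers across the dataset; the refill in Step 6 only matters when the threshold `beta` is so low that P empties before k centers have been found.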
Further, in storing the obtained fuel price information and oil price distribution data, a record cache is established in advance for the data tables in the database; the record cache reads and writes data in units of data rows;
when a data query request is received from a client, the requested data are looked up in the record cache;
if the lookup fails, the requested data are looked up in the database's page cache, which reads and writes data with the page as its basic unit;
the data found in the record cache or the page cache are returned to the client;
data are added to the record cache; specifically, the data found in the page cache are added to the record cache;
Further, the process of adding data to the record cache includes:
in the record cache, selecting record data of the same order of magnitude as the data to be added, and replacing it;
Further, the process of adding data to the record cache also includes: in the record cache, selecting a record cache page of a different order of magnitude from the data to be added, reclaiming the space occupied by that cache page, using the reclaimed space to allocate a new record cache page for the data to be added, and writing the data to be added into the new record cache page.
Further, obtain the access frequency Frec of the record data of the same order of magnitude as the data to be added, and the access frequency Fpage of the record cache page of a different order of magnitude from the data to be added;
judge whether Frec > replace_page_ratio * Fpage holds; if so, select, in the record cache, record data of the same order of magnitude as the data to be added and replace it; otherwise, select, in the record cache, a record cache page of a different order of magnitude from the data to be added, reclaim the space occupied by that cache page, use the reclaimed space to allocate a new record cache page for the data to be added, and write the data to be added into the new record cache page.
Here replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0, 1];
the access frequency Fpage of the record cache page of a different order of magnitude from the data to be added is obtained as:
Fpage = (Fmin + Fmax) / 2 * N;
where Fmin is the access frequency of the data with the earliest timestamp in the record cache page, Fmax is the access frequency of the data with the latest timestamp in the record cache page, and N is the total number of data records in the record cache page.
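The replacement decision can be made concrete with a small sketch. It is illustrative only: the `Record` type, the function names, and the returned strings are hypothetical, and the expression for Fpage is read left to right as ((Fmin + Fmax) / 2) * N, which is an assumption about the intended precedence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    key: str
    freq: float       # access frequency of this record
    timestamp: float  # time the record entered the page


def page_frequency(page: List[Record]) -> float:
    """Fpage = (Fmin + Fmax) / 2 * N, evaluated left to right (assumed precedence)."""
    oldest = min(page, key=lambda r: r.timestamp)  # supplies Fmin
    newest = max(page, key=lambda r: r.timestamp)  # supplies Fmax
    return (oldest.freq + newest.freq) / 2 * len(page)


def choose_action(f_rec: float, f_page: float, replace_page_ratio: float) -> str:
    """Apply the test Frec > replace_page_ratio * Fpage as stated in the text."""
    if not 0 < replace_page_ratio <= 1:
        raise ValueError("replace_page_ratio must lie in (0, 1]")
    if f_rec > replace_page_ratio * f_page:
        return "replace_record"  # overwrite a record of the same order of magnitude
    return "recycle_page"        # reclaim a page of a different order of magnitude
```

For a page of three records whose earliest and latest entries have access frequencies 2 and 6, `page_frequency` gives 12; with `replace_page_ratio = 0.5` the threshold is 6, so a candidate record with Frec = 7 leads to record replacement and one with Frec = 5 leads to page recycling.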
Another object of the present invention is to provide a big-data-based fuel price information comparison system, which is provided with:
a network data capture module, connected to the big data processing module, which uses internet searches to capture oil price data in a targeted way and classifies the data according to set rules and screening criteria;
a log collection module, connected to the big data processing module, which collects data from the sales log files on each gas station's website;
a user network behavior data collection module, connected to the big data processing module, which collects oil price data from users' online oil price transaction behavior;
a data analysis module, connected to the big data processing module, which compares the collected oil price data to obtain the oil price distribution;
a data storage module, connected to the big data processing module, which stores the collected oil price data in memory by category;
a result output module, connected to the big data processing module, which outputs the oil price distribution on a display screen;
a big data processing module, connected to the network data capture module, log collection module, user network behavior data collection module, data analysis module, data storage module, result output module, and data judgment module, which coordinates the operation of all modules;
a data judgment module, connected to the big data processing module, which judges whether the acquired oil price data are true oil prices.
Another object of the present invention is to provide a big-data-based fuel price information comparison platform that carries the big-data-based fuel price information comparison system.
While storing the collected oil price data in memory by category, the data storage module increases the number of minority-class samples to adjust the balance of the unbalanced dataset and even out the data distribution; it also undersamples the entire dataset while preserving its distribution, which reduces the training data and the scale of the dataset, thereby shortening model training time and improving classification efficiency.
In summary, the advantages and positive effects of the present invention are as follows:
The invention uses an information terminal module to query and compare oil prices. The information processing module can combine oil price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range. The user can use the information terminal module to send query instructions and receive fuel price information; oil price comparison is completed effectively, each aspect of a gas station is analyzed accurately, and the user is helped to make the best choice.
In the present invention, when the data judgment module judges whether the acquired oil price data are true oil prices, it uses the DBSCAN algorithm so that abnormal oil price data are identified quickly and judgment time is reduced.
In the present invention, while the data storage module stores the collected oil price data in memory by category, it uses an unbalanced data classification method based on the RSBoost algorithm: the number of minority-class samples is increased to adjust the balance of the unbalanced dataset and even out the data distribution, and the entire dataset is undersampled while its distribution is preserved to reduce the training data and the scale of the dataset, which shortens model training time and improves classification efficiency.
In the present invention, when the data analysis module compares the collected oil price data to obtain the oil price distribution, it uses K-Means to improve clustering precision and stability and thus the quality of the oil price distribution. When the initial centers are selected, the texts must be picked in order of average text similarity, with the most similar text first; only in this way are the selected center points strongly correlated with the data in the dataset, so that each better represents a portion of the data and the center points are evenly distributed.
In storing the obtained fuel price information and oil price distribution data, a record cache is established in advance for the data tables in the database; the record cache reads and writes data in units of data rows;
when a data query request is received from a client, the requested data are looked up in the record cache;
if the lookup fails, the requested data are looked up in the database's page cache, which reads and writes data with the page as its basic unit;
the data found in the record cache or the page cache are returned to the client;
data are added to the record cache; specifically, the data found in the page cache are added to the record cache;
In the present invention, the process of adding data to the record cache includes:
in the record cache, selecting record data of the same order of magnitude as the data to be added, and replacing it;
The process of adding data to the record cache also includes: in the record cache, selecting a record cache page of a different order of magnitude from the data to be added, reclaiming the space occupied by that cache page, using the reclaimed space to allocate a new record cache page for the data to be added, and writing the data to be added into the new record cache page.
Obtain the access frequency Frec of the record data of the same order of magnitude as the data to be added, and the access frequency Fpage of the record cache page of a different order of magnitude from the data to be added;
judge whether Frec > replace_page_ratio * Fpage holds; if so, select, in the record cache, record data of the same order of magnitude as the data to be added and replace it; otherwise, select, in the record cache, a record cache page of a different order of magnitude from the data to be added, reclaim the space occupied by that cache page, use the reclaimed space to allocate a new record cache page for the data to be added, and write the data to be added into the new record cache page.
Here replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0, 1].
This enables real-time storage of data.
Description of the drawings
Fig. 1 is a structural schematic diagram of the big-data-based fuel price information comparison system provided by an embodiment of the present invention.
Fig. 2 is a flow chart of the big-data-based fuel price information comparison method provided by an embodiment of the present invention.
In the figures: 1, network data capture module; 2, log collection module; 3, user network behavior data collection module; 4, data analysis module; 5, data storage module; 6, result output module; 7, big data processing module; 8, data judgment module.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
The application principle of the invention is explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the big-data-based fuel price information comparison system provided by an embodiment of the present invention includes: a network data capture module 1, a log collection module 2, a user network behavior data collection module 3, a data analysis module 4, a data storage module 5, a result output module 6, a big data processing module 7, and a data judgment module 8.
The network data capture module 1 is connected to the big data processing module 7; it uses internet searches to capture oil price data in a targeted way and classifies the data according to set rules and screening criteria;
the log collection module 2 is connected to the big data processing module 7; it collects data from the sales log files on each gas station's website;
the user network behavior data collection module 3 is connected to the big data processing module 7; it collects oil price data from users' online oil price transaction behavior;
the data analysis module 4 is connected to the big data processing module 7; it compares the collected oil price data to obtain the oil price distribution;
the data storage module 5 is connected to the big data processing module 7; it stores the collected oil price data in memory by category;
the result output module 6 is connected to the big data processing module 7; it outputs the oil price distribution on a display screen;
the big data processing module 7 is connected to the network data capture module 1, log collection module 2, user network behavior data collection module 3, data analysis module 4, data storage module 5, result output module 6, and data judgment module 8; it coordinates the operation of all modules;
the data judgment module 8 is connected to the big data processing module 7; it judges whether the acquired oil price data are true oil prices.
As shown in Fig. 2, the big-data-based fuel price information comparison method provided by an embodiment of the present invention specifically includes the following steps:
S101: obtain oil price data from the network, logs, and user network behavior;
S102: analyze the obtained oil price data and judge whether each price is a true oil price; if a price is judged not to be true because it deviates far from the normal oil price range, it is invalid and is neither processed nor stored or output;
S103: compare the collected oil price data to obtain the oil price distribution;
S104: store the obtained fuel price information and oil price distribution data, and show the oil price distribution results on a display screen.
While the data storage module 5 stores the collected oil price data in memory by category, it uses an unbalanced data classification method based on the RSBoost algorithm: the number of minority-class samples is increased to adjust the balance of the unbalanced dataset and even out the data distribution, and the entire dataset is undersampled while its distribution is preserved to reduce the training data and the scale of the dataset, which shortens model training time and improves classification efficiency. The detailed process is as follows:
Given a training set $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, where each sample $x_i \in X^d$ is a d-dimensional feature vector and each class label $y_i \in \{P, N\}$, with P the minority class and N the majority class;
Input: training set $S_t$, base classifier (weak learner) WL, oversampling rate M, undersampling rate N;
Output: classification model $H(x)$;
Step 1: initialize the weight of each sample in the dataset:
$D_1(i) = 1/m$;
Step 2: after SMOTE oversampling of the minority class at oversampling rate M, randomly undersample the entire dataset at undersampling rate N while preserving the data distribution, generating the training dataset $S_t'$ with weight distribution $D_t'$;
Step 3: for t = 1 to T:
(1) train the weak classifier WL on the training dataset $S_t'$ with weight distribution $D_t'$, and compute the weak hypothesis $h_t: X \times Y \to [0, 1]$;
(2) compute the pseudo-loss of $h_t$: [formula not reproduced]
(3) compute the weight update parameter:
$\omega_t = \frac{1}{2}\left(1 + h_t(x_i, y_i) - h_t(x_i, y)\right)$;
(4) update the weight distribution $D_t$: [formula not reproduced]
(5) normalize: [formula not reproduced]
Step 4: obtain the final classification model by a weighted vote of the T weak hypotheses: [formula not reproduced]
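A rough sketch of two pieces of this procedure follows: distribution-preserving random undersampling (Step 2) and one round of the weight update (Step 3). It is a hedged illustration, not the patented implementation. SMOTE oversampling is omitted, the pseudo-loss and the ratio `beta` follow the standard AdaBoost.M2 form (an assumption, since those formulas are not reproduced in this text), and every name in the code is hypothetical. Only the exponent (1/2)(1 + h(x_i, y_i) - h(x_i, y)) is taken directly from sub-step (3).

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]                   # (feature vector, label "P" or "N")
Hypothesis = Callable[[Sequence[float], str], float]  # h_t(x, y), a confidence in [0, 1]
OTHER: Dict[str, str] = {"P": "N", "N": "P"}           # the single incorrect label per class


def undersample(data: List[Sample], rate: float, rng: random.Random) -> List[Sample]:
    """Keep a fraction `rate` (0 < rate <= 1) of every class, preserving class proportions."""
    kept: List[Sample] = []
    for label in ("P", "N"):
        group = [s for s in data if s[1] == label]
        if group:
            kept.extend(rng.sample(group, max(1, round(rate * len(group)))))
    return kept


def boosting_round(weights: List[float], data: List[Sample],
                   h: Hypothesis) -> Tuple[List[float], float]:
    """One weight update; returns the normalized weights and beta_t."""
    # Sub-step (2): pseudo-loss, AdaBoost.M2 form (assumed).
    eps = 0.5 * sum(w * (1 - h(x, y) + h(x, OTHER[y]))
                    for w, (x, y) in zip(weights, data))
    if not 0.0 < eps < 1.0:
        raise ValueError("pseudo-loss must lie strictly between 0 and 1")
    beta = eps / (1 - eps)
    # Sub-steps (3) and (4): scale each weight by beta raised to the exponent omega_t.
    updated = [w * beta ** (0.5 * (1 + h(x, y) - h(x, OTHER[y])))
               for w, (x, y) in zip(weights, data)]
    total = sum(updated)                               # sub-step (5): normalize
    return [w / total for w in updated], beta
```

With four equally weighted samples and a hypothesis that misclassifies exactly one of them with full confidence, the pseudo-loss is 0.25, so `beta` is 1/3; after normalization the misclassified sample carries half of the total weight and each correctly classified sample carries one sixth.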
When the data judgment module 8 judges whether the acquired oil price data are true oil prices, it uses the DBSCAN algorithm so that abnormal oil price data are identified quickly and judgment time is reduced. This specifically includes the following steps:
Step 1: take a data object P that has not yet been visited or processed and examine its Eps-neighborhood $N_{Eps}(P)$; if the neighborhood contains at least MinPts data objects, create a new cluster C and add P and the data objects in its neighborhood to C;
Step 2: for each data object Q in C that has not yet been processed, examine its Eps-neighborhood $N_{Eps}(Q)$; if the neighborhood contains at least MinPts data objects, add the points in Q's neighborhood to C;
Step 3: repeat Step 2 until every object in C has been processed;
Step 4: repeat Steps 1 to 3 until every data object has been visited and has either been assigned to a cluster or marked as abnormal data.
When the data analysis module 4 compares the collected oil price data to obtain the oil price distribution, it uses K-Means to improve clustering precision and stability and thus the quality of the oil price distribution. This specifically includes the following steps:
Step 1: compute the text similarity according to the formula [formula not reproduced] and construct the matrix M;
Step 2: according to the formula
$P = \{a_1, a_2, \ldots, a_n\}$,
where [definition of $a_i$ not reproduced], construct the set P and sort it in ascending order;
Step 3: create the initial center point set I and the deletion set Delete, both initially empty;
Step 4: select the largest element of P, take its corresponding text $d_j$ as a center point, and add it to the initial center point set, i.e. $I = I \cup \{d_j\}$;
Step 5: put every text whose similarity to $d_j$ in matrix M reaches the threshold ($sim(d_i, d_j) > \beta$) into Delete ($Delete = Delete \cup \{a_i\}$) and remove it from P, i.e. $P = P - \{a_i\}$;
Step 6: check whether $P = \varnothing$ and $i < k$; if so, copy the data in Delete back into P, i.e. $P = Delete$;
Step 7: unless the termination condition $i = k$ is met, repeat Steps 3 to 6, finally obtaining k initial cluster center points;
Step 8: compute the text similarity between each cluster's center text and the other texts with the cosine formula, and assign each text to the cluster whose center it is most similar to;
Step 9: compute [formula not reproduced] to obtain the new center of each cluster;
Step 10: repeat Steps 8 and 9 until the termination condition is met.
In an embodiment of the present invention, in storing the obtained fuel price information and oil price distribution data, a record cache is established in advance for the data tables in the database; the record cache reads and writes data in units of data rows;
when a data query request is received from a client, the requested data are looked up in the record cache;
if the lookup fails, the requested data are looked up in the database's page cache, which reads and writes data with the page as its basic unit;
the data found in the record cache or the page cache are returned to the client;
data are added to the record cache; specifically, the data found in the page cache are added to the record cache;
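The lookup path described above, record cache first and page cache as the fallback, can be sketched as follows. This is a minimal illustration under stated assumptions: rows are identified by an integer row id, each page holds a fixed number of consecutive rows, and the class name `TwoLevelCache`, its methods, and `PAGE_SIZE` are all hypothetical.

```python
from typing import Dict, Optional

PAGE_SIZE = 4  # rows per page; illustrative value only


class TwoLevelCache:
    """Row-granular record cache in front of a page-granular page cache."""

    def __init__(self) -> None:
        self.record_cache: Dict[int, str] = {}           # row id -> row
        self.page_cache: Dict[int, Dict[int, str]] = {}  # page no -> {row id -> row}

    def load_page(self, page_no: int, rows: Dict[int, str]) -> None:
        """Place a whole page into the page cache (the page is its basic unit)."""
        self.page_cache[page_no] = dict(rows)

    def get(self, row_id: int) -> Optional[str]:
        """Answer a client query; returns None when neither cache holds the row."""
        row = self.record_cache.get(row_id)
        if row is not None:
            return row                                   # hit in the record cache
        page = self.page_cache.get(row_id // PAGE_SIZE)
        if page is None or row_id not in page:
            return None                                  # miss in both caches
        row = page[row_id]
        self.record_cache[row_id] = row                  # add the found row to the record cache
        return row
```

Promoting a row into the record cache on a page-cache hit means a frequently queried row can be served without keeping its entire page resident, which is the point of caching at row granularity.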
In an embodiment of the present invention, the process of adding data to the record cache includes:
in the record cache, selecting record data of the same order of magnitude as the data to be added, and replacing it;
The process of adding data to the record cache also includes: in the record cache, selecting a record cache page of a different order of magnitude from the data to be added, reclaiming the space occupied by that cache page, using the reclaimed space to allocate a new record cache page for the data to be added, and writing the data to be added into the new record cache page.
Obtain the access frequency Frec of the record data of the same order of magnitude as the data to be added, and the access frequency Fpage of the record cache page of a different order of magnitude from the data to be added;
judge whether Frec > replace_page_ratio * Fpage holds; if so, select, in the record cache, record data of the same order of magnitude as the data to be added and replace it; otherwise, select, in the record cache, a record cache page of a different order of magnitude from the data to be added, reclaim the space occupied by that cache page, use the reclaimed space to allocate a new record cache page for the data to be added, and write the data to be added into the new record cache page.
Here replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0, 1];
the access frequency Fpage of the record cache page of a different order of magnitude from the data to be added is obtained as:
Fpage = (Fmin + Fmax) / 2 * N;
where Fmin is the access frequency of the data with the earliest timestamp in the record cache page, Fmax is the access frequency of the data with the latest timestamp in the record cache page, and N is the total number of data records in the record cache page.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its scope of protection.
Claims (10)
1. A fuel price information comparison method based on big data, characterized in that the method comprises:
Step 1: obtaining oil price data from the network, from logs, and from user network behavior;
Step 2: analyzing the obtained oil price data and judging whether each price is a true oil price; a price judged not to be true, that is, one deviating from the normal oil price range, is invalid and is not processed, stored, or output;
Step 3: comparing the collected oil price data to obtain the distribution of oil prices;
Step 4: storing the obtained fuel price information and oil price distribution data, and showing the oil price distribution result on a display screen.
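Steps 2 and 3 of claim 1 can be illustrated with a small Python sketch. The claim does not state the normal oil price range or how the distribution is represented, so the bounds `low` and `high`, the bucket width, and the function names below are all assumptions made purely for illustration.

```python
from collections import Counter


def is_plausible(price, low=5.0, high=10.0):
    """Step 2 (assumed form): a price outside [low, high] is treated as invalid."""
    return low <= price <= high


def price_distribution(prices, bucket=0.5):
    """Step 3 (assumed form): count valid prices per bucket of width `bucket`."""
    valid = [p for p in prices if is_plausible(p)]
    counts = Counter(int(p // bucket) for p in valid)
    # Map each bucket index back to the lower edge of its price interval.
    return {round(b * bucket, 2): c for b, c in sorted(counts.items())}


# 0.99 and 99.0 fall outside the assumed range and are dropped before counting.
print(price_distribution([7.1, 7.4, 7.6, 0.99, 99.0]))
```

In the claimed method the validity judgment is refined by the clustering step of claim 3; the fixed range here is only the simplest possible stand-in.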
2. The fuel price information comparison method based on big data according to claim 1, characterized in that the collected oil price data is stored in memory and classified for storage; the entire data set is undersampled while its distribution is preserved, which reduces the training data and the size of the data set; during data classification, imbalanced data is classified using the RSBoost algorithm, specifically comprising:
Given a training set S = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, where each sample x_i ∈ X^d is a d-dimensional feature vector and the class label y_i ∈ {P, N}, with P the minority class and N the majority class;
Input: training set S_t, base classifier WL, oversampling rate M, undersampling rate N;
Output: classification model H(x);
Step 1: initialize the weights of the samples in the data set:
D_1(i) = 1/m;
Step 2: after SMOTE oversampling of the minority class at oversampling rate M, randomly undersample the entire data set at undersampling rate N while preserving the data distribution, generating the training data set S_t' with weight distribution D_t';
Step 3: for t = 1 to T:
(1) train the weak classifier WL on the training data set S_t' with its weight distribution D_t', and compute the weak hypothesis h_t: X × Y → [0, 1];
(2) compute the pseudo-loss of h_t:
(3) compute the weight update parameter:
ω_t = (1/2)(1 + h_t(x_i, y_i) − h_t(x_i, y));
(4) update the weight distribution D_t:
(5) normalize:
Step 4: obtain the final classification model by a weighted vote of the T weak hypotheses:
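The formulas for the pseudo-loss in (2), the weight update in (4), the normalization in (5) and the final model in Step 4 do not appear in this text. For orientation only, the standard AdaBoost.M2-style expressions used by the SMOTEBoost and RUSBoost family of algorithms are given below. Whether they match the patent's own formulas is an assumption; the symbols ε_t and α_t are introduced here and do not occur in the claim, while ω_t is the exponent defined in (3).

```latex
\begin{align}
\epsilon_t &= \sum_{(i,y):\, y \neq y_i} D_t(i)\,\bigl(1 - h_t(x_i, y_i) + h_t(x_i, y)\bigr)
  && \text{pseudo-loss of } h_t \\
\alpha_t &= \frac{\epsilon_t}{1 - \epsilon_t},
  \qquad \omega_t = \tfrac{1}{2}\bigl(1 + h_t(x_i, y_i) - h_t(x_i, y)\bigr)
  && \text{weight update parameters} \\
D_{t+1}(i) &= D_t(i)\,\alpha_t^{\,\omega_t}
  && \text{weight update} \\
D_{t+1}(i) &\leftarrow \frac{D_{t+1}(i)}{\sum_{j} D_{t+1}(j)}
  && \text{normalization} \\
H(x) &= \arg\max_{y \in \{P, N\}} \sum_{t=1}^{T} h_t(x, y)\,\log\frac{1}{\alpha_t}
  && \text{weighted vote}
\end{align}
```

Under these forms a weak hypothesis with small pseudo-loss gets a small α_t and therefore a large voting weight log(1/α_t), and correctly classified samples have their weights reduced most strongly.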
3. The fuel price information comparison method based on big data according to claim 1, characterized in that, while the obtained oil price data is being judged as true or not, abnormal oil price data is identified using the DBSCAN algorithm, specifically comprising:
Step 1: take a data object P that has not been visited or processed and examine its Eps-neighborhood N_Eps(P); if N_Eps(P) contains at least Minpts data objects, create a new cluster C and add P and the data objects in its neighborhood to C;
Step 2: if C contains an unprocessed data object Q, examine its Eps-neighborhood N_Eps(Q); if N_Eps(Q) contains at least Minpts data objects, add Q and the points in its neighborhood to C;
Step 3: repeat Step 2 until every object in C has been processed;
Step 4: repeat Steps 1 to 3 until all data objects have been visited and every data object is either assigned to a cluster or regarded as abnormal data.
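The four steps of claim 3 can be sketched as a minimal DBSCAN over one-dimensional prices. The sketch assumes that the distance between two prices is their absolute difference and that prices are given as integers (for example in fen) to avoid floating-point boundary effects; the function name `dbscan_1d` and the `NOISE` label are hypothetical.

```python
from collections import deque

NOISE = -1  # label for objects regarded as abnormal data


def dbscan_1d(prices, eps, min_pts):
    """Return one label per price: a cluster id (0, 1, ...) or NOISE."""
    n = len(prices)
    labels = [None] * n  # None means "not yet visited"

    def neighborhood(i):
        # Eps-neighborhood of object i; it includes i itself.
        return [j for j in range(n) if abs(prices[i] - prices[j]) <= eps]

    cluster = 0
    for p in range(n):
        if labels[p] is not None:
            continue
        nbrs = neighborhood(p)
        if len(nbrs) < min_pts:
            labels[p] = NOISE  # provisional; may later join a cluster as a border point
            continue
        # Step 1: P is a core object, so start a new cluster C.
        labels[p] = cluster
        queue = deque(nbrs)
        # Steps 2 and 3: process every object that ends up in C.
        while queue:
            q = queue.popleft()
            if labels[q] == NOISE:
                labels[q] = cluster  # border point: joins C but is not expanded
                continue
            if labels[q] is not None:
                continue
            labels[q] = cluster
            q_nbrs = neighborhood(q)
            if len(q_nbrs) >= min_pts:
                queue.extend(q_nbrs)
        cluster += 1  # Step 4: continue with the next unvisited object
    return labels


# Six quotes close together and one far away (values in fen per litre, illustrative).
print(dbscan_1d([710, 712, 715, 711, 714, 1250, 713], eps=5, min_pts=3))
```

Prices labelled `NOISE` are the candidates for abnormal oil price data; how Eps and Minpts are chosen is not stated in the claim.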
4. The fuel price information comparison method based on big data according to claim 1, characterized in that, when the collected oil price data is compared to obtain the distribution of oil prices, the comparison uses K-Means, specifically comprising:
Step 1: based on the formula
compute the text similarity and construct the matrix M;
Step 2: based on the formula
P = {a_1, a_2, ..., a_n};
where the set P is constructed and then sorted in ascending order;
Step 3: create the initial center point set I and the deletion set Delete, both initially empty;
Step 4: select the text d_j corresponding to the largest element of P as a center point and add it to the initial center point set, i.e. I = I ∪ {d_j};
Step 5: put every text in matrix M whose similarity to text d_j reaches the threshold (sim(d_i, d_j) > β) into Delete (Delete = Delete ∪ {a_i}) and remove it from P, i.e. P = P − {a_i};
Step 6: judge whether P = ∅ and i < k; if so, copy the data in Delete into P, i.e. P = Delete;
Step 7: repeat Steps 3 to 6 until the termination condition i = k is met, finally obtaining k initial cluster center points;
Step 8: using the cosine formula, compute the text similarity between the center text of each cluster and the other texts, and put each text into the cluster whose center it is most similar to;
Step 9: compute the new center of each cluster;
Step 10: repeat Steps 8 to 9 until the termination condition is met.
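The procedure of claim 4 can be sketched in plain Python. Several pieces are assumptions because the corresponding formulas are missing from this text: the score a_i is taken to be the total similarity of text i to all other texts, the new cluster center in Step 9 is taken to be the mean vector of the cluster, and iteration stops when the assignment no longer changes. The function names are hypothetical, and texts are represented as already-vectorized numeric tuples.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def initial_centers(vectors, k, beta):
    """Steps 1 to 7: pick up to k mutually dissimilar, high-scoring texts."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]  # matrix M
    score = [sum(sim[i]) - sim[i][i] for i in range(n)]  # assumed definition of a_i
    candidates = set(range(n))  # set P
    deleted = set()             # set Delete
    centers = []                # initial center point set I
    while len(centers) < k and (candidates or deleted):
        if not candidates:      # Step 6: P is empty but fewer than k centers exist
            candidates, deleted = deleted, set()
        j = max(sorted(candidates), key=lambda i: score[i])  # Step 4
        centers.append(j)
        candidates.discard(j)
        close = {i for i in candidates if sim[i][j] > beta}  # Step 5
        deleted |= close
        candidates -= close
    return centers


def kmeans_cosine(vectors, center_idx, max_iter=100):
    """Steps 8 to 10: assign by cosine similarity, then recompute centers."""
    centers = [list(vectors[i]) for i in center_idx]
    assign = None
    for _ in range(max_iter):
        new_assign = [max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
                      for v in vectors]
        if new_assign == assign:  # assumed termination condition
            break
        assign = new_assign
        for c in range(len(centers)):
            members = [vectors[i] for i, a in enumerate(assign) if a == c]
            if members:           # assumed Step 9: mean vector of the cluster
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


docs = [(1, 0), (0.9, 0.1), (0.8, 0.2), (0.95, 0.05), (0, 1), (0.1, 0.9)]
seeds = initial_centers(docs, k=2, beta=0.8)
print(seeds, kmeans_cosine(docs, seeds))
```

Removing every text that is more similar than β to a chosen center keeps the initial centers apart, which is the point of Steps 4 to 6.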
5. The fuel price information comparison method based on big data according to claim 1, characterized in that, when the obtained fuel price information and oil price distribution data is stored, a record buffer memory is established in advance for the data tables in the database, the record buffer memory reading and writing data in units of data rows;
When a data query request is received from a client, the requested data is looked up in the record buffer memory;
If the lookup fails, the requested data is looked up in the page cache of the database, the page cache reading and writing data in units of pages;
The data found in the record buffer memory or the page cache is returned to the client;
Data is added to the record buffer memory; specifically, the data found in the page cache is added to the record buffer memory.
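The lookup order described in claim 5 can be sketched as a two-tier cache. The class name `TwoTierCache`, the use of integer row keys, and the mapping of a row to its page by integer division are hypothetical choices for illustration; the claim only fixes the order of lookups and the promotion of page-cache hits into the record buffer memory.

```python
class TwoTierCache:
    def __init__(self, pages, page_size):
        # `pages` stands in for the database page cache: page_id -> {row_key: row}.
        self.page_cache = pages
        self.page_size = page_size
        self.record_cache = {}  # record buffer memory: row_key -> row

    def get(self, row_key):
        """Return (row, source), where source says which tier answered."""
        # 1. Look in the record buffer memory first (row granularity).
        if row_key in self.record_cache:
            return self.record_cache[row_key], "record"
        # 2. On a miss, look in the page cache (page granularity).
        page = self.page_cache.get(row_key // self.page_size)
        if page is not None and row_key in page:
            row = page[row_key]
            # 3. Add the row found in the page cache to the record buffer memory.
            self.record_cache[row_key] = row
            return row, "page"
        return None, "miss"


pages = {0: {0: "station A: 7.10", 1: "station B: 7.15"},
         1: {2: "station C: 7.08", 3: "station D: 7.21"}}
cache = TwoTierCache(pages, page_size=2)
print(cache.get(3))  # first request is served from the page cache
print(cache.get(3))  # repeat request is served from the record buffer memory
```

This sketch never evicts anything; choosing what to replace once the record buffer memory is full is the subject of claims 6 to 8.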
6. The fuel price information comparison method based on big data according to claim 5, characterized in that the process of adding data to the record buffer memory includes:
in the record buffer memory, selecting record data of the same order of magnitude as the data to be added and replacing it.
7. The fuel price information comparison method based on big data according to claim 5, characterized in that the process of adding data to the record buffer memory further includes: in the record buffer memory, selecting a record buffer memory page whose order of magnitude differs from that of the data to be added, recycling the space occupied by that page, using the recycled space to allocate a new record buffer memory page for the data to be added, and writing the data to be added into the new record buffer memory page.
8. The fuel price information comparison method based on big data according to claim 5, characterized in that
the access frequency Frec of the record data having the same order of magnitude as the data to be added is obtained, together with the access frequency Fpage of the record buffer memory page having a different order of magnitude from the data to be added;
it is judged whether Frec > replace_page_ratio*Fpage holds; if so, the manner of claim 6 is selected, otherwise the manner of claim 7 is selected;
where replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0,1];
the access frequency Fpage of the record buffer memory page having a different order of magnitude from the data to be added is obtained as:
Fpage = (Fmin + Fmax) / 2 * N;
where Fmin is the access frequency of the data with the earliest timestamp in the record buffer memory page, Fmax is the access frequency of the data with the latest timestamp in that page, and N is the total number of data records in the page.
9. A fuel price information comparison system based on big data, characterized in that the system is provided with:
a network data capture module, connected to the big data processing module, which uses internet search to capture oil price data in a targeted manner and classifies the data according to defined rules and screening criteria;
a log collection module, connected to the big data processing module, which collects data from the sales log files on the website of each petrol station;
a user network behavior data collection module, connected to the big data processing module, which collects oil price data from users' online oil price transactions;
a data analysis module, connected to the big data processing module, which compares the collected oil price data to obtain the distribution of oil prices;
a data storage module, connected to the big data processing module, which stores the collected oil price data in memory and classifies it for storage;
a result output module, connected to the big data processing module, which outputs the distribution of oil prices on a display screen;
a big data processing module, connected to the network data capture module, the log collection module, the user network behavior data collection module, the data analysis module, the data storage module, the result output module, and the data judgment module, which coordinates the operation of each module;
a data judgment module, connected to the big data processing module, which judges whether the obtained oil price data is a true oil price.
10. A fuel price information comparison platform based on big data, carrying the fuel price information comparison system based on big data according to claim 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811602285.0A CN109947937A (en) | 2018-12-26 | 2018-12-26 | A kind of fuel price information comparison system and method based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109947937A true CN109947937A (en) | 2019-06-28 |
Family
ID=67007205
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811602285.0A Pending CN109947937A (en) | 2018-12-26 | 2018-12-26 | A kind of fuel price information comparison system and method based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109947937A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115452084A (en) * | 2022-08-30 | 2022-12-09 | 中国船舶集团有限公司系统工程研究院 | Method for determining stable oiling flow based on clustering algorithm |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102331986A (en) * | 2010-07-12 | 2012-01-25 | 阿里巴巴集团控股有限公司 | Database cache management method and database server |
CN102984264A (en) * | 2012-12-06 | 2013-03-20 | 苏州工业园区服务外包职业学院 | Gas station information system based on cloud service |
US20130175030A1 (en) * | 2012-01-10 | 2013-07-11 | Adunola Ige | Submersible Pump Control |
CN104634350A (en) * | 2013-11-14 | 2015-05-20 | 北京四维图新科技股份有限公司 | Method and device for inquiring gas station information as well as navigation terminal |
CN107526773A (en) * | 2017-07-14 | 2017-12-29 | 江苏更知电子科技有限公司 | A kind of search system of gas station |
Non-Patent Citations (3)
Title |
---|
孟静等: "一种基于聚类和快速计算的异常数据挖掘算法", 《计算机工程》 * |
朱晓峰等: "基于微博舆情监测的K-Means算法改进研究", 《信息系统》 * |
李克文等: "基于RSBoost算法的不平衡数据分类方法", 《计算机科学》 * |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB03 | Change of inventor or designer information | Inventor before: Cheng Guogen, Li Xinran; Inventor after: Cheng Guogen, Li Xinjie |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190628 |