CN109947937A - Fuel price information comparison system and method based on big data - Google Patents

Publication number
CN109947937A
Authority
CN
China
Prior art keywords
data
oil price
module
buffer memory
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811602285.0A
Other languages
Chinese (zh)
Inventor
程国艮
李欣然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Translation Language Through Polytron Technologies Inc
Original Assignee
Chinese Translation Language Through Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the field of big data and discloses a fuel price information comparison system and method based on big data. The system is provided with a network data capture module, a log collection module, a user network behavior data collection module, a data analysis module, a data storage module, a result output module, a big data processing module, and a data judgment module; the big data processing module is connected to each of the other modules and coordinates their operation. The invention uses an information terminal module to query and compare fuel prices; an information processing module can combine fuel price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range; a user can use the information terminal module to send query instructions and receive fuel price information. Fuel price comparison can thus be completed efficiently and every aspect of a gas station analyzed accurately, helping the user make the best choice.

Description

Fuel price information comparison system and method based on big data
Technical field
The invention belongs to the field of big data, and in particular relates to a fuel price information comparison system and method based on big data.
Background art
Currently, the prior art commonly used in the industry is as follows:
Automobiles have become people's main means of transport. Gasoline is the main fuel for automobiles, and its price is one of the issues car owners care about most; fuel consumption is even a factor when buying a car. Petroleum is a non-renewable resource, which keeps its price high, and fuel prices are a constant topic of conversation. Fuel prices keep rising, and demand for gasoline keeps growing as the number of automobiles increases. Choosing a gas station that offers good quality at a low price is therefore a major concern for car owners today.
Existing fuel price information comparison systems cannot combine fuel price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range. As a result, users cannot use an information terminal module to send query instructions, receive fuel price information, compare fuel prices, or obtain a comprehensive and accurate analysis of gas stations, and so cannot make the best choice. In the prior art, the algorithms used to judge data cannot identify abnormal data quickly, which lengthens the judgment time. In the prior art, storage processes and stores imbalanced data sets with traditional algorithms that cannot reduce the training data or the size of the data set, which lengthens model training time and lowers the classification efficiency of the algorithm. In the prior art, the traditional algorithms used to compare the collected fuel price data and obtain the fuel price distribution cannot effectively improve clustering accuracy or clustering stability, which lowers the quality of the fuel price distribution data.
In summary, the problems of the prior art are:
(1) Existing fuel price information comparison systems cannot combine fuel price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range. As a result, users cannot use an information terminal module to send query instructions, receive fuel price information, compare fuel prices, or obtain a comprehensive and accurate analysis of gas stations, and so cannot make the best choice.
(2) In the prior art, the algorithms used to judge data cannot identify abnormal data quickly, which lengthens the judgment time.
(3) In the prior art, storage processes and stores imbalanced data sets with traditional algorithms that cannot reduce the training data or the size of the data set, which lengthens model training time and lowers the classification efficiency of the algorithm.
(4) In the prior art, the traditional algorithms used to compare the collected fuel price data and obtain the fuel price distribution cannot effectively improve clustering accuracy or clustering stability, which lowers the quality of the fuel price distribution data.
Summary of the invention
In view of the problems of the prior art, the present invention provides a fuel price information comparison system and method based on big data.
The invention is realized as follows. A fuel price information comparison method based on big data includes:
Step 1: obtain fuel price data from the network, from logs, and from user network behavior;
Step 2: analyze the obtained fuel price data and judge whether each price is a true fuel price; a price judged not to be a true fuel price, that is, one that deviates from the normal fuel price range, is invalid and is not processed, stored, or output;
Step 3: compare the collected fuel price data to obtain the fuel price distribution;
Step 4: store the obtained fuel price information and fuel price distribution data, and show the fuel price distribution results on a display screen.
Further, the collected fuel price data are kept in storage and classified for storage; the entire data set is undersampled while preserving its distribution, which reduces the training data and the size of the data set; the data classification uses the RSBoost algorithm to classify imbalanced data, specifically:
Given a training set S = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, where each sample x_i ∈ X^d is a d-dimensional feature vector and each class label y_i ∈ {P, N}, with P the minority class and N the majority class;
Input: training set S, base classifier (weak learner) WL, oversampling rate M, undersampling rate N;
Output: classification model H(x);
Step 1: initialize the weights of the samples in the data set:
D_1(i) = 1/m;
Step 2: after applying SMOTE oversampling to the minority class at oversampling rate M, apply random undersampling to the entire data set at undersampling rate N while preserving the data distribution, producing the training data set S_t' with weight distribution D_t';
Step 3: for t = 1 to T:
(1) train the weak learner WL on the training data set S_t' with weight distribution D_t', and compute the weak hypothesis h_t: X × Y → [0, 1];
(2) compute the pseudo-loss of h_t;
(3) compute the weight update parameter:
ω_t = (1/2)(1 + h_t(x_i, y_i) − h_t(x_i, y));
(4) update the weight distribution D_t;
(5) normalize the weights;
Step 4: obtain the final classification model H(x) by a weighted vote of the T weak hypotheses.
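The resampling in step 2 above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper names `smote_like` and `resample` are hypothetical, and the oversampling is a simplified SMOTE-style interpolation between random pairs of minority samples rather than between k-nearest neighbors as in full SMOTE.

```python
import random


def smote_like(minority, n_new, rng):
    """Create n_new synthetic samples by interpolating between random
    pairs of minority samples (simplified; real SMOTE uses k-NN pairs)."""
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


def resample(samples, labels, over_rate, under_rate, seed=0):
    """Oversample the minority class 'P', then randomly undersample the
    whole data set. Uniform random sampling keeps the class distribution
    approximately unchanged (assumed reading of 'preserving distribution')."""
    rng = random.Random(seed)
    minority = [x for x, y in zip(samples, labels) if y == "P"]
    n_new = int(len(minority) * over_rate)
    data = list(zip(samples, labels))
    data += [(x, "P") for x in smote_like(minority, n_new, rng)]
    keep = max(1, int(len(data) * under_rate))
    return rng.sample(data, keep)


# Hypothetical features: (price, station attribute).
samples = [(7.0, 1.0), (7.4, 3.0)] + [(6.0 + 0.1 * i, 2.0) for i in range(8)]
labels = ["P", "P"] + ["N"] * 8
training_set = resample(samples, labels, over_rate=1.0, under_rate=0.5)
```

The resampled `training_set` would then be passed to the weak learner in each boosting round.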
Further, in judging whether the acquired fuel price data are true fuel prices, abnormal fuel price data are identified using the DBSCAN algorithm, which specifically includes:
Step 1: take a data object P that has not yet been visited or processed and examine its Eps-neighborhood N_Eps(P); if N_Eps(P) contains at least MinPts data objects, create a new cluster C and add P and the data objects in its neighborhood to C;
Step 2: if C contains a data object Q that has not been processed, examine its Eps-neighborhood N_Eps(Q); if N_Eps(Q) contains at least MinPts data objects, add the points in the neighborhood of Q to C;
Step 3: repeat step 2 until all objects in C have been processed;
Step 4: repeat steps 1 to 3 until every data object has been visited and has either been assigned to a cluster or been marked as abnormal data.
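The four steps above can be sketched for one-dimensional price values as follows. This is a minimal sketch, assuming prices are plain numbers and distance is the absolute difference; the function name `dbscan_1d` and the sample values are hypothetical. Points that end up labeled -1 are the abnormal prices.

```python
def dbscan_1d(values, eps, min_pts):
    """Return one label per value: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(values)

    def neighbors(i):
        # Eps-neighborhood of point i (includes i itself).
        return [j for j, v in enumerate(values) if abs(v - values[i]) <= eps]

    cluster = -1
    for i in range(len(values)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # step 1: start a new cluster C
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:                   # steps 2 and 3: expand C
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:     # j is a core point, keep expanding
                queue.extend(nb)
    return labels


prices = [7.0, 7.1, 7.2, 7.15, 12.0]
labels = dbscan_1d(prices, eps=0.25, min_pts=3)
abnormal = [p for p, lbl in zip(prices, labels) if lbl == -1]
```

The neighborhood search here is a linear scan, which is fine for a sketch; a production system would typically use a spatial index.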
Further, in comparing the collected fuel price data to obtain the fuel price distribution, K-Means is used, which specifically includes:
Step 1: compute the pairwise text similarities according to the similarity formula and construct the matrix M;
Step 2: construct the set
P = {a_1, a_2, ..., a_n},
where a_i corresponds to text d_i, and sort P in ascending order;
Step 3: create the initial center point set I and the deletion set Delete, both initialized to the empty set;
Step 4: select the text d_j corresponding to the largest element of P as a center point and add it to the initial center point set, i.e., I = I ∪ {d_j};
Step 5: put every text in matrix M whose similarity to text d_j reaches a given value (sim(d_i, d_j) > β) into Delete (Delete = Delete ∪ {a_i}) and remove it from P, i.e., P = P − {a_i};
Step 6: judge whether P = ∅ and i < k; if so, copy the data in Delete back into P, i.e., P = Delete;
Step 7: repeat steps 3 to 6 until the termination condition i = k is met, finally obtaining k initial cluster center points;
Step 8: compute the text similarity between the center text of each cluster and the other texts using the cosine formula, and, according to the similarity values, put each text into the cluster whose center it is most similar to;
Step 9: recompute the center of each cluster;
Step 10: repeat steps 8 and 9 until the termination condition is met.
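The initial center selection in steps 1 to 7 can be sketched as below. Several points are assumptions made for illustration because the corresponding formulas are not present in this text: texts are taken to be numeric feature vectors, the similarity is cosine similarity, and a_i is taken to be the average similarity of text d_i to all texts (consistent with the later remark about ranking by average text similarity). The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def initial_centers(vectors, k, beta):
    """Pick up to k well-separated initial centers (returned as indices)."""
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    avg = [sum(row) / n for row in sim]        # assumed definition of a_i
    candidates = set(range(n))                 # the set P
    deleted = set()                            # the set Delete
    centers = []                               # the set I
    while len(centers) < k:
        if not candidates:                     # step 6: refill P from Delete
            candidates = deleted - set(centers)
            deleted = set()
            if not candidates:
                break
        # Step 4: the candidate with the largest average similarity.
        c = max(sorted(candidates), key=lambda i: avg[i])
        centers.append(c)
        # Step 5: drop everything too similar to the new center.
        near = {i for i in candidates if sim[i][c] > beta}
        near.add(c)
        deleted |= near
        candidates -= near
    return centers


vectors = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.0, 1.0), (0.1, 0.9)]
centers = initial_centers(vectors, k=2, beta=0.9)
```

In this example the first center comes from the group near the first axis and the second from the group near the second axis, which is the spread the selection rule is meant to produce.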
Further, in storing the obtained fuel price information and fuel price distribution data, a record cache is established in advance for the data tables in the database, and the record cache reads and writes data in units of data rows;
when a data query request is received from a client, the requested data are looked up in the record cache;
if the lookup fails, the requested data are looked up in the page cache of the database, the page cache reading and writing data with the page as the basic unit;
the data found in the record cache or the page cache are returned to the client;
data are added to the record cache; specifically, the data found in the page cache are added to the record cache;
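The lookup order described above (record cache first, then page cache, then promotion of the row into the record cache) can be sketched as follows. This is an illustrative sketch only: the class name `TwoLevelCache`, the integer row keys, and the mapping of a key to a page by integer division are assumptions, not details from the source.

```python
class TwoLevelCache:
    """Row-level record cache in front of a page-level cache."""

    def __init__(self, pages, page_size):
        self.record_cache = {}      # row key -> row (unit: one data row)
        self.page_cache = pages     # page number -> {row key: row}
        self.page_size = page_size  # rows per page (hypothetical layout)

    def get(self, key):
        """Return (row, source) where source is 'record', 'page', or 'miss'."""
        if key in self.record_cache:
            return self.record_cache[key], "record"
        page = self.page_cache.get(key // self.page_size)
        if page is not None and key in page:
            row = page[key]
            self.record_cache[key] = row   # promote the row for next time
            return row, "page"
        return None, "miss"


pages = {0: {0: "row0", 1: "row1"}, 1: {2: "row2"}}
cache = TwoLevelCache(pages, page_size=2)
first = cache.get(1)    # served from the page cache, then promoted
second = cache.get(1)   # served from the record cache
```

A real implementation would also bound the size of the record cache, which is where the replacement policy described next comes in.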
Further, the process of adding data to the record cache includes:
in the record cache, selecting record data of the same order of magnitude as the data to be added, and replacing it;
Further, the process of adding data to the record cache further includes: in the record cache, selecting a record cache page of a different order of magnitude from the data to be added, reclaiming the space occupied by that cache page, using the reclaimed space to allocate a new record cache page for the data to be added, and writing the data to be added into the new record cache page.
Further, the access frequency Frec of the record data of the same order of magnitude as the data to be added, and the access frequency Fpage of the record cache page of a different order of magnitude from the data to be added, are obtained;
it is judged whether Frec > replace_page_ratio*Fpage holds; if so, record data of the same order of magnitude as the data to be added is selected in the record cache and replaced; otherwise, a record cache page of a different order of magnitude from the data to be added is selected in the record cache, the space occupied by that cache page is reclaimed, the reclaimed space is used to allocate a new record cache page for the data to be added, and the data to be added is written into the new record cache page.
Here replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0, 1];
The access frequency Fpage of the record cache page of a different order of magnitude from the data to be added is obtained as:
Fpage = (Fmin + Fmax)/2*N;
where Fmin is the access frequency of the record with the earliest timestamp in the record cache page, Fmax is the access frequency of the record with the latest timestamp in the record cache page, and N is the total number of data records in the record cache page.
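The replacement decision above reduces to one comparison, sketched below. The function names are hypothetical, and the formula Fpage = (Fmin + Fmax)/2*N is read here with ordinary left-to-right precedence, i.e., the mean of the two frequencies multiplied by N; that reading is an assumption, since the text does not parenthesize it further.

```python
def page_frequency(f_min, f_max, n_records):
    """Estimate a page's access frequency as mean(Fmin, Fmax) * N
    (assumed reading of Fpage = (Fmin + Fmax)/2*N)."""
    return (f_min + f_max) / 2 * n_records


def choose_action(f_rec, f_page, replace_page_ratio):
    """Follow the stated rule: replace a same-size record when
    Frec > replace_page_ratio * Fpage, otherwise recycle a page."""
    if not 0 < replace_page_ratio <= 1:
        raise ValueError("replace_page_ratio must be in (0, 1]")
    if f_rec > replace_page_ratio * f_page:
        return "replace_record"
    return "recycle_page"


f_page = page_frequency(f_min=2, f_max=6, n_records=10)
action_a = choose_action(f_rec=25, f_page=f_page, replace_page_ratio=0.5)
action_b = choose_action(f_rec=10, f_page=f_page, replace_page_ratio=0.5)
```

Lowering `replace_page_ratio` makes record replacement more likely and page recycling less likely, which is the tuning knob the parameter provides.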
Another object of the present invention is to provide a fuel price information comparison system based on big data, the system being provided with:
a network data capture module, connected to the big data processing module, which uses Internet search to capture data targeted at fuel prices and classifies the data according to given rules and screening criteria;
a log collection module, connected to the big data processing module, which collects data from the sales log files on the websites of the individual gas stations;
a user network behavior data collection module, connected to the big data processing module, which collects fuel price data from users' fuel purchase transactions on the network;
a data analysis module, connected to the big data processing module, which compares the collected fuel price data to obtain the fuel price distribution;
a data storage module, connected to the big data processing module, which keeps the collected fuel price data in storage and classifies the data for storage;
a result output module, connected to the big data processing module, which outputs the fuel price distribution on a display screen;
a big data processing module, connected to the network data capture module, log collection module, user network behavior data collection module, data analysis module, data storage module, result output module, and data judgment module, which coordinates the operation of the modules;
a data judgment module, connected to the big data processing module, which judges whether the acquired fuel price data are true fuel prices.
Another object of the present invention is to provide a big-data-based fuel price information comparison platform that carries the above fuel price information comparison system based on big data.
In storing and classifying the collected fuel price data, the data storage module increases the number of minority-class samples to adjust the balance of the imbalanced data set and even out the data distribution; it undersamples the entire data set while preserving its distribution to reduce the training data and the size of the data set, thereby shortening model training time and improving the classification efficiency of the algorithm.
In summary, the advantages and positive effects of the present invention are:
The present invention uses an information terminal module to query and compare fuel prices. An information processing module can combine fuel price, road conditions, distance, and fuel consumption information to comprehensively analyze all gas stations within a given range. A user can use the information terminal module to send query instructions and receive fuel price information. Fuel price comparison can be completed efficiently and every aspect of a gas station analyzed accurately, helping the user make the best choice.
In the present invention, when the data judgment module judges whether the acquired fuel price data are true fuel prices, the DBSCAN algorithm is used so that abnormal fuel price data are identified quickly and the judgment time is reduced.
In the present invention, when the data storage module stores and classifies the collected fuel price data, an imbalanced data classification method based on the RSBoost algorithm is used: the number of minority-class samples is increased to adjust the balance of the imbalanced data set and even out the data distribution, and the entire data set is undersampled while preserving its distribution to reduce the training data and the size of the data set, thereby shortening model training time and improving the classification efficiency of the algorithm.
In the present invention, when the data analysis module compares the collected fuel price data to obtain the fuel price distribution, K-Means is used to improve clustering accuracy and stability and the quality of the fuel price distribution data. In selecting the initial centers, the texts with the largest similarity must be selected successively according to the ranking by average text similarity; only then can the selected center points be guaranteed to correlate strongly with the data in the data set, better represent a portion of the data, and be evenly distributed.
In storing the obtained fuel price information and fuel price distribution data, a record cache is established in advance for the data tables in the database, and the record cache reads and writes data in units of data rows;
when a data query request is received from a client, the requested data are looked up in the record cache;
if the lookup fails, the requested data are looked up in the page cache of the database, the page cache reading and writing data with the page as the basic unit;
the data found in the record cache or the page cache are returned to the client;
data are added to the record cache; specifically, the data found in the page cache are added to the record cache;
In the present invention, the process of adding data to the record cache includes:
in the record cache, selecting record data of the same order of magnitude as the data to be added, and replacing it;
The process of adding data to the record cache further includes: in the record cache, selecting a record cache page of a different order of magnitude from the data to be added, reclaiming the space occupied by that cache page, using the reclaimed space to allocate a new record cache page for the data to be added, and writing the data to be added into the new record cache page.
The access frequency Frec of the record data of the same order of magnitude as the data to be added, and the access frequency Fpage of the record cache page of a different order of magnitude from the data to be added, are obtained;
it is judged whether Frec > replace_page_ratio*Fpage holds; if so, record data of the same order of magnitude as the data to be added is selected in the record cache and replaced; otherwise, a record cache page of a different order of magnitude from the data to be added is selected in the record cache, the space occupied by that cache page is reclaimed, the reclaimed space is used to allocate a new record cache page for the data to be added, and the data to be added is written into the new record cache page.
Here replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0, 1]. Real-time storage of data can thus be achieved.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the fuel price information comparison system based on big data provided by an embodiment of the present invention.
Fig. 2 is a flow chart of the fuel price information comparison method based on big data provided by an embodiment of the present invention.
In the figures: 1, network data capture module; 2, log collection module; 3, user network behavior data collection module; 4, data analysis module; 5, data storage module; 6, result output module; 7, big data processing module; 8, data judgment module.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here merely explain the present invention and are not intended to limit it.
The application principle of the present invention is explained in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the fuel price information comparison system based on big data provided by an embodiment of the present invention includes: network data capture module 1, log collection module 2, user network behavior data collection module 3, data analysis module 4, data storage module 5, result output module 6, big data processing module 7, and data judgment module 8.
Network data capture module 1 is connected to big data processing module 7; it uses Internet search to capture data targeted at fuel prices and classifies the data according to given rules and screening criteria;
Log collection module 2 is connected to big data processing module 7; it collects data from the sales log files on the websites of the individual gas stations;
User network behavior data collection module 3 is connected to big data processing module 7; it collects fuel price data from users' fuel purchase transactions on the network;
Data analysis module 4 is connected to big data processing module 7; it compares the collected fuel price data to obtain the fuel price distribution;
Data storage module 5 is connected to big data processing module 7; it keeps the collected fuel price data in storage and classifies the data for storage;
Result output module 6 is connected to big data processing module 7; it outputs the fuel price distribution on a display screen;
Big data processing module 7 is connected to network data capture module 1, log collection module 2, user network behavior data collection module 3, data analysis module 4, data storage module 5, result output module 6, and data judgment module 8, and coordinates the operation of the modules;
Data judgment module 8 is connected to big data processing module 7; it judges whether the acquired fuel price data are true fuel prices.
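The hub-and-spoke arrangement above, in which every module connects only to the big data processing module, can be illustrated with a small coordinator. This is a sketch under assumptions: the class name `BigDataProcessor`, the registration interface, and the toy handlers are all hypothetical and stand in for the real modules.

```python
class BigDataProcessor:
    """Coordinator that holds named modules and runs them in a given order."""

    def __init__(self):
        self.modules = {}

    def register(self, name, handler):
        # Each module is modeled as a callable taking and returning a payload.
        self.modules[name] = handler

    def run(self, order, payload=None):
        for name in order:
            payload = self.modules[name](payload)
        return payload


hub = BigDataProcessor()
# Toy stand-ins for the capture, judgment, and analysis modules.
hub.register("capture", lambda _: [7.1, 7.3, -1.0, 6.9])
hub.register("judge", lambda prices: [p for p in prices if p > 0])
hub.register("analyze", lambda prices: {"min": min(prices), "max": max(prices)})
result = hub.run(["capture", "judge", "analyze"])
```

Routing every call through one coordinator keeps the modules independent of each other, at the cost of making the coordinator a single point of control.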
As shown in Fig. 2, the fuel price information comparison method based on big data provided by an embodiment of the present invention specifically includes the following steps:
S101: obtain fuel price data from the network, from logs, and from user network behavior;
S102: analyze the obtained fuel price data and judge whether each price is a true fuel price; if a price is judged not to be a true fuel price, that is, it deviates substantially from the normal fuel price range, the price is invalid and is not processed, stored, or output;
S103: compare the collected fuel price data to obtain the fuel price distribution;
S104: store the obtained fuel price information and fuel price distribution data, and show the fuel price distribution results on a display screen.
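Steps S102 and S103 can be sketched end to end as follows. The validity rule used here (a price is kept if it lies within a fixed fraction of the median) is a hypothetical stand-in chosen for brevity; the embodiment itself describes DBSCAN for this judgment. The function name, the threshold, and the sample records are likewise assumptions.

```python
from statistics import median


def filter_and_rank(records, max_deviation=0.3):
    """records: list of (station, price).
    Drop prices that deviate from the median by more than max_deviation
    (as a fraction of the median), then sort the rest by price."""
    mid = median(price for _, price in records)
    valid = [(s, p) for s, p in records if abs(p - mid) <= max_deviation * mid]
    return sorted(valid, key=lambda r: r[1])


records = [("A", 7.1), ("B", 7.3), ("C", 70.0), ("D", 6.9)]
ranked = filter_and_rank(records)
```

Here station C is discarded as invalid and the remaining stations are listed from cheapest to most expensive.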
When data storage module 5 stores and classifies the collected fuel price data, an imbalanced data classification method based on the RSBoost algorithm is used: the number of minority-class samples is increased to adjust the balance of the imbalanced data set and even out the data distribution, and the entire data set is undersampled while preserving its distribution to reduce the training data and the size of the data set, thereby shortening model training time and improving the classification efficiency of the algorithm. The detailed process is as follows:
Given a training set S = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, where each sample x_i ∈ X^d is a d-dimensional feature vector and each class label y_i ∈ {P, N}, with P the minority class and N the majority class;
Input: training set S, base classifier (weak learner) WL, oversampling rate M, undersampling rate N;
Output: classification model H(x);
Step 1: initialize the weights of the samples in the data set:
D_1(i) = 1/m;
Step 2: after applying SMOTE oversampling to the minority class at oversampling rate M, apply random undersampling to the entire data set at undersampling rate N while preserving the data distribution, producing the training data set S_t' with weight distribution D_t';
Step 3: for t = 1 to T:
(1) train the weak learner WL on the training data set S_t' with weight distribution D_t', and compute the weak hypothesis h_t: X × Y → [0, 1];
(2) compute the pseudo-loss of h_t;
(3) compute the weight update parameter:
ω_t = (1/2)(1 + h_t(x_i, y_i) − h_t(x_i, y));
(4) update the weight distribution D_t;
(5) normalize the weights;
Step 4: obtain the final classification model H(x) by a weighted vote of the T weak hypotheses.
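The expressions for the pseudo-loss, the weight update, the normalization, and the final vote are not present in this text. For orientation only, the block below gives the AdaBoost.M2-style expressions commonly used by SMOTEBoost and RUSBoost, with ω_t as defined in sub-step (3); treating them as the formulas intended here is an assumption, and the symbols ε_t and α_t are introduced for illustration.

```latex
\begin{aligned}
\varepsilon_t &= \sum_{(i,y):\, y \neq y_i} D_t(i)\,\bigl(1 - h_t(x_i, y_i) + h_t(x_i, y)\bigr)
  && \text{(pseudo-loss)} \\
\alpha_t &= \frac{\varepsilon_t}{1 - \varepsilon_t} \\
\omega_t &= \tfrac{1}{2}\bigl(1 + h_t(x_i, y_i) - h_t(x_i, y)\bigr), \quad y \neq y_i \\
\tilde{D}_{t+1}(i) &= D_t(i)\,\alpha_t^{\,\omega_t}
  && \text{(weight update)} \\
D_{t+1}(i) &= \frac{\tilde{D}_{t+1}(i)}{\sum_{j=1}^{m} \tilde{D}_{t+1}(j)}
  && \text{(normalization)} \\
H(x) &= \arg\max_{y \in \{P, N\}} \sum_{t=1}^{T} h_t(x, y)\,\log\frac{1}{\alpha_t}
  && \text{(weighted vote)}
\end{aligned}
```

Under these expressions a sample that the weak hypothesis handles well (large ω_t) has its weight multiplied by a smaller factor whenever α_t < 1, so later rounds concentrate on the harder samples.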
When data judgment module 8 judges whether the acquired fuel price data are true fuel prices, the DBSCAN algorithm is used so that abnormal fuel price data are identified quickly and the judgment time is reduced. It specifically includes the following steps:
Step 1: take a data object P that has not yet been visited or processed and examine its Eps-neighborhood N_Eps(P); if N_Eps(P) contains at least MinPts data objects, create a new cluster C and add P and the data objects in its neighborhood to C;
Step 2: if C contains a data object Q that has not been processed, examine its Eps-neighborhood N_Eps(Q); if N_Eps(Q) contains at least MinPts data objects, add the points in the neighborhood of Q to C;
Step 3: repeat step 2 until all objects in C have been processed;
Step 4: repeat steps 1 to 3 until every data object has been visited and has either been assigned to a cluster or been marked as abnormal data.
When data analysis module 4 compares the collected fuel price data to obtain the fuel price distribution, K-Means is used to improve clustering accuracy and stability and the quality of the fuel price distribution data. It specifically includes the following steps:
Step 1: compute the pairwise text similarities according to the similarity formula and construct the matrix M;
Step 2: construct the set
P = {a_1, a_2, ..., a_n},
where a_i corresponds to text d_i, and sort P in ascending order;
Step 3: create the initial center point set I and the deletion set Delete, both initialized to the empty set;
Step 4: select the text d_j corresponding to the largest element of P as a center point and add it to the initial center point set, i.e., I = I ∪ {d_j};
Step 5: put every text in matrix M whose similarity to text d_j reaches a given value (sim(d_i, d_j) > β) into Delete (Delete = Delete ∪ {a_i}) and remove it from P, i.e., P = P − {a_i};
Step 6: judge whether P = ∅ and i < k; if so, copy the data in Delete back into P, i.e., P = Delete;
Step 7: repeat steps 3 to 6 until the termination condition i = k is met, finally obtaining k initial cluster center points;
Step 8: compute the text similarity between the center text of each cluster and the other texts using the cosine formula, and, according to the similarity values, put each text into the cluster whose center it is most similar to;
Step 9: recompute the center of each cluster;
Step 10: repeat steps 8 and 9 until the termination condition is met.
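One pass of steps 8 and 9 can be sketched as below. The formula for recomputing a cluster center is not present in this text, so the sketch assumes the usual K-Means choice of the component-wise mean of the cluster members; texts are assumed to be numeric feature vectors, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_and_update(vectors, centers):
    """Step 8: assign each vector to its most similar center.
    Step 9: recompute each center as the mean of its members (assumed)."""
    clusters = [[] for _ in centers]
    for v in vectors:
        best = max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
        clusters[best].append(v)
    new_centers = []
    for members, old in zip(clusters, centers):
        if members:
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centers.append(old)   # keep an empty cluster's old center
    return clusters, new_centers


vectors = [(1.0, 0.0), (0.8, 0.2), (0.0, 1.0), (0.2, 0.8)]
clusters, centers = assign_and_update(vectors, [(1.0, 0.0), (0.0, 1.0)])
```

Step 10 would call `assign_and_update` repeatedly, for example until the assignments stop changing or an iteration limit is reached.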
It in embodiments of the present invention, is in advance data in storing to the fuel price information of acquisition and oil price distributed data Tables of data in library establishes record buffer memory, and the record buffer memory carries out reading and writing data as unit of data line;
When receiving the data inquiry request of client, requested data are searched in the record buffer memory;
If searching failure, requested data are searched in the caching of page of the database, the caching of page is with page Basic unit carries out reading and writing data;
The data found in the record buffer memory or the caching of page are back to client;
Data are added into the record buffer memory, specifically, it is slow that the data found in caching of page are added to record In depositing;
In an embodiment of the present invention, the process of adding data to the record cache includes:
In the record cache, selecting record data of the same order of magnitude as the data to be added and replacing it;
The process of adding data to the record cache further includes: in the record cache, selecting a record cache page whose order of magnitude differs from that of the data to be added, recycling the space occupied by that cache page, using the recycled space to allocate a new record cache page for the data to be added, and writing the data to be added into the new record cache page.
Obtaining the access frequency Frec of the record data of the same order of magnitude as the data to be added, and the access frequency Fpage of the record cache page whose order of magnitude differs from that of the data to be added;
Judging whether Frec > replace_page_ratio*Fpage holds; if so, record data of the same order of magnitude as the data to be added is selected in the record cache and replaced; otherwise, a record cache page whose order of magnitude differs from that of the data to be added is selected in the record cache, the space occupied by that cache page is recycled, the recycled space is used to allocate a new record cache page for the data to be added, and the data to be added is written into the new record cache page.
Where replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0,1];
The access frequency Fpage of the record cache page whose order of magnitude differs from that of the data to be added is obtained as:
Fpage=(Fmin+Fmax)/2*N;
Where Fmin is the access frequency of the data with the earliest timestamp in the record cache page, Fmax is the access frequency of the data with the latest timestamp in the record cache page, and N is the total number of data records in the record cache page.
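The decision between the two replacement modes can be illustrated with a short sketch. The function names and the returned mode strings are hypothetical. The expression for Fpage is evaluated literally from left to right, i.e. as ((Fmin + Fmax) / 2) * N; the text does not state the intended grouping, so this reading is an assumption.

```python
def page_access_frequency(f_min: float, f_max: float, n: int) -> float:
    # Literal left-to-right reading of Fpage = (Fmin + Fmax)/2*N (an assumption).
    return (f_min + f_max) / 2 * n


def choose_replacement(f_rec: float, f_page: float, replace_page_ratio: float) -> str:
    # replace_page_ratio is the preset control parameter, restricted to (0, 1].
    if not 0 < replace_page_ratio <= 1:
        raise ValueError("replace_page_ratio must lie in (0, 1]")
    # Test exactly as stated in the text: Frec > replace_page_ratio * Fpage.
    if f_rec > replace_page_ratio * f_page:
        return "replace_record"   # replace a record in place
    return "recycle_page"         # recycle a record cache page instead


f_page = page_access_frequency(f_min=2.0, f_max=6.0, n=10)  # (2 + 6) / 2 * 10
mode_a = choose_replacement(f_rec=30.0, f_page=f_page, replace_page_ratio=0.5)
mode_b = choose_replacement(f_rec=10.0, f_page=f_page, replace_page_ratio=0.5)
```

With these sample numbers the threshold is 0.5 * 40 = 20, so a record frequency of 30 selects in-place replacement and a frequency of 10 selects page recycling.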
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A fuel price information comparison method based on big data, characterized in that the fuel price information comparison method based on big data comprises:
Step 1: obtaining oil price data from the network, from logs, and from user network behavior;
Step 2: analyzing the acquired oil price data to judge whether each price is a true oil price; a price judged not to be true, that is, one that deviates from the normal oil price range, is invalid and is not processed, stored, or output;
Step 3: comparing the collected oil price data to obtain the distribution of oil prices;
Step 4: storing the acquired oil price information and oil price distribution data, and showing the oil price distribution result on a display screen.
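Steps 2 and 3 of claim 1 can be illustrated with a minimal sketch. The range bounds, the bin width, and the sample prices below are hypothetical values chosen for the example; the claim does not specify how the normal oil price range or the distribution is computed, so a fixed range and a simple histogram are assumed.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def filter_valid_prices(prices: Iterable[float], low: float, high: float) -> List[float]:
    # Step 2: a price outside the assumed normal range [low, high] is invalid
    # and is dropped (not processed, stored, or output).
    return [p for p in prices if low <= p <= high]


def price_distribution(prices: Iterable[float], bin_width: float) -> List[Tuple[float, int]]:
    # Step 3: count the prices falling into each bin of width bin_width.
    counts = Counter(int(p // bin_width) for p in prices)
    return sorted((round(idx * bin_width, 2), n) for idx, n in counts.items())


raw = [7.1, 7.4, 7.6, 0.3, 6.9, 99.0]           # hypothetical collected prices
valid = filter_valid_prices(raw, low=5.0, high=10.0)
distribution = price_distribution(valid, bin_width=0.5)
# distribution holds (bin lower edge, count) pairs in ascending order
```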
2. The fuel price information comparison method based on big data according to claim 1, characterized in that the collected oil price data are stored in a memory and classified for storage; the whole data set is undersampled while its distribution is preserved, which reduces the training data and the scale of the data set; during data classification, the RSBoost algorithm is used to classify the imbalanced data, specifically:
Given a training set S = {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)}, each sample x_i ∈ X^d is a d-dimensional feature vector with class label y_i ∈ {P, N}, where P corresponds to the minority class and N to the majority class;
Input: training set S_t, base classifier WL, oversampling rate M, undersampling rate N;
Output: classification model H(x);
Step 1: initialize the weights of the samples in the data set:
D_1(i) = 1/m;
Step 2: after SMOTE oversampling of the minority class at oversampling rate M, randomly undersample the whole data set at undersampling rate N while preserving the data distribution, producing training data set S_t′ with weight distribution D_t′;
Step 3: for t = 1 to T:
(1) train the weak classifier WL on training data set S_t′ with weight distribution D_t′, and compute the weak hypothesis h_t: X × Y → [0, 1];
(2) compute the pseudo-loss of h_t:
(3) compute the weight update parameter:
ω_t = (1/2)(1 + h_t(x_i, y_i) − h_t(x_i, y));
(4) update the weight distribution D_t:
(5) normalize:
Step 4: obtain the final classification model by a weighted vote of the T weak hypotheses:
3. The fuel price information comparison method based on big data according to claim 1, characterized in that, while the acquired oil price data are judged to be true oil prices or not, abnormal oil price data are identified with the DBSCAN algorithm, specifically:
Step 1: examine a data object P in the data that has not yet been visited or processed and check its Eps-neighborhood N_Eps(P); if N_Eps(P) contains at least MinPts data objects, create a new cluster C and add P and the data objects in its neighborhood to C;
Step 2: if C contains an unprocessed data object Q, check its Eps-neighborhood N_Eps(Q); if N_Eps(Q) contains at least MinPts data objects, add Q and the points in its neighborhood to C;
Step 3: repeat step 2 until every object in C has been processed;
Step 4: repeat steps 1 to 3 until all data objects have been visited and each has either been assigned to a cluster or been identified as abnormal data.
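The four DBSCAN steps of claim 3 can be sketched for one-dimensional price data as follows. This is a textbook-style illustration under stated assumptions, not the applicant's implementation: distance is the absolute price difference, a point counts as a member of its own neighborhood, and the values of `eps` and `min_pts` are hypothetical.

```python
from typing import List, Optional

NOISE = -1  # label for abnormal data


def region_query(points: List[float], idx: int, eps: float) -> List[int]:
    # Eps-neighborhood of points[idx]; the point itself is included.
    return [j for j, q in enumerate(points) if abs(points[idx] - q) <= eps]


def dbscan_1d(points: List[float], eps: float, min_pts: int) -> List[int]:
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster_id += 1        # Step 1: start a new cluster C
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:           # Steps 2 and 3: process every object in C
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)
    return [NOISE if lab is None else lab for lab in labels]


prices = [7.00, 7.05, 7.10, 7.12, 7.08, 12.50]  # hypothetical prices
labels = dbscan_1d(prices, eps=0.15, min_pts=3)
abnormal = [p for p, lab in zip(prices, labels) if lab == NOISE]
```

In this example the five prices near 7 form one cluster and the isolated price is left labelled as abnormal.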
4. The fuel price information comparison method based on big data according to claim 1, characterized in that, while the collected oil price data are compared to obtain the distribution of oil prices, the comparison uses K-Means, specifically:
Step 1: based on the formula
compute the text similarity and construct matrix M;
Step 2: based on the formula
P = {a_1, a_2, ..., a_n};
Wherein a set P is constructed and sorted; ascending order suffices;
Step 3: create an initial center point set I and a deletion set Delete, both initialized to the empty set;
Step 4: select the text d_j corresponding to the largest element of set P as a center point and add it to the initial center point set, i.e. I = I ∪ {d_j};
Step 5: put every text in matrix M whose similarity to text d_j reaches a set value (sim(d_i, d_j) > β) into Delete (Delete = Delete ∪ {a_i}) and delete it from set P, i.e. P = P − {a_i};
Step 6: judge whether P = ∅ and i < k; if the condition is true, copy the data in Delete over P, i.e. P = Delete;
Step 7: repeat steps 3 to 6 until the termination condition i = k is met, finally obtaining k initial cluster center points;
Step 8: compute the text similarity between the center text of each cluster and the other texts using the cosine formula and, according to the similarity values, put each text into the cluster whose center it is most similar to;
Step 9: recompute to obtain the new center of each cluster;
Step 10: repeat steps 8 to 9 until the termination condition is met.
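Steps 8 to 10 of claim 4 (assignment by cosine similarity, recomputation of centers, repetition until stable) can be sketched as below. The formulas for the initial center selection and for the center update are not reproduced in the text, so this sketch makes two assumptions: the initial centers are simply given as input, and a cluster center is recomputed as the component-wise mean of its members. All names and sample vectors are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign(vectors: List[Vector], centres: List[Vector]) -> List[int]:
    # Step 8: each text goes to the cluster whose centre it is most similar to.
    return [max(range(len(centres)), key=lambda c: cosine(v, centres[c]))
            for v in vectors]


def recompute(vectors: List[Vector], labels: List[int],
              centres: List[Vector]) -> List[List[float]]:
    # Step 9 (assumed form): the new centre is the mean of the cluster members.
    new_centres = []
    for c, old in enumerate(centres):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if not members:
            new_centres.append(list(old))  # keep the centre of an empty cluster
            continue
        new_centres.append([sum(m[d] for m in members) / len(members)
                            for d in range(len(old))])
    return new_centres


def cluster(vectors: List[Vector], centres: List[Vector], max_iter: int = 20) -> List[int]:
    labels = assign(vectors, centres)
    for _ in range(max_iter):  # Step 10: repeat until assignments stop changing
        centres = recompute(vectors, labels, centres)
        new_labels = assign(vectors, centres)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # hypothetical text vectors
result = cluster(docs, centres=[[1.0, 0.0], [0.0, 1.0]])
```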
5. The fuel price information comparison method based on big data according to claim 1, characterized in that, when the acquired oil price information and oil price distribution data are stored, a record cache is established in advance for the data tables in the database, the record cache reading and writing data in units of data rows;
When a data query request is received from a client, the requested data is looked up in the record cache;
If the lookup fails, the requested data is looked up in the page cache of the database, the page cache reading and writing data with the page as its basic unit;
The data found in the record cache or in the page cache is returned to the client;
Data is added to the record cache; specifically, the data found in the page cache is added to the record cache.
6. The fuel price information comparison method based on big data according to claim 5, characterized in that the process of adding data to the record cache includes:
In the record cache, selecting record data of the same order of magnitude as the data to be added and replacing it.
7. The fuel price information comparison method based on big data according to claim 5, characterized in that the process of adding data to the record cache further includes: in the record cache, selecting a record cache page whose order of magnitude differs from that of the data to be added, recycling the space occupied by that cache page, using the recycled space to allocate a new record cache page for the data to be added, and writing the data to be added into the new record cache page.
8. The fuel price information comparison method based on big data according to claim 5, characterized in that:
The access frequency Frec of the record data of the same order of magnitude as the data to be added, and the access frequency Fpage of the record cache page whose order of magnitude differs from that of the data to be added, are obtained;
Whether Frec > replace_page_ratio*Fpage holds is judged; if so, the manner of claim 6 is selected; otherwise, the manner of claim 7 is selected;
Where replace_page_ratio is a preset replacement control parameter, replace_page_ratio ∈ (0,1];
The access frequency Fpage of the record cache page whose order of magnitude differs from that of the data to be added is obtained as:
Fpage=(Fmin+Fmax)/2*N;
Where Fmin is the access frequency of the data with the earliest timestamp in the record cache page, Fmax is the access frequency of the data with the latest timestamp in the record cache page, and N is the total number of data records in the record cache page.
9. A fuel price information comparison system based on big data, characterized in that the fuel price information comparison system based on big data is provided with:
A network data capture module, connected to the big data processing module, which uses Internet search to capture oil price data in a targeted manner and classifies the data according to set rules and screening criteria;
A log collection module, connected to the big data processing module, which collects data from the sales log files of the websites of individual gas stations;
A user network behavior data collection module, connected to the big data processing module, which collects oil price data from users' oil price transactions on the network;
A data analysis module, connected to the big data processing module, which compares the collected oil price data to obtain the distribution of oil prices;
A data storage module, connected to the big data processing module, which stores the collected oil price data in a memory and classifies the data for storage;
A result output module, connected to the big data processing module, which outputs the distribution of oil prices on a display screen;
A big data processing module, connected to the network data capture module, log collection module, user network behavior data collection module, data analysis module, data storage module, result output module, and data judgment module, which coordinates the operation of all modules;
A data judgment module, connected to the big data processing module, which judges whether the acquired oil price data are true oil prices.
10. A fuel price information comparison platform based on big data, carrying the fuel price information comparison system based on big data according to claim 9.
CN201811602285.0A 2018-12-26 2018-12-26 A kind of fuel price information comparison system and method based on big data Pending CN109947937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811602285.0A CN109947937A (en) 2018-12-26 2018-12-26 A kind of fuel price information comparison system and method based on big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811602285.0A CN109947937A (en) 2018-12-26 2018-12-26 A kind of fuel price information comparison system and method based on big data

Publications (1)

Publication Number Publication Date
CN109947937A true CN109947937A (en) 2019-06-28

Family

ID=67007205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811602285.0A Pending CN109947937A (en) 2018-12-26 2018-12-26 A kind of fuel price information comparison system and method based on big data

Country Status (1)

Country Link
CN (1) CN109947937A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115452084A (en) * 2022-08-30 2022-12-09 中国船舶集团有限公司系统工程研究院 Method for determining stable oiling flow based on clustering algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102331986A (en) * 2010-07-12 2012-01-25 阿里巴巴集团控股有限公司 Database cache management method and database server
CN102984264A (en) * 2012-12-06 2013-03-20 苏州工业园区服务外包职业学院 Gas station information system based on cloud service
US20130175030A1 (en) * 2012-01-10 2013-07-11 Adunola Ige Submersible Pump Control
CN104634350A (en) * 2013-11-14 2015-05-20 北京四维图新科技股份有限公司 Method and device for inquiring gas station information as well as navigation terminal
CN107526773A (en) * 2017-07-14 2017-12-29 江苏更知电子科技有限公司 A kind of search system of gas station

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
孟静等: "一种基于聚类和快速计算的异常数据挖掘算法", 《计算机工程》 *
朱晓峰等: "基于微博舆情监测的K-Means算法改进研究", 《信息系统》 *
李克文等: "基于RSBoost算法的不平衡数据分类方法", 《计算机科学》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Cheng Guogen

Inventor after: Li Xinjie

Inventor before: Cheng Guogen

Inventor before: Li Xinran

RJ01 Rejection of invention patent application after publication

Application publication date: 20190628
