CN105426425A - Big data marketing method based on mobile signaling - Google Patents

Publication number
CN105426425A
Authority
CN
China
Prior art keywords
data
model
application type
input data
application
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510740047.6A
Other languages
Chinese (zh)
Inventor
莫益军
秦思
王冼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN201510740047.6A
Publication of CN105426425A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining

Abstract

The invention discloses a big data marketing method based on mobile signaling, comprising the following steps: building an application type model library and an algorithm library, wherein the application type model library comprises application models of different application types; sampling the input data and performing principal component analysis (PCA) and keyword matching on the sampled data to determine the application type of the input data, and determining the corresponding application model according to that application type; screening the data according to the application type of the input data; fragmenting the screened data and storing the fragments in a distributed manner; determining, from the PCA result, that no application model corresponding to the application type of the input data exists in the application type model library; and transforming the input data according to the input format required by the resulting combined classification algorithm. The method can provide an accurate processing model for big data and is applicable to different application scenarios.

Description

A big data marketing method based on mobile signaling
Technical field
The invention belongs to the Internet field and, more specifically, relates to a big data marketing method based on mobile signaling.
Background
With the rapid development of information technology and the diversification of the means by which people obtain information, every industry now holds a large amount of information data. A large amount of raw data has accumulated in the data warehouses of telecom operators without being put to use, while in the course of business operators face substantial customer churn and declining service revenue. With existing business support systems, operators can mostly only see the results of these phenomena in the relevant reports after the fact and then take measures; no early warning is available, so strategy cannot be adjusted in time. In addition, the data processing speed and response time of current business support systems are slow, so neither management nor decision makers can obtain data results in time.
For this reason, operators have begun to deploy big data at the strategic level: they use the computing power of cloud computing to process big data quickly, focus on actual business, collect, analyze and mine data, and turn data into precision marketing productivity. At present, a number of enterprises and research institutions have filed patent applications in the field of big data processing.
For example, Chinese invention application CN201210571477.6 proposes a big data processing method based on a PaaS platform, in which the system comprises a PaaS platform server, multiple service servers built on top of it, and a Hadoop cluster associated with each service server. In that method, a user terminal sends a data processing request to the PaaS platform server; the PaaS platform server parses the request and sends a task instruction to the corresponding service server; the service server calls its Hadoop cluster to execute the job corresponding to the request; the Hadoop cluster returns the job result to the service server; the service server returns the job result to the PaaS platform server; and the PaaS platform server returns a service response to the user terminal according to the job result. However, that application only provides a system for distributed big data processing and does not describe a concrete method for processing big data.
Chinese invention application CN201210590482.1 proposes a high-precision multi-dimensional counting Bloom filter and a big data processing method based on it. In that method, a multi-dimensional attribute data set of a certain scale or with certain features is stored in the high-precision multi-dimensional counting Bloom filter; the multi-dimensional attribute big data set to be processed is read; the high-precision multi-dimensional counting Bloom filter processes it, including multi-dimensional element queries and updates; and the processed multi-dimensional attribute data set is output. However, that method targets multi-dimensional attribute data sets: a data set of a certain scale or with certain features must first be stored in the filter, and the big data set to be processed must meet the input format requirements of the Bloom filter. In addition, it only provides a way of "purifying" data values and does not form a big data processing framework.
Summary of the invention
In view of the above defects or needs for improvement in the prior art, the invention provides a big data marketing method based on mobile signaling. Its purpose is to use distributed storage technology to improve the efficiency of processing massive data while providing an accurate processing model for big data that is applicable to different application scenarios.
To achieve the above purpose, according to one aspect of the invention, a big data marketing method based on mobile signaling is provided, comprising the following steps:
(1) build an application type model library and an algorithm library, wherein the application type model library comprises application models of different application types;
(2) sample the input data, and perform principal component analysis (PCA) and keyword matching on the sampled data to determine the application type of the input data, and determine the corresponding application model according to that application type;
(3) screen the data according to the application type of the input data;
(4) fragment the screened data obtained in step (3), and store the fragmented data in a distributed manner;
(5) determine, from the PCA result of step (2), that the application type model library contains no application model corresponding to the application type of the input data;
(6) according to the input data format required by the combined classification algorithm obtained in step (5), transform the input data so that they meet the input format requirement of the classification algorithm;
(7) train a model on the sample data, and apply the trained model to all input data;
(8) evaluate the model trained in step (7), add the new model to the application type model library, and update the library;
(9) call the corresponding application model in the application type model library to analyze and process the input data, publish the analysis result, and feed the result back to the input end, forming an automatic closed-loop model system.
Preferably, in the initial stage the application type model library contains no application models, and all application models are added to it incrementally; the algorithm library contains different data classification algorithms and the scenarios to which each applies, so that a suitable algorithm can be chosen adaptively according to the application type and the data characteristics.
Preferably, step (2) comprises the following sub-steps:
(2.1) sample the input data by adaptive progressive sampling to obtain n-dimensional sample data $X = \{x_1, x_2, \ldots, x_n\}$, where x denotes the data of a given field in the input data and n is a natural number;
(2.2) perform principal component analysis on the n-dimensional sample data to find the keywords in the sample data;
(2.3) match the keywords found in step (2.2) against the keywords in the model library and judge whether the model library contains an application model corresponding to those keywords; if so, go to step (9), otherwise go to step (3).
Preferably, step (2.2) is specifically as follows. First, the covariance matrix S of the n-dimensional sample data X is calculated using formula (1):
$$s_{ij} = \frac{1}{n-1}\left[\sum_{i,j=1}^{n}(x_i - x')(x_j - x')\right] \qquad (1)$$
where $x' = \frac{1}{n}\sum_{j=1}^{n} x_j$.
Then the eigenvalues of the matrix S are arranged in descending order: $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$. If the sum of the first m eigenvalues (where $1 \le m \le n$) accounts for 90% or more of the sum of all eigenvalues, the fields in the input data corresponding to the first m eigenvalues are selected as keywords.
Preferably, step (3) comprises the following sub-steps:
(3.1) according to the keywords found in the sample data, extract from the input data the raw data of the fields corresponding to those keywords;
(3.2) perform K-means clustering on the extracted raw data, identify outliers and duplicate data points from the clustering result, and delete them; then process the remaining data as follows: replace the values of data points with inconsistent values and of data points with missing values by the sample mean;
(3.3) randomly select part of the cleaned data as test data, use Chebyshev's theorem to check the data cleaned in step (3.2), and judge whether the data error rate is below the 0.5% threshold; if it is, the audit passes and the method goes to step (4); otherwise step (3.2) is repeated.
Preferably, step (4) is specifically as follows: first check disk utilization and judge whether the disk space meets the storage capacity required by the screened data; if it does, store the screened data on the designated disk; if it does not, fragment the screened data and store the fragments in a distributed manner on designated terminals.
Preferably, step (5) comprises the following sub-steps:
(5.1) according to the statistical properties and field attributes of the input data, select from the algorithm library all classification algorithms that meet the requirements, classify the sample data with each selected algorithm to obtain different classification results, and calculate the accuracy of each classification result;
(5.2) encode the selected classification algorithms as binary strings, and use a simple linear regression equation as the fitness function f(y) to account for the accuracy of each classification algorithm's results;
(5.3) calculate the probability that classification algorithm $y_i$ is selected as $f(y_i) / (f(y_1) + f(y_2) + \ldots + f(y_n))$;
(5.4) according to the selection probabilities of the classification algorithms, randomly cross over their binary encodings, or apply small-range mutations to the binary encodings, to produce classification results; repeat the crossover and mutation until a near-optimal combination of classification algorithms is found.
Preferably, step (6) comprises the following sub-steps:
(6.1) perform object aggregation on all data, and reduce the dimensionality of the aggregated data according to the covariance matrix S from step (2);
(6.2) discretize the dimension-reduced continuous data proportionally, and apply a variable transformation to the discretized data so that they meet the data format requirement of the combined classification algorithm;
(6.3) judge whether the format of the processed data meets the input format requirement of the combined classification algorithm; if not, repeat steps (6.1)-(6.2); if so, go to step (7).
Preferably, step (7) comprises the following sub-steps:
(7.1) configure the initial parameters of the combined classification algorithm model, setting each initial parameter $P_i$ to 1/m, where m is the number of classification algorithms selected;
(7.2) train the model on the sample data and analyze the training result; if the classification error rate is below 0.5%, stop training and go to step (7.3); otherwise continue training by repeating step (7.2);
(7.3) apply the trained model to all input data and, at the same time, sample and analyze the processing results; if the error rate of the analysis result exceeds the 0.5% threshold, repeat model training by returning to step (7.2); otherwise go to step (8).
Preferably, step (8) comprises the following sub-steps:
(8.1) evaluate the model using the following classification error rate formula as the score function:
$$S_v(\theta) = \frac{1}{N}\sum_{i=1}^{N} I\big(f(x_i, \theta),\, y(i)\big)$$
where $f(x_i, \theta)$ is the prediction that the model makes for individual i using parameter value θ, $1 \le i \le N$, $y(i)$ is the actual observed value of the i-th entity in the training data set, and $I(c, d) = 1$ when c is not equal to d and 0 otherwise;
(8.2) judge whether $S_v(\theta)$ exceeds a predetermined threshold; if it does not, add the model to the application type model library, update the model's parameter configuration, and go to step (9); otherwise repeat steps (5) to (7).
In general, compared with the prior art, the above technical solution conceived by the invention achieves the following beneficial effects: by maintaining a model library and an algorithm library, the invention finds a corresponding model and a matching algorithm for each marketing scenario and can therefore adapt to different application types; in addition, it introduces a distributed approach in which data are stored and processed in a distributed manner, which relieves the pressure of data storage and processing and allows the rich information in big data to be processed quickly.
Brief description of the drawings
Fig. 1 is a flowchart of the big data marketing method based on mobile signaling according to the invention.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawing and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
As shown in Fig. 1, the big data marketing method based on mobile signaling according to the invention comprises the following steps:
Step 1: build an application type model library and an algorithm library, wherein the application type model library comprises application models of different application types. Specifically, in the initial stage the application type model library contains no application models, and all application models are added to it incrementally; the algorithm library contains different data classification algorithms and the scenarios to which each applies, so that a suitable algorithm can be chosen adaptively according to the application type and the data characteristics.
Step 2: sample the input data, and perform principal component analysis (PCA) and keyword matching on the sampled data to determine the application type of the input data, and determine the corresponding application model according to that application type. This step further comprises the following sub-steps:
Step 2.1: sample the input data by adaptive progressive sampling to obtain n-dimensional sample data $X = \{x_1, x_2, \ldots, x_n\}$, where x denotes the data of a given field in the input data and n is a natural number;
Step 2.2: perform principal component analysis on the n-dimensional sample data to find the keywords in the sample data;
Specifically, the covariance matrix S of the n-dimensional sample data X is first calculated using formula (1):
$$s_{ij} = \frac{1}{n-1}\left[\sum_{i,j=1}^{n}(x_i - x')(x_j - x')\right] \qquad (1)$$
where $x' = \frac{1}{n}\sum_{j=1}^{n} x_j$.
Then the eigenvalues of the matrix S are arranged in descending order: $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$. If the sum of the first m eigenvalues (where $1 \le m \le n$) accounts for 90% or more of the sum of all eigenvalues, the fields in the input data corresponding to the first m eigenvalues are selected as keywords;
Step 2.3: match the keywords found in step 2.2 against the keywords in the model library and judge whether the model library contains an application model corresponding to those keywords; if so, go to step 9, otherwise go to step 3.
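The covariance calculation and the 90% eigenvalue rule of step 2.2 can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names `covariance_matrix` and `components_to_keep` are hypothetical, the covariance is computed per field across sampled records (the usual definition, which is an assumption about what formula (1) intends), and the eigenvalues are taken as given because the patent does not say how they are computed.

```python
from itertools import accumulate


def covariance_matrix(rows):
    """Sample covariance between fields; one record per row, one field per column."""
    n = len(rows)
    d = len(rows[0])
    # Mean of each field over all sampled records.
    means = [sum(r[k] for r in rows) / n for k in range(d)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(d)
        ]
        for i in range(d)
    ]


def components_to_keep(eigenvalues, threshold=0.90):
    """Smallest m such that the m largest eigenvalues reach `threshold` of the total."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for m, running in enumerate(accumulate(ordered), start=1):
        if running / total >= threshold:
            return m
    return len(ordered)
```

In practice the eigenvalues of S would come from a linear algebra library; `components_to_keep` then gives the number m of leading fields to treat as keywords.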
Step 3: screen the data according to the application type of the input data. This step further comprises the following sub-steps:
Step 3.1: according to the keywords found in the sample data, extract from the input data the raw data of the fields corresponding to those keywords;
Step 3.2: perform K-means clustering on the extracted raw data, identify outliers and duplicate data points from the clustering result, and delete them; then process the remaining data as follows: replace the values of data points with inconsistent values and of data points with missing values by the sample mean. The above operations complete the data cleaning;
Step 3.3: randomly select part of the cleaned data as test data, use Chebyshev's theorem to check the data cleaned in step 3.2, and judge whether the data error rate is below the 0.5% threshold; if it is, the audit passes and the method goes to step 4; otherwise step 3.2 is repeated;
Specifically, Chebyshev's theorem states that if the mathematical expectation E(X) and the variance D(X) of a random variable X both exist, then for any constant ε > 0, $P(|X - E(X)| \ge \varepsilon) \le D(X)/\varepsilon^2$, or equivalently $P(|X - E(X)| < \varepsilon) \ge 1 - D(X)/\varepsilon^2$. The data are checked by using their mean, standard deviation and confidence interval to identify abnormal data.
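A minimal sketch of the mean replacement in step 3.2 and the audit in step 3.3 is given below. It rests on assumptions the patent does not state: "abnormal" is taken to mean a value at least k standard deviations from the mean (with ε = kσ, Chebyshev's theorem bounds the share of such values by 1/k²), k defaults to 3, and the helper names are hypothetical. K-means clustering is left out.

```python
from statistics import mean, pstdev


def fill_missing_with_mean(values):
    """Replace missing entries (None) with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    mu = mean(present)
    return [mu if v is None else v for v in values]


def chebyshev_error_rate(values, k=3.0):
    """Fraction of values lying at least k standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # all values identical, nothing can be flagged
    flagged = sum(1 for v in values if abs(v - mu) >= k * sigma)
    return flagged / len(values)


def passes_audit(values, k=3.0, max_error_rate=0.005):
    """True when the flagged fraction is below the 0.5% threshold of step 3.3."""
    return chebyshev_error_rate(values, k) < max_error_rate
```

For example, a field with 999 values of 10.0 and a single value of 1000.0 has one flagged point in a thousand, which is below the threshold, so the audit passes.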
Step 4: fragment the screened data obtained in step 3, and store the fragmented data in a distributed manner. Specifically, first check disk utilization and judge whether the disk space meets the storage capacity required by the screened data; if it does, store the screened data on the designated disk; if it does not, fragment the screened data and store the fragments in a distributed manner on designated terminals.
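The storage decision of step 4 can be illustrated with the sketch below. It is a simplified sketch under stated assumptions: records are held in a Python list, fragments have a fixed record count, and the names `plan_storage`, `split_into_fragments` and `free_disk_bytes` are hypothetical. How fragments are sent to the designated terminals is not specified in the patent and is not modeled here.

```python
import shutil


def free_disk_bytes(path="."):
    """Free space, in bytes, on the disk that holds `path`."""
    return shutil.disk_usage(path).free


def split_into_fragments(records, fragment_size):
    """Cut the record list into consecutive fragments of at most `fragment_size`."""
    return [
        records[i:i + fragment_size]
        for i in range(0, len(records), fragment_size)
    ]


def plan_storage(records, required_bytes, free_bytes, fragment_size):
    """Return one fragment if the data fit on the designated disk, else several."""
    if required_bytes <= free_bytes:
        return [records]  # enough space: store everything on the designated disk
    return split_into_fragments(records, fragment_size)
```

A caller would pass `free_disk_bytes()` as `free_bytes` and then write each returned fragment either to the local disk or to a remote terminal.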
Step 5: determine, from the PCA result of step 2, that the application type model library contains no application model corresponding to the application type of the input data. In this case a genetic algorithm is used to find a near-optimal combination of classification algorithms in the algorithm library. This step further comprises the following sub-steps:
Step 5.1: according to the statistical properties and field attributes of the input data, select from the algorithm library all classification algorithms that meet the requirements, classify the sample data with each selected algorithm to obtain different classification results, and calculate the accuracy of each classification result;
Step 5.2: encode the selected classification algorithms as a binary string, for example 01011, and use a simple linear regression equation as the fitness function f(y) to account for the accuracy of each classification algorithm's results;
Step 5.3: calculate the probability that classification algorithm $y_i$ is selected as $f(y_i) / (f(y_1) + f(y_2) + \ldots + f(y_n))$, where $f(y_i)$ is the fitness of classification algorithm $y_i$;
Step 5.4: steps 5.2 and 5.3 give the binary encoding and the selection probability of each selected classification algorithm. According to the selection probabilities, randomly cross over the binary encodings of the classification algorithms, or apply small-range mutations to them; either operation yields a new combination of classification algorithms, which produces a classification result. Repeat the crossover and mutation until a near-optimal combination of classification algorithms is found.
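The selection, crossover and mutation of steps 5.3 and 5.4 correspond to the standard operators of a genetic algorithm, sketched below. The sketch assumes that each bit of an encoding marks whether one classification algorithm is part of the combination, that crossover is single-point, and that mutation flips bits independently; none of these details is fixed by the patent, and all function names are hypothetical.

```python
import random


def selection_probabilities(fitness):
    """Roulette-wheel probabilities f(y_i) / (f(y_1) + ... + f(y_n))."""
    total = sum(fitness)
    return [f / total for f in fitness]


def crossover(a, b, point):
    """Single-point crossover of two equal-length bit strings."""
    return a[:point] + b[point:], b[:point] + a[point:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in bits
    )


def next_generation(population, fitness, rng, mutation_rate=0.05):
    """One generation: fitness-proportional selection, crossover, then mutation."""
    parents = rng.choices(population, weights=fitness, k=len(population))
    children = []
    for a, b in zip(parents[0::2], parents[1::2]):
        point = rng.randrange(1, len(a))  # cut strictly inside the string
        children.extend(crossover(a, b, point))
    if len(parents) % 2:
        children.append(parents[-1])  # odd parent passes through unpaired
    return [mutate(c, mutation_rate, rng) for c in children]
```

Calling `next_generation` repeatedly, re-evaluating the fitness of each new encoding between calls, gives the loop that step 5.4 describes; a stopping rule such as a fixed number of generations would have to be chosen by the implementer.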
Step 6: according to the input data format required by the combined classification algorithm obtained in step 5, transform the input data so that they meet the input format requirement of the classification algorithm. This step further comprises:
Step 6.1: perform object aggregation on all data, and reduce the dimensionality of the aggregated data according to the covariance matrix S from step 2;
Step 6.2: discretize the dimension-reduced continuous data proportionally, and apply a variable transformation to the discretized data so that they meet the data format requirement of the combined classification algorithm;
Step 6.3: judge whether the format of the processed data meets the input format requirement of the combined classification algorithm; if not, repeat steps 6.1-6.2; if so, go to step 7.
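One possible reading of the proportional discretization in step 6.2 is equal-width binning, sketched below. This interpretation is an assumption, as are the function name `discretize` and the choice of integer bin indices as output; the patent does not define the discretization rule.

```python
def discretize(values, bins):
    """Map continuous values to integer bin indices 0..bins-1 of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]  # constant field: everything falls in one bin
    width = (hi - lo) / bins
    # The maximum value would land in bin `bins`, so clamp it into the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]
```

With four bins over the range 0 to 10, the values 0.0, 2.5, 5.0, 7.5 and 10.0 map to bins 0, 1, 2, 3 and 3.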
Step 7: first train a model on the sample data, then apply the trained model to all input data. This step further comprises:
Step 7.1: configure the initial parameters of the combined classification algorithm model, setting each initial parameter $P_i$ to 1/m, where m is the number of classification algorithms selected;
Step 7.2: train the model on the sample data and analyze the training result; if the classification error rate is below 0.5%, stop training and go to step 7.3; otherwise continue training by repeating step 7.2;
Step 7.3: apply the trained model to all input data and, at the same time, sample and analyze the processing results; if the error rate of the analysis result exceeds the 0.5% threshold, repeat model training by returning to step 7.2; otherwise go to step 8.
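The patent does not say what role the parameters $P_i$ play. One plausible reading, shown here purely as an assumption, is that they are the voting weights of the m classification algorithms in the combined model, all starting at 1/m. The names `initial_weights` and `weighted_vote` and the example labels are hypothetical.

```python
def initial_weights(m):
    """Step 7.1 under the voting-weight assumption: every algorithm starts at 1/m."""
    return [1 / m] * m


def weighted_vote(predictions, weights):
    """Combine one predicted label per algorithm into a single label."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    # The label with the largest accumulated weight wins.
    return max(scores, key=scores.get)
```

With three algorithms and equal weights, the predictions "churn", "stay" and "churn" combine to "churn"; training would then adjust the weights until the error rate falls below the threshold of step 7.2.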
Step 8: evaluate the model trained in step 7, add the new model to the application type model library, and update the library. This step further comprises:
Step 8.1: evaluate the model using the following classification error rate formula (2) as the score function:
$$S_v(\theta) = \frac{1}{N}\sum_{i=1}^{N} I\big(f(x_i, \theta),\, y(i)\big) \qquad (2)$$
where $f(x_i, \theta)$ is the prediction that the model makes for individual i using parameter value θ, $1 \le i \le N$, $y(i)$ is the actual observed value of the i-th entity in the training data set, and $I(c, d) = 1$ when c is not equal to d and 0 otherwise;
Step 8.2: judge whether $S_v(\theta)$ exceeds a predetermined threshold (the threshold is set manually according to the circumstances and is generally set to 0.1); if it does not, add the model to the application type model library, update the model's parameter configuration, and go to step 9; otherwise repeat steps 5 to 7.
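The score function of formula (2) and the acceptance test of step 8.2 can be written directly, as in the sketch below. The function names are hypothetical, and the default threshold of 0.1 is the typical value mentioned in step 8.2.

```python
def misclassification_rate(predictions, observed):
    """Formula (2): share of entities whose prediction differs from the observation."""
    if len(predictions) != len(observed):
        raise ValueError("predictions and observed must have the same length")
    n = len(observed)
    # I(c, d) is 1 when c != d and 0 otherwise, so counting mismatches sums it.
    return sum(1 for p, y in zip(predictions, observed) if p != y) / n


def accept_model(predictions, observed, threshold=0.1):
    """Step 8.2: accept the model when the score does not exceed the threshold."""
    return misclassification_rate(predictions, observed) <= threshold
```

A model that misclassifies one entity in four scores 0.25 and is rejected at the 0.1 threshold, which sends the method back to step 5.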
Step 9: call the corresponding application model in the application type model library to analyze and process the input data, publish the analysis result, and feed the result back to the input end, forming an automatic closed-loop model system.
Those skilled in the art will readily understand that the above is only a preferred embodiment of the invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A big data marketing method based on mobile signaling, characterized by comprising the following steps:
(1) build an application type model library and an algorithm library, wherein the application type model library comprises application models of different application types;
(2) sample the input data, and perform principal component analysis (PCA) and keyword matching on the sampled data to determine the application type of the input data, and determine the corresponding application model according to that application type;
(3) screen the data according to the application type of the input data;
(4) fragment the screened data obtained in step (3), and store the fragmented data in a distributed manner;
(5) determine, from the PCA result of step (2), that the application type model library contains no application model corresponding to the application type of the input data;
(6) according to the input data format required by the combined classification algorithm obtained in step (5), transform the input data so that they meet the input format requirement of the classification algorithm;
(7) train a model on the sample data, and apply the trained model to all input data;
(8) evaluate the model trained in step (7), add the new model to the application type model library, and update the library;
(9) call the corresponding application model in the application type model library to analyze and process the input data, publish the analysis result, and feed the result back to the input end, forming an automatic closed-loop model system.
2. The big data marketing method according to claim 1, characterized in that, in the initial stage, the application type model library contains no application models, and all application models are added to it incrementally; the algorithm library contains different data classification algorithms and the scenarios to which each applies, so that a suitable algorithm can be chosen adaptively according to the application type and the data characteristics.
3. The big data marketing method according to claim 1, characterized in that step (2) comprises the following sub-steps:
(2.1) sample the input data by adaptive progressive sampling to obtain n-dimensional sample data $X = \{x_1, x_2, \ldots, x_n\}$, where x denotes the data of a given field in the input data and n is a natural number;
(2.2) perform principal component analysis on the n-dimensional sample data to find the keywords in the sample data;
(2.3) match the keywords found in step (2.2) against the keywords in the model library and judge whether the model library contains an application model corresponding to those keywords; if so, go to step (9), otherwise go to step (3).
4. The big data marketing method according to claim 3, characterized in that step (2.2) is specifically as follows: first, the covariance matrix S of the n-dimensional sample data X is calculated using formula (1):
$$s_{ij} = \frac{1}{n-1}\left[\sum_{i,j=1}^{n}(x_i - x')(x_j - x')\right] \qquad (1)$$
where $x' = \frac{1}{n}\sum_{j=1}^{n} x_j$;
then the eigenvalues of the matrix S are arranged in descending order: $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_n$; if the sum of the first m eigenvalues (where $1 \le m \le n$) accounts for 90% or more of the sum of all eigenvalues, the fields in the input data corresponding to the first m eigenvalues are selected as keywords.
5. The big data marketing method according to claim 4, characterized in that step (3) comprises the following sub-steps:
(3.1) according to the keywords found in the sample data, extract from the input data the raw data of the fields corresponding to those keywords;
(3.2) perform K-means clustering on the extracted raw data, identify outliers and duplicate data points from the clustering result, and delete them; then process the remaining data as follows: replace the values of data points with inconsistent values and of data points with missing values by the sample mean;
(3.3) randomly select part of the cleaned data as test data, use Chebyshev's theorem to check the data cleaned in step (3.2), and judge whether the data error rate is below the 0.5% threshold; if it is, the audit passes and the method goes to step (4); otherwise step (3.2) is repeated.
6. The big data marketing method according to claim 1, characterized in that step (4) specifically comprises: first checking disk utilization to judge whether the disk space meets the storage capacity required by the filtered data; if it does, storing the filtered data on the designated disk; if it does not, splitting the filtered data into shards and storing the shards in a distributed manner on designated terminals.
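The storage decision in step (4) amounts to a capacity check followed by an optional split. A minimal sketch, assuming sizes are measured in bytes and shards have a fixed maximum size (`plan_storage`, `free_space` and `shard_size` are hypothetical names, not taken from the claim):

```python
import shutil

def free_space(path="."):
    """Free bytes on the disk that holds `path`."""
    return shutil.disk_usage(path).free

def plan_storage(data_size, free_bytes, shard_size):
    """Return ('disk', [size]) if the data fits, else ('sharded', [shard sizes])."""
    if data_size <= free_bytes:
        return "disk", [data_size]
    # Not enough room: cut the data into shards of at most shard_size bytes.
    full, rest = divmod(data_size, shard_size)
    return "sharded", [shard_size] * full + ([rest] if rest else [])

plan = plan_storage(data_size=100, free_bytes=50, shard_size=40)
```

In practice `free_bytes` would come from `free_space(...)` on the designated disk; how shards are assigned to terminals is left open by the claim and is not modeled here.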
7. The big data marketing method according to claim 1, characterized in that step (5) comprises the following sub-steps:
(5.1) selecting from the algorithm library all classification algorithms that meet the requirements, according to the statistical properties and field attributes of the input data; classifying the sample data with each selected classification algorithm to obtain different classification results; and calculating the accuracy of each classification result;
(5.2) encoding each selected classification algorithm as a binary string, and using a simple (one-variable) linear regression equation as the fitness function f(y) to evaluate the accuracy of the classification result of each classification algorithm;
(5.3) calculating the probability that classification algorithm y_i is selected as f(y_i)/(f(y_1)+f(y_2)+...+f(y_n));
(5.4) according to the selection probabilities, randomly performing crossover on the binary codes of the classification algorithms, or applying small-range mutation to the binary code of a classification algorithm, to produce classification results; and repeating this crossover and mutation process until a near-optimal combination for the combined classification algorithm is found.
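Steps (5.2) to (5.4) describe a genetic search with fitness-proportional selection, crossover and mutation over binary strings. The sketch below illustrates that loop only; it does not implement the regression-based fitness function of the claim. The fitness used in the example (number of 1 bits plus one), the mutation rate, the generation count and all function names are hypothetical.

```python
import random

def selection_probabilities(fitness):
    """Step (5.3): f(y_i) / (f(y_1) + ... + f(y_n)); fitness values must be positive."""
    total = sum(fitness)
    return [f / total for f in fitness]

def crossover(a, b, point):
    """Single-point crossover of two equal-length bit strings."""
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate` (small-range mutation)."""
    return "".join(("1" if c == "0" else "0") if rng.random() < rate else c for c in bits)

def evolve(population, fitness_fn, generations=20, rate=0.05, seed=0):
    """Repeat selection, crossover and mutation; return the best string seen."""
    rng = random.Random(seed)
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        probs = selection_probabilities([fitness_fn(p) for p in population])
        next_gen = []
        while len(next_gen) < len(population):
            a, b = rng.choices(population, weights=probs, k=2)
            point = rng.randrange(1, len(a))
            next_gen.extend(mutate(child, rate, rng) for child in crossover(a, b, point))
        population = next_gen[:len(population)]
        best = max(population + [best], key=fitness_fn)
    return best

# Hypothetical fitness: more 1 bits is better; +1 keeps every fitness positive.
fitness = lambda bits: bits.count("1") + 1
best = evolve(["0000", "0011", "0101", "1000"], fitness)
```

Tracking the best string across generations means the result is never worse than the best member of the starting population, which is one simple way to settle for a near-optimal rather than exact solution.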
8. The big data marketing method according to claim 1, characterized in that step (6) comprises the following sub-steps:
(6.1) performing object-focusing processing on all data, and reducing the dimensionality of the processed data according to the covariance matrix S from step (2);
(6.2) discretizing the continuous data after dimensionality reduction proportionally, and applying a variable transformation to the discretized data so that it meets the data format requirements of the combined classification algorithm;
(6.3) judging whether the format of the processed data meets the input format requirements of the combined classification algorithm; if not, repeating steps (6.1)-(6.2); if so, proceeding to step (7).
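As an illustration of the discretization in step (6.2), the sketch below bins continuous values into equal-width intervals with NumPy. Reading "proportional" discretization as equal-width binning is an assumption, as are the function name and the number of bins.

```python
import numpy as np

def discretize_equal_width(values, bins):
    """Map continuous values to integer labels 0 .. bins - 1 using equal-width bins."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), bins + 1)
    # Only the interior edges are passed, so the maximum value falls in the last bin.
    return np.digitize(values, edges[1:-1])

labels = discretize_equal_width([0.0, 2.5, 5.0, 7.5, 10.0], bins=4)
```

An equal-frequency variant (cutting at quantiles instead of evenly spaced edges) would be an equally plausible reading of the claim.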
9. The big data marketing method according to claim 1, characterized in that step (7) comprises the following sub-steps:
(7.1) configuring the initial parameters of the combined classification algorithm model, with each initial parameter P_i set to 1/m, where m is the number of classification algorithms selected;
(7.2) training the model on the sample data and analyzing the training results; if the classification error rate is below 0.5%, stopping model training and proceeding to step (7.3); otherwise continuing model training by repeating step (7.2);
(7.3) applying the trained model to all input data while sampling and analyzing the processing results; if the error rate of the analysis result exceeds the threshold of 0.5%, repeating model training by returning to step (7.2); otherwise proceeding to step (8).
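The initialization in step (7.1) and the stopping rule in step (7.2) can be sketched as follows. The claim does not state what the parameters P_i are used for; this sketch assumes they act as voting weights of the m selected classifiers. The `train_step` callable and the `max_rounds` guard are hypothetical additions, the latter so that the loop always terminates.

```python
from collections import defaultdict

def initial_weights(m):
    """Step (7.1): every parameter P_i starts at 1/m."""
    return [1.0 / m] * m

def weighted_vote(predictions, weights):
    """Combine one label per classifier into a single label (assumed use of P_i)."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

def train_until(train_step, threshold=0.005, max_rounds=100):
    """Call train_step() until its error rate drops below the 0.5% threshold."""
    error = None
    for round_no in range(1, max_rounds + 1):
        error = train_step()
        if error < threshold:
            break
    return round_no, error

label = weighted_vote(["a", "b", "a"], initial_weights(3))
```

Here `train_step` stands in for one round of training that returns the current classification error rate; the same threshold check would be applied again to the sampled results in step (7.3).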
10. The big data marketing method according to claim 1, characterized in that step (8) comprises the following sub-steps:
(8.1) evaluating the model using the following classification error rate formula as the score function:

$$S_v(\theta) = \frac{1}{n}\sum_{i=1}^{n} I\big(f(a(i), \theta),\; b(i)\big)$$

where f(a(i), θ) is the prediction the model makes for individual i using parameter value θ, 1 ≤ i ≤ n, b(i) is the actual observed value of the i-th entity in the training data set, and I(c, d) = 1 when c is not equal to d and 0 otherwise;
(8.2) judging whether S_v(θ) exceeds a predetermined threshold; if it does not, adding the model to the application-type model library, updating the model's parameter configuration, and proceeding to step (9); otherwise repeating steps (5) to (7).
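The score function of step (8.1) is the fraction of records the model misclassifies. A minimal sketch, in which the toy threshold model and the data values are hypothetical:

```python
def indicator(c, d):
    """I(c, d): 1 when c differs from d, otherwise 0."""
    return 1 if c != d else 0

def error_rate_score(model, theta, inputs, observed):
    """Mean of I(f(a(i), theta), b(i)) over the n records."""
    n = len(inputs)
    return sum(indicator(model(a, theta), b) for a, b in zip(inputs, observed)) / n

# Hypothetical model: predict 1 when the input exceeds the parameter theta.
model = lambda a, theta: int(a > theta)
score = error_rate_score(model, 2, [1, 2, 3, 4], [0, 0, 1, 0])
```

In this example only the last record is misclassified, so the score is 1/4; step (8.2) would then compare that value with the predetermined threshold.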
CN201510740047.6A 2015-11-04 2015-11-04 Big data marketing method based on mobile signaling Pending CN105426425A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510740047.6A CN105426425A (en) 2015-11-04 2015-11-04 Big data marketing method based on mobile signaling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510740047.6A CN105426425A (en) 2015-11-04 2015-11-04 Big data marketing method based on mobile signaling

Publications (1)

Publication Number Publication Date
CN105426425A true CN105426425A (en) 2016-03-23

Family

ID=55504637

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510740047.6A Pending CN105426425A (en) 2015-11-04 2015-11-04 Big data marketing method based on mobile signaling

Country Status (1)

Country Link
CN (1) CN105426425A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092618A (en) * 2016-10-27 2017-08-25 北京小度信息科技有限公司 A kind of information processing method and device
CN111144677A (en) * 2018-11-06 2020-05-12 北京京东振世信息技术有限公司 Efficiency evaluation method and efficiency evaluation system
WO2022033115A1 (en) * 2020-08-12 2022-02-17 华为技术有限公司 Communication method and communication apparatus
CN114996318A (en) * 2022-07-12 2022-09-02 成都唐源电气股份有限公司 Automatic judgment method and system for processing mode of abnormal value of detection data
CN117278343A (en) * 2023-11-24 2023-12-22 戎行技术有限公司 Data multi-level output processing method based on big data platform data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2493105A1 (en) * 2002-07-19 2004-01-29 British Telecommunications Public Limited Company Method and system for classification of semantic content of audio/video data
CN103020296A (en) * 2012-12-31 2013-04-03 湖南大学 High-precision multi-dimensional counting Bloom filter and large data processing method thereof
CN103067486A (en) * 2012-12-26 2013-04-24 广州杰赛科技股份有限公司 Big-data processing method based on platform-as-a-service (PaaS) platform
CN103886487A (en) * 2014-03-28 2014-06-25 焦点科技股份有限公司 Individualized recommendation method and system based on distributed B2B platform
CN105354198A (en) * 2014-08-19 2016-02-24 中国移动通信集团湖北有限公司 Data processing method and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2493105A1 (en) * 2002-07-19 2004-01-29 British Telecommunications Public Limited Company Method and system for classification of semantic content of audio/video data
US20050238238A1 (en) * 2002-07-19 2005-10-27 Li-Qun Xu Method and system for classification of semantic content of audio/video data
CN103067486A (en) * 2012-12-26 2013-04-24 广州杰赛科技股份有限公司 Big-data processing method based on platform-as-a-service (PaaS) platform
CN103020296A (en) * 2012-12-31 2013-04-03 湖南大学 High-precision multi-dimensional counting Bloom filter and large data processing method thereof
CN103886487A (en) * 2014-03-28 2014-06-25 焦点科技股份有限公司 Individualized recommendation method and system based on distributed B2B platform
CN105354198A (en) * 2014-08-19 2016-02-24 中国移动通信集团湖北有限公司 Data processing method and apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
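Liao Zhensong (廖振松) et al.: "Research on Big Data Middleware Based on Mobile Signaling Data Analysis" (《基于移动信令数据分析的大数据中间件研究》), Information & Communications (《信息通信》) *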

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092618A (en) * 2016-10-27 2017-08-25 北京小度信息科技有限公司 A kind of information processing method and device
CN111144677A (en) * 2018-11-06 2020-05-12 北京京东振世信息技术有限公司 Efficiency evaluation method and efficiency evaluation system
CN111144677B (en) * 2018-11-06 2023-11-07 北京京东振世信息技术有限公司 Efficiency evaluation method and efficiency evaluation system
WO2022033115A1 (en) * 2020-08-12 2022-02-17 华为技术有限公司 Communication method and communication apparatus
US11855846B2 (en) 2020-08-12 2023-12-26 Huawei Technologies Co., Ltd. Communication method and communication apparatus
CN114996318A (en) * 2022-07-12 2022-09-02 成都唐源电气股份有限公司 Automatic judgment method and system for processing mode of abnormal value of detection data
CN117278343A (en) * 2023-11-24 2023-12-22 戎行技术有限公司 Data multi-level output processing method based on big data platform data
CN117278343B (en) * 2023-11-24 2024-02-02 戎行技术有限公司 Data multi-level output processing method based on big data platform data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160323