CN105824811B - Big data analysis method and device thereof - Google Patents
- Publication number
- CN105824811B (application number CN201510001942.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- rule
- group
- target data
- suspected target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
An embodiment of the invention discloses a big data analysis method, comprising: obtaining, based on an input first group of data and second group of data, at least two pieces of characteristic information that meet a preset condition, where the first group of data and the second group of data are data in a first communication network, the first group of data meets a first preset rule, and the second group of data meets a second preset rule; analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information to determine a first-type rule and a second-type rule; and determining, according to the first-type rule and the second-type rule, target data that meets the first preset rule in an input third group of data, where the third group of data is data in communication networks other than the first communication network. An embodiment of the invention also discloses a big data analysis device.
Description
Technical field
The present invention relates to communication technology, and more particularly to a big data analysis method and device thereof.
Background
With the commercialization of fourth-generation (4G) mobile communication technology, competition among the major operators is increasingly fierce. Winning back high-value users from other networks and increasing the penetration of 4G terminals play an important role in the development of a mobile operator, so identifying high-value users of other networks is essential.
The industry currently has methods that analyze and model user behavior to determine user attributes. However, existing methods generally focus on counting the number of other-network users, not on identifying those users or the types of terminal they use.
Summary of the invention
To solve the existing technical problem, embodiments of the invention provide a big data analysis method and device thereof that can determine, according to rules learned from home-network data, the target data in other-network data that meets a preset rule.
The technical solution of the embodiments of the invention is implemented as follows. An embodiment of the invention provides a big data analysis method, the method comprising:
obtaining, based on an input first group of data and second group of data, at least two pieces of characteristic information that meet a preset condition; the first group of data and the second group of data are data in a first communication network; the first group of data meets a first preset rule; the second group of data meets a second preset rule;
analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining a first-type rule and a second-type rule;
determining, according to the first-type rule and the second-type rule, target data that meets the first preset rule in an input third group of data; the third group of data is data in communication networks other than the first communication network.
In the above solution, analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information and determining the first-type rule and the second-type rule comprises:
using a logistic regression algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining the first-type rule;
using a decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining the second-type rule.
In the above solution, using the decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information and determining the second-type rule comprises:
using the decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining N rules, where N is a positive integer greater than or equal to 2;
determining, among the N rules, the second-type rule that meets a third preset rule.
In the above solution, determining, according to the first-type rule and the second-type rule, the target data that meets the first preset rule in the input third group of data comprises:
analyzing the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data;
determining the target data that meets the first preset rule based on the first suspected target data and the second suspected target data.
In the above solution, the second-type rule includes a first-type sub-rule; the first-type sub-rule meets the first preset rule;
accordingly, analyzing the input third group of data according to the first-type rule and the second-type rule respectively to obtain the first suspected target data and the second suspected target data comprises:
analyzing the input third group of data according to the first-type rule to obtain the first suspected target data;
analyzing the input third group of data according to the first-type sub-rule to obtain the second suspected target data.
In the above solution, the second-type rule further includes a second-type sub-rule; the second-type sub-rule meets the second preset rule; the method further comprises:
analyzing the first suspected target data and the second suspected target data according to the second-type sub-rule to obtain suspected non-target data;
accordingly, determining the target data based on the first suspected target data and the second suspected target data comprises:
determining the target data based on the first suspected target data, the second suspected target data and the suspected non-target data.
An embodiment of the invention also provides a big data analysis device, the device comprising:
an acquiring unit, configured to obtain, based on an input first group of data and second group of data, at least two pieces of characteristic information that meet a preset condition; the first group of data and the second group of data are data in a first communication network; the first group of data meets a first preset rule; the second group of data meets a second preset rule;
an analysis unit, configured to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine a first-type rule and a second-type rule;
a determination unit, configured to determine, according to the first-type rule and the second-type rule, target data that meets the first preset rule in an input third group of data; the third group of data is data in communication networks other than the first communication network.
In the above solution, the analysis unit includes:
a first analysis subunit, configured to use a logistic regression algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine the first-type rule;
a second analysis subunit, configured to use a decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine the second-type rule.
In the above solution, the second analysis subunit is further configured to use the decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine N rules, where N is a positive integer greater than or equal to 2;
and is further configured to determine, among the N rules, the second-type rule that meets a third preset rule.
In the above solution, the determination unit includes:
a first determination subunit, configured to analyze the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data;
a second determination subunit, configured to determine the target data that meets the first preset rule based on the first suspected target data and the second suspected target data.
In the above solution, the second-type rule includes a first-type sub-rule; the first-type sub-rule meets the first preset rule; accordingly,
the first determination subunit is further configured to analyze the input third group of data according to the first-type rule to obtain the first suspected target data;
and is further configured to analyze the input third group of data according to the first-type sub-rule to obtain the second suspected target data.
In the above solution, the second-type rule further includes a second-type sub-rule; the second-type sub-rule meets the second preset rule;
the first determination subunit is further configured to analyze the first suspected target data and the second suspected target data according to the second-type sub-rule to obtain suspected non-target data;
accordingly, the second determination subunit is further configured to determine the target data based on the first suspected target data, the second suspected target data and the suspected non-target data.
The big data analysis method and device thereof provided by the embodiments of the invention determine at least two pieces of characteristic information from the first group of data and the second group of data of the first communication network, and use two different algorithms to determine, based on the at least two pieces of characteristic information, a first-type rule and a second-type rule, one for each algorithm. The first-type rule and the second-type rule are then used to analyze the third group of data from communication networks other than the first communication network, so as to determine in the third group of data the target data that meets the preset rule. The embodiments of the invention can therefore determine, according to rules learned from home-network data, the target data in other-network data that meets the preset rule.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of the big data analysis method of an embodiment of the invention;
Fig. 2 is a schematic structural diagram of the big data analysis device of an embodiment of the invention;
Fig. 3 is a schematic structural diagram of the analysis unit of an embodiment of the invention;
Fig. 4 is a schematic structural diagram of the determination unit of an embodiment of the invention;
Fig. 5 is a flow diagram of a specific implementation of the big data analysis method of an embodiment of the invention.
Detailed description of the embodiments
To provide a fuller understanding of the features and technical content of the invention, the implementation of the invention is described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the invention.
Embodiment 1
Fig. 1 is a schematic flow diagram of the big data analysis method of an embodiment of the invention. As shown in Fig. 1, the method comprises:
Step 101: obtain, based on an input first group of data and second group of data, at least two pieces of characteristic information that meet a preset condition; the first group of data and the second group of data are data in a first communication network; the first group of data meets a first preset rule; the second group of data meets a second preset rule;
In this embodiment, the first preset rule may be the rule that the communication device type of the user corresponding to data in the first communication network belongs to a first type; the second preset rule may be the rule that the communication device type of the user corresponding to data in the first communication network does not belong to the first type. Thus, in the first communication network, the communication device type corresponding to the first group of data is the first type, and the communication device type corresponding to the second group of data is not the first type. Because data corresponding to different communication device types follow different characteristic patterns, analyzing the respective characteristic patterns of the first group of data and the second group of data makes it possible to determine M pieces of characteristic information that meet the preset condition. Analyzing data based on these M pieces of characteristic information makes it possible to estimate features such as the communication device type corresponding to the data. On this basis, the embodiment of the invention can use the characteristic information from the first communication network to determine, from the mass data of other networks, the data whose communication device type belongs to the first type, which lays the foundation for the big data analysis. Here, M is a positive integer greater than or equal to 2.
In this embodiment, the characteristic information is specifically a key variable indicator that meets the preset condition. Different algorithms use the key variable indicators to analyze the big data in the first communication network, that is, the first group of data and the second group of data, which lays the foundation for determining rules in the big data of the first communication network.
In this embodiment, the preset condition includes, but is not limited to: a condition of being greater than or equal to a first number of users, a condition that the communication device type of the communication peer is the first type, and so on.
Step 102: analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determine a first-type rule and a second-type rule;
In this embodiment, different algorithms are used to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information determined in the first communication network, and the first-type rule and the second-type rule are then determined on the basis of the first communication network.
In practical applications, different algorithms are often used together when analyzing big data in order to improve the accuracy of the analysis result. Therefore, this embodiment also uses two different algorithms to analyze the input first group of data and second group of data.
In the above solution, analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information and determining the first-type rule and the second-type rule comprises:
using a logistic regression algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining the first-type rule;
using a decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining the second-type rule.
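The logistic regression half of this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the two features, the toy values, the learning rate and the function names fit_logistic and predict_proba are all hypothetical, and plain batch gradient descent is used only because it needs no third-party library.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights [bias, w1..wM] by batch gradient descent on the log-loss."""
    m = len(rows[0])
    n = len(rows)
    w = [0.0] * (m + 1)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
            err = p - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w

def predict_proba(w, x):
    """Probability that a user with features x belongs to the first group."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))

# Hypothetical toy data: two features per user (M = 2).
first_group = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.7]]   # device type is the first type
second_group = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3]]  # device type is not the first type
rows = first_group + second_group
labels = [1] * len(first_group) + [0] * len(second_group)

weights = fit_logistic(rows, labels)
```

The fitted weights play the role of the rule produced by the logistic regression algorithm; applying predict_proba to a user from another network gives the probability used later to pick suspected target data.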
In the above solution, using the decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information and determining the second-type rule comprises:
using the decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining N rules, where N is a positive integer greater than or equal to 2;
determining, among the N rules, the second-type rule that meets a third preset rule.
In this embodiment, the number of rules determined by the decision tree algorithm, that is, N, varies with the number of pieces of characteristic information determined in step 101; the value of N is therefore constrained by the number of pieces of characteristic information.
In this embodiment, the second-type rule is a collective term for all of the N rules that meet the third preset rule; it does not refer to one specific rule.
Step 103: determine, according to the first-type rule and the second-type rule, the target data that meets the first preset rule in the input third group of data; the third group of data is data in communication networks other than the first communication network.
In this embodiment, the first-type rule and the second-type rule determined in the first communication network can be used to determine, in the mass data of communication networks other than the first communication network, the target data that meets the first preset rule, that is, to determine in the data of the other communication networks the target data whose user communication device type belongs to the first type. In this way, the target data that meets the preset rule is determined in other-network data on the basis of rules learned from home-network data.
In the above solution, determining, according to the first-type rule and the second-type rule, the target data that meets the first preset rule in the input third group of data comprises:
analyzing the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data;
determining the target data that meets the first preset rule based on the first suspected target data and the second suspected target data.
In this embodiment, the first suspected target data is the data corresponding to the first-type rule, that is, the suspected target data meeting the first preset rule that the first-type rule identifies in communication networks other than the first communication network. The second suspected target data is the data corresponding to the second-type rule, that is, the suspected target data meeting the first preset rule that the second-type rule identifies in communication networks other than the first communication network.
In the above solution, the second-type rule includes a first-type sub-rule; the first-type sub-rule meets the first preset rule;
accordingly, analyzing the input third group of data according to the first-type rule and the second-type rule respectively to obtain the first suspected target data and the second suspected target data comprises:
analyzing the input third group of data according to the first-type rule to obtain the first suspected target data;
analyzing the input third group of data according to the first-type sub-rule to obtain the second suspected target data.
In this embodiment, because the second-type rule is determined using the decision tree algorithm, it can identify both second suspected target data that meets the first preset rule and suspected non-target data that meets the second preset rule. That is, the second-type rule includes a first-type sub-rule and a second-type sub-rule: the first-type sub-rule identifies the second suspected target data that meets the first preset rule, and the second-type sub-rule identifies the suspected non-target data that meets the second preset rule. Therefore, this embodiment also needs to remove the suspected non-target data from the first suspected target data and the second suspected target data in order to determine the final target data.
In this embodiment, the first-type sub-rule is a rule that meets the first preset rule; the second-type sub-rule is a rule that does not meet the first preset rule, that is, a rule that meets the second preset rule. Because the second-type sub-rule does not meet the first preset rule, the suspected non-target data is a kind of interference data, and is therefore also referred to as interference data.
In the above solution, the second-type rule further includes a second-type sub-rule; the second-type sub-rule meets the second preset rule; the method further comprises:
analyzing the first suspected target data and the second suspected target data according to the second-type sub-rule to obtain suspected non-target data;
accordingly, determining the target data based on the first suspected target data and the second suspected target data comprises:
determining the target data based on the first suspected target data, the second suspected target data and the suspected non-target data.
To implement the above method, an embodiment of the invention also provides a big data analysis device. As shown in Fig. 2, the device includes:
an acquiring unit 21, configured to obtain, based on an input first group of data and second group of data, at least two pieces of characteristic information that meet a preset condition; the first group of data and the second group of data are data in a first communication network; the first group of data meets a first preset rule; the second group of data meets a second preset rule;
an analysis unit 22, configured to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine a first-type rule and a second-type rule;
a determination unit 23, configured to determine, according to the first-type rule and the second-type rule, target data that meets the first preset rule in an input third group of data; the third group of data is data in communication networks other than the first communication network.
In the above solution, as shown in Fig. 3, the analysis unit 22 includes:
a first analysis subunit 221, configured to use a logistic regression algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine the first-type rule;
a second analysis subunit 222, configured to use a decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine the second-type rule.
In the above solution, the second analysis subunit 222 is further configured to use the decision tree algorithm to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and to determine N rules, where N is a positive integer greater than or equal to 2;
and is further configured to determine, among the N rules, the second-type rule that meets a third preset rule.
In the above solution, as shown in Fig. 4, the determination unit 23 includes:
a first determination subunit 231, configured to analyze the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data;
a second determination subunit 232, configured to determine the target data that meets the first preset rule based on the first suspected target data and the second suspected target data.
In the above solution, the second-type rule includes a first-type sub-rule; the first-type sub-rule meets the first preset rule; accordingly,
the first determination subunit 231 is further configured to analyze the input third group of data according to the first-type rule to obtain the first suspected target data;
and is further configured to analyze the input third group of data according to the first-type sub-rule to obtain the second suspected target data.
In the above solution, the second-type rule further includes a second-type sub-rule; the second-type sub-rule meets the second preset rule;
the first determination subunit 231 is further configured to analyze the first suspected target data and the second suspected target data according to the second-type sub-rule to obtain suspected non-target data;
accordingly, the second determination subunit 232 is further configured to determine the target data based on the first suspected target data, the second suspected target data and the suspected non-target data.
The acquiring unit 21, the analysis unit 22 and the determination unit 23 can run on a computer and can be implemented by a central processing unit (CPU), a microprocessor (MPU), a digital signal processor (DSP) or a field-programmable gate array (FPGA) located on the computer.
Embodiment 2
The first software, for example the IMESSAGE software, is software built into first-type terminals for sending short messages between users of such terminals. The software sends short messages directly over GPRS, which saves users of first-type terminals the short message fee. Consequently, first-type terminal users who use the first software may use far fewer ordinary short messages, producing a "short message black hole" phenomenon. Based on this phenomenon, this embodiment determines the users in other networks whose terminal type is the first type.
This embodiment mainly uses the existing communication data of the business analysis system. It analyzes the communication behavior of home-network first-type terminal users who use the first software and the characteristics of the people in their contact circles, and identifies the data, that is, the users, in other networks who show the same communication behavior and whose contact-circle members meet these characteristics, so as to finally determine the users in other networks whose terminal type is the first type. This supports the operator's work on winning back high-value customers from other networks and its marketing strategy.
Specifically, this embodiment is based mainly on a user contact circle model. By analyzing habit features such as the voice contact circle and the short message contact circle of home-network first-type terminal users who use the first software, it identifies, among the large number of other-network users, the user group of first-type terminal users, and then estimates the probability that a given other-network user is a first-type terminal user, providing the operator with data of reference value.
Fig. 5 is a flow diagram of a specific implementation of the big data analysis method of an embodiment of the invention. Before the big data analysis, the first group of data and the second group of data need to be determined. Specifically, a first group of data with a first data volume and a second group of data with the first data volume are determined in the first communication network, where the user device type corresponding to each item in the first group of data is the first type and the user device type corresponding to the second group of data is not the first type. As shown in Fig. 5, the method comprises:
Step 501: from the first group of data and the second group of data, select M pieces of characteristic information by combining the characteristic patterns of the contact circles of the users corresponding to each group, the characteristic patterns of their voice and short message usage, the characteristic patterns of whether the contacts in a contact circle use first-type terminals, and so on; M is a positive integer greater than or equal to 2;
Here, the characteristic information is also referred to as a key variable indicator.
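As a hedged illustration of what such key variable indicators might look like, the sketch below derives three candidate features per user from call records: the size of the voice contact circle, the size of the short message contact circle, and the share of contacts known to use first-type terminals. The record layout, the user identifiers and the function name circle_features are assumptions made for the example; the source does not list the actual indicators.

```python
from collections import defaultdict

def circle_features(records, first_type_users):
    """Return {user: (voice_contacts, sms_contacts, first_type_share)}.

    records: iterable of (user_a, user_b, kind) with kind "voice" or "sms".
    first_type_users: set of users known to use first-type terminals.
    """
    voice = defaultdict(set)
    sms = defaultdict(set)
    for a, b, kind in records:
        circle = voice if kind == "voice" else sms
        circle[a].add(b)
        circle[b].add(a)
    features = {}
    for user in set(voice) | set(sms):
        contacts = voice[user] | sms[user]
        # Every user here appears in at least one record, so contacts is non-empty.
        share = len(contacts & first_type_users) / len(contacts)
        features[user] = (len(voice[user]), len(sms[user]), share)
    return features

# Hypothetical records for four users.
records = [
    ("u1", "u2", "voice"),
    ("u1", "u3", "voice"),
    ("u1", "u2", "sms"),
    ("u4", "u1", "voice"),
]
features = circle_features(records, first_type_users={"u2", "u4"})
```

In this toy data, user u1 has three voice contacts, one short message contact, and two of three contacts on first-type terminals. Which of such candidates qualify as the M pieces of characteristic information would depend on the preset condition.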
Step 502: use the logistic regression algorithm to analyze the first group of data and the second group of data according to the M pieces of characteristic information, and fit a first-type rule that meets the first preset rule;
Here, the first-type rule may be a logistic regression formula; the first preset rule is the rule that the user terminal type is the first type.
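The source does not write the formula out. As a hedged illustration, assuming the standard binary logistic regression form, with x_1, ..., x_M the M pieces of characteristic information of one user, beta_0, ..., beta_M coefficients fitted on the first and second groups of data, and y = 1 meaning that the user terminal type is the first type (all symbols are assumptions introduced here), the formula would read:

```latex
P(y = 1 \mid x_1, \dots, x_M)
  = \frac{1}{1 + \exp\!\left(-\left(\beta_0 + \sum_{i=1}^{M} \beta_i x_i\right)\right)}
```

Under this reading, the probability calculated for each item of the third group of data in step 503 is the value of this expression for that user's features.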
In this embodiment, analyzing the first group of data and the second group of data and fitting the first-type rule that meets the first preset rule comprises:
using the logistic regression algorithm to analyze the first group of data and the second group of data based on the M pieces of characteristic information, and fitting the first-type rule that meets the first preset rule.
Step 503: determine the third group of data, and calculate, according to the first-type rule, the probability of each item of data in the third group of data in order to determine the first suspected target data; the third group of data is the data corresponding to users in other communication networks who communicate with users in the first communication network;
Here, calculating the probability of each item of data in the third group of data according to the first-type rule in order to determine the first suspected target data further comprises:
calculating, according to the first-type rule, the probability of each item of data in the third group of data;
determining, according to the data traffic requirement and the preset number of users corresponding to each logistic regression grade of the logistic regression algorithm, the data whose probability is greater than or equal to a preset threshold among the probabilities of the items in the third group of data, and taking that data as the first suspected target data.
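The thresholding in step 503 can be sketched as below. The source says only that the threshold follows from the data traffic requirement and a preset number of users; deriving it as the probability of the k-th highest-scoring user is one possible reading and is an assumption of this example, as are the function names and the toy probabilities.

```python
def threshold_for_user_count(probabilities, user_count):
    """Return the probability of the user_count-th highest-scoring user.

    user_count must be at least 1.
    """
    ranked = sorted(probabilities.values(), reverse=True)
    return ranked[min(user_count, len(ranked)) - 1]

def first_suspected_targets(probabilities, threshold):
    """Keep every user whose probability is greater than or equal to the threshold."""
    return {user for user, p in probabilities.items() if p >= threshold}

# Hypothetical probabilities for four other-network users.
probabilities = {"a": 0.91, "b": 0.40, "c": 0.75, "d": 0.62}
threshold = threshold_for_user_count(probabilities, user_count=2)
suspected = first_suspected_targets(probabilities, threshold)
```

With these toy values the threshold is 0.75 and users a and c form the first suspected target data. A fixed threshold could be passed to first_suspected_targets directly instead.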
Step 504: use the C5 decision tree algorithm to analyze the first group of data and the second group of data according to the M pieces of characteristic information, and determine m1 rules A and m2 rules B;
Step 505: screen rules A and rules B according to the number of users and the confidence corresponding to each rule, so as to determine the first-type sub-rule among rules A and the second-type sub-rule among rules B;
Here, the first-type sub-rule meets the first preset rule; the second-type sub-rule meets the second preset rule; m1 and m2 are positive integers greater than or equal to 1.
Specifically, when the number of users in the first group of data and the second group of data is 100,000, the rules with a confidence greater than 85% and more than 20,000 users are selected from rules A and determined to be first-type sub-rules; the rules with a confidence greater than 90% and more than 18,000 users are selected from rules B and determined to be second-type sub-rules;
In this embodiment, both the first-type sub-rule and the second-type sub-rule belong to the second-type rule.
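The screening in step 505 can be sketched as follows. The sketch does not run C5; it assumes the tree's rules are already available as predicates, and it defines the number of users of a rule as the number of training users the rule matches and its confidence as the share of those users that carry the rule's predicted label. Those definitions, the feature names and the toy thresholds are assumptions; the source gives only the cut-offs (85% and 20,000 users for rules A, 90% and 18,000 users for rules B).

```python
def rule_stats(predicate, predicted_label, rows, labels):
    """Return (users matched by the rule, share of them with predicted_label)."""
    matched = [y for x, y in zip(rows, labels) if predicate(x)]
    users = len(matched)
    if users == 0:
        return 0, 0.0
    hits = sum(1 for y in matched if y == predicted_label)
    return users, hits / users

def select_rules(rules, rows, labels, min_users, min_confidence):
    """Keep the names of rules with more users and higher confidence than the cut-offs."""
    selected = []
    for name, predicate, predicted_label in rules:
        users, confidence = rule_stats(predicate, predicted_label, rows, labels)
        if users > min_users and confidence > min_confidence:
            selected.append(name)
    return selected

# Hypothetical training users: label 1 = first group, label 0 = second group.
rows = [
    {"sms": 2, "share": 0.8}, {"sms": 1, "share": 0.7}, {"sms": 3, "share": 0.9},
    {"sms": 40, "share": 0.1}, {"sms": 35, "share": 0.2}, {"sms": 4, "share": 0.6},
]
labels = [1, 1, 1, 0, 0, 0]

# Rules A predict the first type (label 1); rules B predict not the first type (label 0).
rules_a = [
    ("A1", lambda x: x["sms"] < 5 and x["share"] > 0.5, 1),
    ("A2", lambda x: x["share"] > 0.65, 1),
]
rules_b = [("B1", lambda x: x["sms"] > 30, 0)]

# Toy cut-offs; the text uses 20,000 users / 85% and 18,000 users / 90%.
first_type_sub_rules = select_rules(rules_a, rows, labels, min_users=2, min_confidence=0.85)
second_type_sub_rules = select_rules(rules_b, rows, labels, min_users=1, min_confidence=0.90)
```

Rule A1 matches four users but only three of them are in the first group, so its confidence of 0.75 fails the cut-off, while A2 and B1 pass.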
Step 506: analyze the third group of data according to the first-type sub-rule, and determine the second suspected target data;
Step 507: determine the intersection of the first suspected target data and the second suspected target data, and take it as third suspected target data;
Step 508: remove from the third suspected target data the data that meets the second-type sub-rule, and take the remaining third suspected target data as the target data.
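Steps 507 and 508 amount to a set intersection followed by a filter. A minimal sketch, assuming the suspected target data are held as sets of user identifiers and the second-type sub-rule is available as a predicate (the function name final_targets and the toy identifiers are hypothetical):

```python
def final_targets(first_suspected, second_suspected, meets_second_type_sub_rule):
    """Intersect the two suspected sets, then drop suspected non-target users."""
    third_suspected = first_suspected & second_suspected            # step 507
    return {user for user in third_suspected
            if not meets_second_type_sub_rule(user)}                # step 508

# Hypothetical other-network users.
first_suspected = {"a", "b", "c"}    # from the logistic regression rule
second_suspected = {"b", "c", "d"}   # from the first-type sub-rule
non_target_users = {"c"}             # users matched by the second-type sub-rule

targets = final_targets(first_suspected, second_suspected,
                        lambda user: user in non_target_users)
```

Taking the intersection keeps only users that both algorithms flag, and removing users matched by the second-type sub-rule discards the interference data.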
The embodiment of the invention determines the key variable indicators, that is, the characteristic information, from the first group of data and the second group of data in the first communication network. It then analyzes the first group of data and the second group of data with the logistic regression algorithm and the decision tree algorithm respectively, and determines the first-type rule corresponding to the logistic regression algorithm and the second-type rule corresponding to the decision tree algorithm, where the second-type rule includes a first-type sub-rule and a second-type sub-rule. Next, the third group of data from other networks is analyzed according to the first-type rule and the first-type sub-rule respectively to determine the first suspected target data and the second suspected target data. The first-type rule and the first-type sub-rule both meet the first preset rule, whereas the second-type sub-rule meets the second preset rule. Therefore, after the intersection of the first suspected target data and the second suspected target data is taken as the third suspected target data, the data that meets the second-type sub-rule, that is, the suspected non-target data, is removed from the third suspected target data to obtain the final target data. The target data is thus the data in other-network data that meets the first preset rule, determined according to rules learned from home-network data.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce a means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the embodiments of the present invention.
Claims (12)
1. A big data analysis method, characterized in that the method comprises:
obtaining, based on an input first group of data and an input second group of data, at least two pieces of characteristic information that satisfy a preset condition, wherein the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule;
analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining a first-class rule and a second-class rule;
determining, according to the first-class rule and the second-class rule, target data that satisfies the first preset rule in an input third group of data, wherein the third group of data is data in a communication network other than the first communication network.
2. The method according to claim 1, characterized in that the analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information, and determining a first-class rule and a second-class rule, comprises:
analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information by using a logistic regression algorithm, and determining the first-class rule;
analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information by using a decision tree algorithm, and determining the second-class rule.
3. The method according to claim 2, characterized in that the analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information by using a decision tree algorithm, and determining the second-class rule, comprises:
analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information by using the decision tree algorithm, and determining N rules, wherein N is a positive integer greater than or equal to 2;
determining, among the N rules, the second-class rule that satisfies a third preset rule.
4. The method according to claim 1 or 3, characterized in that the determining, according to the first-class rule and the second-class rule, target data that satisfies the first preset rule in an input third group of data, comprises:
analyzing the input third group of data according to the first-class rule and the second-class rule respectively, to obtain first suspected target data and second suspected target data;
determining, based on the first suspected target data and the second suspected target data, the target data that satisfies the first preset rule.
5. The method according to claim 4, characterized in that the second-class rule comprises a first-class sub-rule, and the first-class sub-rule satisfies the first preset rule;
correspondingly, the analyzing the input third group of data according to the first-class rule and the second-class rule respectively, to obtain first suspected target data and second suspected target data, comprises:
analyzing the input third group of data according to the first-class rule, to obtain the first suspected target data;
analyzing the input third group of data according to the first-class sub-rule, to obtain the second suspected target data.
6. The method according to claim 5, characterized in that the second-class rule further comprises a second-class sub-rule, and the second-class sub-rule satisfies the second preset rule; the method further comprises:
analyzing the first suspected target data and the second suspected target data according to the second-class sub-rule, to obtain suspected non-target data;
correspondingly, the determining target data based on the first suspected target data and the second suspected target data comprises:
determining the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
7. A big data analysis device, characterized in that the device comprises:
an acquiring unit, configured to obtain, based on an input first group of data and an input second group of data, at least two pieces of characteristic information that satisfy a preset condition, wherein the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule;
an analysis unit, configured to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information, and determine a first-class rule and a second-class rule;
a determination unit, configured to determine, according to the first-class rule and the second-class rule, target data that satisfies the first preset rule in an input third group of data, wherein the third group of data is data in a communication network other than the first communication network.
8. The device according to claim 7, characterized in that the analysis unit comprises:
a first analysis subunit, configured to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information by using a logistic regression algorithm, and determine the first-class rule;
a second analysis subunit, configured to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information by using a decision tree algorithm, and determine the second-class rule.
9. The device according to claim 8, characterized in that the second analysis subunit is further configured to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information by using the decision tree algorithm, and determine N rules, wherein N is a positive integer greater than or equal to 2;
and is further configured to determine, among the N rules, the second-class rule that satisfies a third preset rule.
10. The device according to any one of claims 7 to 9, characterized in that the determination unit comprises:
a first determination subunit, configured to analyze the input third group of data according to the first-class rule and the second-class rule respectively, to obtain first suspected target data and second suspected target data;
a second determination subunit, configured to determine, based on the first suspected target data and the second suspected target data, the target data that satisfies the first preset rule.
11. The device according to claim 10, characterized in that the second-class rule comprises a first-class sub-rule, and the first-class sub-rule satisfies the first preset rule; correspondingly,
the first determination subunit is further configured to analyze the input third group of data according to the first-class rule, to obtain the first suspected target data;
and is further configured to analyze the input third group of data according to the first-class sub-rule, to obtain the second suspected target data.
12. The device according to claim 11, characterized in that the second-class rule further comprises a second-class sub-rule, and the second-class sub-rule satisfies the second preset rule;
the first determination subunit is further configured to analyze the first suspected target data and the second suspected target data according to the second-class sub-rule, to obtain suspected non-target data;
correspondingly, the second determination subunit is further configured to determine the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510001942.6A CN105824811B (en) | 2015-01-04 | 2015-01-04 | A kind of big data analysis method and device thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105824811A (en) | 2016-08-03 |
CN105824811B (en) | 2019-07-02 |
Family
ID=56513287
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510001942.6A (Active; CN105824811B (en)) | 2015-01-04 | 2015-01-04 | A kind of big data analysis method and device thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105824811B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1333612A (en) * | 2000-06-19 | 2002-01-30 | 阿尔卡塔尔公司 | Method for rebooting terminal connected with local area network |
CN1647052A (en) * | 2002-04-12 | 2005-07-27 | 沃达方集团有限公司 | Method ans system for distribution of encrypted data in a mobile network |
CN1698311A (en) * | 2003-01-16 | 2005-11-16 | 索尼英国有限公司 | Video/audio network |
CN103327063A (en) * | 2012-02-14 | 2013-09-25 | 谷歌公司 | User presence detection and event discovery |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2006171796A (en) * | 2000-06-02 | 2006-06-29 | Bld Oriental Kk | Content distribution system and competition implementation system using network |
JP4641848B2 (en) * | 2005-03-30 | 2011-03-02 | 富士通株式会社 | Unauthorized access search method and apparatus |
WO2008046130A1 (en) * | 2006-10-17 | 2008-04-24 | Silverbrook Research Pty Ltd | Method of delivering an advertisement from a computer system |
US20090282023A1 (en) * | 2008-05-12 | 2009-11-12 | Bennett James D | Search engine using prior search terms, results and prior interaction to construct current search term results |
Also Published As
Publication number | Publication date |
---|---|
CN105824811A (en) | 2016-08-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||