CN105824811B - A big data analysis method and device - Google Patents


Info

Publication number
CN105824811B
Authority
CN
China
Prior art keywords
data
rule
group
target data
suspected target
Legal status
Active
Application number
CN201510001942.6A
Other languages
Chinese (zh)
Other versions
CN105824811A (en
Inventor
黄庆荣
谢志崇
魏建荣
彭家华
郑志欢
林恪
陈钰铖
Current Assignee
China Mobile Group Fujian Co Ltd
Original Assignee
China Mobile Group Fujian Co Ltd
Application filed by China Mobile Group Fujian Co Ltd
Priority to CN201510001942.6A
Publication of CN105824811A
Application granted
Publication of CN105824811B
Active (current legal status)


Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The embodiments of the invention disclose a big data analysis method, comprising: obtaining, based on an input first group of data and second group of data, at least two pieces of feature information that satisfy a preset condition, where the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule; analyzing the first group of data and the second group of data according to the at least two pieces of feature information, to determine a first-type rule and a second-type rule; and determining, according to the first-type rule and the second-type rule, target data that satisfy the first preset rule in an input third group of data, where the third group of data are data in a communication network other than the first communication network. The embodiments of the invention also disclose a big data analysis device.

Description

A big data analysis method and device
Technical field
The present invention relates to communication technology, and in particular to a big data analysis method and device.
Background
With the commercialization of fourth-generation (4G) mobile communication technology, competition among the major operators has become increasingly fierce. Winning back high-value users from competitor networks and increasing the penetration of 4G terminals are important to a mobile operator's development, so identifying high-value users on competitor networks is essential.
The industry currently has methods that analyze and model user behavior to determine user attributes. However, existing methods generally focus on counting the number of competitor-network users, rather than on identifying individual competitor-network users and the terminal types they use.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide a big data analysis method and device that can determine, according to data patterns in the home network, target data in competitor-network data that satisfy a preset rule.
The technical solution of the embodiments of the present invention is implemented as follows. An embodiment of the present invention provides a big data analysis method, comprising:
obtaining, based on an input first group of data and second group of data, at least two pieces of feature information that satisfy a preset condition, where the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule;
analyzing the first group of data and the second group of data according to the at least two pieces of feature information, to determine a first-type rule and a second-type rule; and
determining, according to the first-type rule and the second-type rule, target data that satisfy the first preset rule in an input third group of data, where the third group of data are data in a communication network other than the first communication network.
In the above solution, analyzing the first group of data and the second group of data according to the at least two pieces of feature information to determine the first-type rule and the second-type rule comprises:
analyzing the first group of data and the second group of data with a logistic regression algorithm according to the at least two pieces of feature information, to determine the first-type rule; and
analyzing the first group of data and the second group of data with a decision tree algorithm according to the at least two pieces of feature information, to determine the second-type rule.
In the above solution, analyzing the first group of data and the second group of data with the decision tree algorithm according to the at least two pieces of feature information to determine the second-type rule comprises:
analyzing the first group of data and the second group of data with the decision tree algorithm according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2; and
determining, among the N rules, the second-type rule that satisfies a third preset rule.
In the above solution, determining, according to the first-type rule and the second-type rule, the target data that satisfy the first preset rule in the input third group of data comprises:
analyzing the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data; and
determining, based on the first suspected target data and the second suspected target data, the target data that satisfy the first preset rule.
In the above solution, the second-type rule includes a first-type sub-rule, and the first-type sub-rule satisfies the first preset rule.
Correspondingly, analyzing the input third group of data according to the first-type rule and the second-type rule respectively, to obtain the first suspected target data and the second suspected target data, comprises:
analyzing the input third group of data according to the first-type rule, to obtain the first suspected target data; and
analyzing the input third group of data according to the first-type sub-rule, to obtain the second suspected target data.
In the above solution, the second-type rule further includes a second-type sub-rule, and the second-type sub-rule satisfies the second preset rule. The method further comprises:
analyzing the first suspected target data and the second suspected target data according to the second-type sub-rule, to obtain suspected non-target data.
Correspondingly, determining the target data based on the first suspected target data and the second suspected target data comprises:
determining the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
An embodiment of the present invention further provides a big data analysis device, comprising:
an acquiring unit, configured to obtain, based on an input first group of data and second group of data, at least two pieces of feature information that satisfy a preset condition, where the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule;
an analysis unit, configured to analyze the first group of data and the second group of data according to the at least two pieces of feature information, to determine a first-type rule and a second-type rule; and
a determination unit, configured to determine, according to the first-type rule and the second-type rule, target data that satisfy the first preset rule in an input third group of data, where the third group of data are data in a communication network other than the first communication network.
In the above solution, the analysis unit includes:
a first analysis subunit, configured to analyze the first group of data and the second group of data with a logistic regression algorithm according to the at least two pieces of feature information, to determine the first-type rule; and
a second analysis subunit, configured to analyze the first group of data and the second group of data with a decision tree algorithm according to the at least two pieces of feature information, to determine the second-type rule.
In the above solution, the second analysis subunit is further configured to analyze the first group of data and the second group of data with the decision tree algorithm according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2;
and is further configured to determine, among the N rules, the second-type rule that satisfies a third preset rule.
In the above solution, the determination unit includes:
a first determination subunit, configured to analyze the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data; and
a second determination subunit, configured to determine, based on the first suspected target data and the second suspected target data, the target data that satisfy the first preset rule.
In the above solution, the second-type rule includes a first-type sub-rule, and the first-type sub-rule satisfies the first preset rule. Correspondingly,
the first determination subunit is further configured to analyze the input third group of data according to the first-type rule, to obtain the first suspected target data;
and is further configured to analyze the input third group of data according to the first-type sub-rule, to obtain the second suspected target data.
In the above solution, the second-type rule further includes a second-type sub-rule, and the second-type sub-rule satisfies the second preset rule;
the first determination subunit is further configured to analyze the first suspected target data and the second suspected target data according to the second-type sub-rule, to obtain suspected non-target data;
correspondingly, the second determination subunit is further configured to determine the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
With the big data analysis method and device provided by the embodiments of the present invention, at least two pieces of feature information are determined in the first group of data and the second group of data of the first communication network, and two different algorithms are applied, based on these pieces of feature information, to determine a first-type rule and a second-type rule, one per algorithm. The first-type rule and the second-type rule are then used to analyze the third group of data from a communication network other than the first communication network, so as to determine the target data in that group that satisfy the preset rule. The embodiments of the present invention therefore determine, according to data patterns in the home network, the target data in competitor-network data that satisfy a preset rule.
Brief description of the drawings
Fig. 1 is a schematic flowchart of the big data analysis method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the big data analysis device according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the analysis unit according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the determination unit according to an embodiment of the present invention;
Fig. 5 is a schematic flowchart of a specific implementation of the big data analysis method according to an embodiment of the present invention.
Detailed description of the embodiments
To make the features and technical content of the present invention easier to understand, implementations of the invention are described in detail below with reference to the accompanying drawings. The drawings are for reference and illustration only and are not intended to limit the present invention.
Embodiment one
Fig. 1 is a schematic flowchart of the big data analysis method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step 101: obtain, based on an input first group of data and second group of data, at least two pieces of feature information that satisfy a preset condition, where the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule.
In this embodiment, the first preset rule may be the rule that the communication device type of the user corresponding to a piece of data in the communication network belongs to a first type, and the second preset rule may be the rule that the communication device type of that user does not belong to the first type. Thus, in the first communication network, the communication device type corresponding to the first group of data is the first type, and the communication device type corresponding to the second group of data is not the first type. Because data corresponding to different communication device types follow different feature patterns, analyzing the respective feature patterns of the first group of data and the second group of data makes it possible to determine M pieces of feature information that satisfy the preset condition, where M is a positive integer greater than or equal to 2. Analyzing data on the basis of these M pieces of feature information then makes it possible to estimate features such as the communication device type corresponding to the data. Through this process, the embodiment can use the feature information from the first communication network to identify, in the mass data of competitor networks, the data whose communication device type belongs to the first type, laying the foundation for big data analysis.
In this embodiment, the feature information is specifically a set of key variable indicators that satisfy the preset condition. Different algorithms use these key variable indicators to analyze the big data in the first communication network, that is, to analyze the first group of data and the second group of data, which lays the foundation for deriving rules from the big data of the first communication network.
In this embodiment, the preset condition includes but is not limited to: a condition that the number of users is greater than or equal to a first number of users, and a condition that the communication device type of the communication counterpart is the first type.
Step 102: analyze the first group of data and the second group of data according to the at least two pieces of feature information, to determine a first-type rule and a second-type rule.
In this embodiment, different algorithms are applied to the first group of data and the second group of data according to the at least two pieces of feature information determined in the first communication network, to determine the first-type rule and the second-type rule based on the first communication network.
In practical applications, different algorithms are usually chosen when analyzing big data in order to improve the accuracy of the analysis results; this embodiment accordingly uses two different algorithms to analyze the input first group of data and second group of data.
In the above solution, analyzing the first group of data and the second group of data according to the at least two pieces of feature information to determine the first-type rule and the second-type rule comprises:
analyzing the first group of data and the second group of data with a logistic regression algorithm according to the at least two pieces of feature information, to determine the first-type rule; and
analyzing the first group of data and the second group of data with a decision tree algorithm according to the at least two pieces of feature information, to determine the second-type rule.
In the above solution, analyzing the first group of data and the second group of data with the decision tree algorithm according to the at least two pieces of feature information to determine the second-type rule comprises:
analyzing the first group of data and the second group of data with the decision tree algorithm according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2; and
determining, among the N rules, the second-type rule that satisfies a third preset rule.
In this embodiment, the number of rules produced by the decision tree algorithm varies with the number of pieces of feature information determined in Step 101; that is, N varies, and its value depends on the number of pieces of feature information.
In this embodiment, "second-type rule" is a collective term for all of the N rules that satisfy the third preset rule; it does not refer to one specific rule.
Step 103: determine, according to the first-type rule and the second-type rule, target data that satisfy the first preset rule in an input third group of data, where the third group of data are data in a communication network other than the first communication network.
In this embodiment, the first-type rule and the second-type rule determined in the first communication network are used to determine, in the mass data of communication networks other than the first communication network, the target data that satisfy the first preset rule, that is, the data of users whose communication device type belongs to the first type. In this way, target data that satisfy the preset rule are determined in competitor-network data based on the data patterns of the home network.
In the above solution, determining, according to the first-type rule and the second-type rule, the target data that satisfy the first preset rule in the input third group of data comprises:
analyzing the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data; and
determining, based on the first suspected target data and the second suspected target data, the target data that satisfy the first preset rule.
In this embodiment, the first suspected target data are the data corresponding to the first-type rule, that is, the suspected target data satisfying the first preset rule that the first-type rule identifies in communication networks other than the first communication network; the second suspected target data are the data corresponding to the second-type rule, that is, the suspected target data satisfying the first preset rule that the second-type rule identifies in those networks.
In the above solution, the second-type rule includes a first-type sub-rule, and the first-type sub-rule satisfies the first preset rule.
Correspondingly, analyzing the input third group of data according to the first-type rule and the second-type rule respectively, to obtain the first suspected target data and the second suspected target data, comprises:
analyzing the input third group of data according to the first-type rule, to obtain the first suspected target data; and
analyzing the input third group of data according to the first-type sub-rule, to obtain the second suspected target data.
In this embodiment, because the second-type rule is determined with the decision tree algorithm, it can identify both second suspected target data that satisfy the first preset rule and suspected non-target data that satisfy the second preset rule. That is, the second-type rule includes a first-type sub-rule and a second-type sub-rule: the first-type sub-rule identifies the second suspected target data that satisfy the first preset rule, and the second-type sub-rule identifies the suspected non-target data that satisfy the second preset rule. This embodiment therefore also removes the suspected non-target data from the first suspected target data and the second suspected target data to determine the final target data.
In this embodiment, the first-type sub-rule is a rule that satisfies the first preset rule, and the second-type sub-rule is a rule that does not satisfy the first preset rule, i.e., a rule that satisfies the second preset rule. Because the second-type sub-rule does not satisfy the first preset rule, the suspected non-target data it identifies act as interference; the suspected non-target data may therefore also be called interference data.
In the above solution, the second-type rule further includes a second-type sub-rule, and the second-type sub-rule satisfies the second preset rule. The method further comprises:
analyzing the first suspected target data and the second suspected target data according to the second-type sub-rule, to obtain suspected non-target data.
Correspondingly, determining the target data based on the first suspected target data and the second suspected target data comprises:
determining the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
To implement the above method, an embodiment of the present invention further provides a big data analysis device. As shown in Fig. 2, the device includes:
an acquiring unit 21, configured to obtain, based on an input first group of data and second group of data, at least two pieces of feature information that satisfy a preset condition, where the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule;
an analysis unit 22, configured to analyze the first group of data and the second group of data according to the at least two pieces of feature information, to determine a first-type rule and a second-type rule; and
a determination unit 23, configured to determine, according to the first-type rule and the second-type rule, target data that satisfy the first preset rule in an input third group of data, where the third group of data are data in a communication network other than the first communication network.
In the above solution, as shown in Fig. 3, the analysis unit 22 includes:
a first analysis subunit 221, configured to analyze the first group of data and the second group of data with a logistic regression algorithm according to the at least two pieces of feature information, to determine the first-type rule; and
a second analysis subunit 222, configured to analyze the first group of data and the second group of data with a decision tree algorithm according to the at least two pieces of feature information, to determine the second-type rule.
In the above solution, the second analysis subunit 222 is further configured to analyze the first group of data and the second group of data with the decision tree algorithm according to the at least two pieces of feature information, to determine N rules, where N is a positive integer greater than or equal to 2;
and is further configured to determine, among the N rules, the second-type rule that satisfies a third preset rule.
In the above solution, as shown in Fig. 4, the determination unit 23 includes:
a first determination subunit 231, configured to analyze the input third group of data according to the first-type rule and the second-type rule respectively, to obtain first suspected target data and second suspected target data; and
a second determination subunit 232, configured to determine, based on the first suspected target data and the second suspected target data, the target data that satisfy the first preset rule.
In the above solution, the second-type rule includes a first-type sub-rule, and the first-type sub-rule satisfies the first preset rule. Correspondingly,
the first determination subunit 231 is further configured to analyze the input third group of data according to the first-type rule, to obtain the first suspected target data;
and is further configured to analyze the input third group of data according to the first-type sub-rule, to obtain the second suspected target data.
In the above solution, the second-type rule further includes a second-type sub-rule, and the second-type sub-rule satisfies the second preset rule;
the first determination subunit 231 is further configured to analyze the first suspected target data and the second suspected target data according to the second-type sub-rule, to obtain suspected non-target data;
correspondingly, the second determination subunit 232 is further configured to determine the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
The acquiring unit 21, the analysis unit 22, and the determination unit 23 may run on a computer and may be implemented by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) on that computer.
Embodiment two
A first software, such as iMessage, is software built into first-type terminals for sending messages between users of such terminals. It sends messages directly over the GPRS data connection, saving first-type terminal users the cost of SMS. First-type terminal users who use the first software may therefore greatly reduce their SMS usage, producing an "SMS black hole" phenomenon. Based on this SMS black hole phenomenon, this embodiment identifies the users on competitor networks whose terminal type is the first type.
This embodiment mainly uses the existing communication data in the business analysis system to analyze the communication behavior of home-network first-type terminal users who use the first software, together with the characteristics of their contact circles. It then identifies the data, i.e., the users, on competitor networks that exhibit this communication behavior and whose contact circles match these characteristics, so as to determine the competitor-network users whose terminal type is the first type and support the operator's win-back work and marketing strategy for high-value competitor-network customers.
Specifically, this embodiment is mainly based on a user contact-circle model. By analyzing the habits of home-network first-type terminal users of the first software, such as the characteristics of their voice contact circles and SMS contact circles, it identifies the first-type terminal user group among the large number of competitor-network users and estimates the probability that a given competitor-network user is a first-type terminal user, providing the operator with data of reference value.
Fig. 5 is a schematic flowchart of a specific implementation of the big data analysis method according to an embodiment of the present invention. Before the big data analysis is performed, the first group of data and the second group of data must be determined. Specifically, a first group of data and a second group of data, each with a first data volume, are determined in the first communication network, where the user device type corresponding to each piece of data in the first group is the first type and the user device type corresponding to the second group is not the first type. As shown in Fig. 5, the method comprises:
Step 501: in the first group of data and the second group of data, select M pieces of feature information by combining the feature patterns of the contact circles of the users in each group, the feature patterns of their voice and SMS usage, the feature patterns of whether contacts in their contact circles use first-type terminals, and so on, where M is a positive integer greater than or equal to 2.
Here, the feature information is also referred to as key variable indicators.
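As a rough illustration of Step 501, the sketch below derives a few contact-circle indicators per user from voice and SMS records. The record layout `(caller, callee, kind)`, the function name `contact_circle_features`, and the chosen indicators (voice count, SMS count, circle size, share of contacts known to use a first-type terminal, SMS sent to such contacts) are hypothetical; the patent does not list its exact key variable indicators.

```python
from collections import defaultdict


def contact_circle_features(records, first_type_users):
    """Build per-user indicators from hypothetical (caller, callee, kind) records.

    kind is assumed to be "voice" or "sms"; first_type_users is an assumed set
    of users already known to use a first-type terminal.
    """
    voice = defaultdict(int)
    sms = defaultdict(int)
    sms_to_first_type = defaultdict(int)
    circle = defaultdict(set)
    for caller, callee, kind in records:
        circle[caller].add(callee)  # contact circle = distinct counterparts
        if kind == "voice":
            voice[caller] += 1
        elif kind == "sms":
            sms[caller] += 1
            if callee in first_type_users:
                sms_to_first_type[caller] += 1
    features = {}
    for user, contacts in circle.items():
        first_type_contacts = sum(1 for c in contacts if c in first_type_users)
        features[user] = {
            "voice_count": voice[user],
            "sms_count": sms[user],
            "circle_size": len(contacts),
            "first_type_contact_share": first_type_contacts / len(contacts),
            # An "SMS black hole" would appear as few SMS to first-type contacts.
            "sms_to_first_type": sms_to_first_type[user],
        }
    return features
```

As a simplification, only users who originate at least one record get a feature row; a real pipeline would also account for incoming traffic.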
Step 502: use a logistic regression algorithm to analyze the first group of data and the second group of data according to the M pieces of feature information, and fit a first-type rule that satisfies the first preset rule.
Here, the first-type rule may be a logistic regression formula, and the first preset rule is the rule that the user terminal type is the first type.
In this embodiment, analyzing the first group of data and the second group of data to fit the first-type rule that satisfies the first preset rule comprises:
analyzing the first group of data and the second group of data with the logistic regression algorithm based on the M pieces of feature information, and fitting the first-type rule that satisfies the first preset rule.
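Step 502 does not say how the logistic regression is fitted. A minimal, dependency-free sketch, assuming plain batch gradient descent on the log-loss, with first-group users labelled 1 (first-type terminal) and second-group users labelled 0. The function names, learning rate, and epoch count are illustrative assumptions; the fitted `w` and `b` play the role of the "logistic regression formula" p = 1 / (1 + exp(-(w·x + b))).

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Probability that feature vector x belongs to a first-type terminal user."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In practice a library implementation with regularization would likely be preferred; the loop above only makes the fitting step explicit.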
Step 503: determine a third group of data, and calculate, according to the first-type rule, the probability of each piece of data in the third group of data, so as to determine the first suspected target data. The third group of data are data in other communication networks that correspond to users communicating with users of the first communication network.
Here, calculating, according to the first-type rule, the probability of each piece of data in the third group of data to determine the first suspected target data further comprises:
calculating, according to the first-type rule, the probability of each piece of data in the third group of data; and
according to business requirements and the preset number of users corresponding to each logistic regression grade, determining, among the probabilities of the data in the third group of data, the data whose probability is greater than or equal to a preset threshold, and taking these data as the first suspected target data.
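The text does not spell out how the preset number of users per grade translates into the preset threshold. One plausible reading, sketched here purely as an assumption, is that the threshold is either given directly or set so that roughly the number of users the business needs is selected. The function name and parameters are hypothetical.

```python
def select_first_suspected(scores, preset_user_count=None, threshold=None):
    """Return (selected_ids, threshold) for users whose probability >= threshold.

    scores maps user id -> probability from the logistic regression. If no
    threshold is given, it is derived as the probability of the
    preset_user_count-th highest-scoring user (ties at the cut-off are kept).
    """
    if not scores:
        return set(), threshold
    if threshold is None:
        if not preset_user_count or preset_user_count < 1:
            raise ValueError("give a threshold or a positive preset_user_count")
        ranked = sorted(scores.values(), reverse=True)
        threshold = ranked[min(preset_user_count, len(ranked)) - 1]
    selected = {uid for uid, p in scores.items() if p >= threshold}
    return selected, threshold
```

Because ties at the cut-off are kept, the selection can slightly exceed the requested user count.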
Step 504: use the C5.0 decision tree algorithm to analyze the first group of data and the second group of data according to the M pieces of feature information, and determine m1 rules A and m2 rules B.
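C5.0 training itself is not reproduced here. The snippet below only illustrates, under stated assumptions, how a trained classification tree can be turned into the kind of rule list that Step 505 screens: each root-to-leaf path becomes one rule, its user count is the number of training users reaching the leaf, and its confidence is the majority-class share at that leaf. The nested-dict tree format, the labels "A" (first type) and "B" (not first type), and the function names are hypothetical.

```python
def extract_rules(node, conditions=()):
    """Turn every root-to-leaf path of a binary tree into a rule dict."""
    if "counts" in node:  # leaf: class label -> number of training users
        counts = node["counts"]
        users = sum(counts.values())
        label = max(counts, key=counts.get)  # majority class at this leaf
        return [{
            "conditions": list(conditions),
            "label": label,
            "users": users,
            "confidence": counts[label] / users,
        }]
    feature, threshold = node["feature"], node["threshold"]
    left = extract_rules(node["left"], conditions + ((feature, "<=", threshold),))
    right = extract_rules(node["right"], conditions + ((feature, ">", threshold),))
    return left + right


def matches(rule, record):
    """True if a feature record satisfies every condition of the rule."""
    for feature, op, threshold in rule["conditions"]:
        value = record[feature]
        if op == "<=" and not value <= threshold:
            return False
        if op == ">" and not value > threshold:
            return False
    return True
```

Rules A would then be the extracted rules labelled "A" and rules B those labelled "B"; `matches` is what Steps 506 and 508 would need to apply a rule to a third-group record.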
Step 505: screen rules A and rules B according to the number of users and the confidence of each rule, so as to determine first-type sub-rules among rules A and second-type sub-rules among rules B.
Here, the first-type sub-rules satisfy the first preset rule and the second-type sub-rules satisfy the second preset rule; m1 and m2 are positive integers greater than or equal to 1.
Specifically, when the first group of data and the second group of data each contain 100,000 users, the rules among rules A with a confidence greater than 85% and more than 20,000 users are selected as first-type sub-rules, and the rules among rules B with a confidence greater than 90% and more than 18,000 users are selected as second-type sub-rules.
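The screening in Step 505 reduces to two filters. The cut-offs below are the example figures given above for groups of 100,000 users each; the rule dict layout (`confidence`, `users`) and the function names are assumptions, and "greater than" is read as strict.

```python
FIRST_TYPE_MIN_CONFIDENCE = 0.85   # rules A: confidence above 85%
FIRST_TYPE_MIN_USERS = 20_000      # rules A: more than 20,000 users
SECOND_TYPE_MIN_CONFIDENCE = 0.90  # rules B: confidence above 90%
SECOND_TYPE_MIN_USERS = 18_000     # rules B: more than 18,000 users


def screen_rules(rules, min_confidence, min_users):
    """Keep rules strictly above both the confidence and the user-count cut-off."""
    return [r for r in rules
            if r["confidence"] > min_confidence and r["users"] > min_users]


def split_sub_rules(rules_a, rules_b):
    """Return (first-type sub-rules from rules A, second-type sub-rules from rules B)."""
    first_type = screen_rules(rules_a, FIRST_TYPE_MIN_CONFIDENCE, FIRST_TYPE_MIN_USERS)
    second_type = screen_rules(rules_b, SECOND_TYPE_MIN_CONFIDENCE, SECOND_TYPE_MIN_USERS)
    return first_type, second_type
```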
In the present embodiment, the first kind sub-rule and the second class sub-rule belong to the second rule-like.
Step 506: Analyze the third group of data according to the first-class sub-rules to determine the second suspected target data;
Step 507: Determine the intersection of the first suspected target data and the second suspected target data as the third suspected target data;
Step 508: Remove from the third suspected target data the records that satisfy the second-class sub-rules, and take the remaining third suspected target data as the target data.
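Steps 506 to 508 reduce to set operations once each sub-rule can be evaluated on a record. The sketch below assumes each sub-rule is a predicate over a record's characteristic information and that a record matches a group of sub-rules when it satisfies at least one of them; the feature names (`home_net_calls`, `monthly_spend`) and all values are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

Record = Dict[str, float]
SubRule = Callable[[Record], bool]


def rule_hits(data: Dict[str, Record], rules: Iterable[SubRule]) -> Set[str]:
    """Ids of records that satisfy at least one of the given sub-rules."""
    rule_list = list(rules)
    return {rid for rid, rec in data.items() if any(rule(rec) for rule in rule_list)}


def final_targets(
    data: Dict[str, Record],
    first_suspected: Set[str],
    first_class_sub_rules: Iterable[SubRule],
    second_class_sub_rules: Iterable[SubRule],
) -> Set[str]:
    second_suspected = rule_hits(data, first_class_sub_rules)  # step 506
    third_suspected = first_suspected & second_suspected        # step 507: intersection
    non_target = rule_hits({rid: data[rid] for rid in third_suspected},
                           second_class_sub_rules)              # step 508: suspected non-target
    return third_suspected - non_target                         # step 508: what remains


# Hypothetical features: calls to home-network users, monthly spend.
data = {
    "u1": {"home_net_calls": 30, "monthly_spend": 120},
    "u2": {"home_net_calls": 5, "monthly_spend": 200},
    "u3": {"home_net_calls": 25, "monthly_spend": 20},
    "u4": {"home_net_calls": 40, "monthly_spend": 30},
}
first_class = [lambda r: r["home_net_calls"] > 20]
second_class = [lambda r: r["monthly_spend"] < 50]
targets = final_targets(data, {"u1", "u2", "u4"}, first_class, second_class)
print(sorted(targets))  # ['u1']
```

With the first suspected target data {u1, u2, u4} taken from step 503, the first-class sub-rule yields {u1, u3, u4}, the intersection is {u1, u4}, and u4 is then removed as suspected non-target data.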
The embodiment of the present invention can determine the key variable indicators, i.e. the characteristic information, from the first group of data and the second group of data in the first communication network. The logistic regression algorithm and the decision tree algorithm are each used to analyze the first group of data and the second group of data, determining a first-class rule corresponding to the logistic regression algorithm and a second-class rule corresponding to the decision tree algorithm, where the second-class rule includes first-class sub-rules and second-class sub-rules. The third group of data in the other network (a communication network other than the first communication network) is then analyzed according to the first-class rule and according to the first-class sub-rules, respectively, to determine the first suspected target data and the second suspected target data. Because the first-class rule satisfies the first preset rule, the first-class sub-rules satisfy the first preset rule, and the second-class sub-rules satisfy the second preset rule, the third suspected target data are obtained as the intersection of the first suspected target data and the second suspected target data, and the data satisfying the second-class sub-rules are then removed from the third suspected target data; that is, suspected non-target data are removed from the third suspected target data to obtain the final target data. The target data are thus the data in the other network that satisfy the first preset rule, determined according to rules learned from home-network data.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are only embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the embodiments of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the embodiments of the present invention.

Claims (12)

1. A big data analysis method, characterized in that the method comprises:
obtaining, based on an input first group of data and second group of data, at least two pieces of characteristic information that satisfy a preset condition, wherein the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule;
analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information to determine a first-class rule and a second-class rule;
determining, according to the first-class rule and the second-class rule, target data that satisfy the first preset rule in an input third group of data, wherein the third group of data is data in a communication network other than the first communication network.
2. The method according to claim 1, characterized in that analyzing the first group of data and the second group of data according to the at least two pieces of characteristic information to determine the first-class rule and the second-class rule comprises:
analyzing the first group of data and the second group of data using a logistic regression algorithm according to the at least two pieces of characteristic information to determine the first-class rule;
analyzing the first group of data and the second group of data using a decision tree algorithm according to the at least two pieces of characteristic information to determine the second-class rule.
3. The method according to claim 2, characterized in that analyzing the first group of data and the second group of data using the decision tree algorithm according to the at least two pieces of characteristic information to determine the second-class rule comprises:
analyzing the first group of data and the second group of data using the decision tree algorithm according to the at least two pieces of characteristic information to determine N rules, N being a positive integer greater than or equal to 2;
determining, among the N rules, the second-class rule that satisfies a third preset rule.
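Claim 3 can be read as two stages: the decision tree yields N rules, and a "third preset rule" selects among them. The sketch below assumes a tree that has already been built (it is hand-written here; C5 training is not shown), turns each root-to-leaf path into one rule with its majority class, confidence, and user count, and takes the third preset rule to be a hypothetical confidence-and-user-count screen similar to step 505. The node layout, feature names, and counts are all hypothetical.

```python
from typing import Any, Dict, List, Tuple

Node = Dict[str, Any]
Condition = Tuple[str, str, float]  # (feature, "<=" or ">", split value)


def extract_rules(node: Node, path: Tuple[Condition, ...] = ()) -> List[Dict[str, Any]]:
    """Turn every root-to-leaf path of a binary decision tree into one rule."""
    if "counts" in node:  # leaf: class counts of the training users that reach it
        counts = node["counts"]
        users = sum(counts.values())
        label = max(counts, key=counts.get)  # majority class
        return [{"conditions": list(path), "label": label,
                 "confidence": counts[label] / users, "users": users}]
    feature, split = node["feature"], node["threshold"]
    return (extract_rules(node["left"], path + ((feature, "<=", split),))
            + extract_rules(node["right"], path + ((feature, ">", split),)))


tree = {
    "feature": "home_net_calls", "threshold": 20,
    "left": {"counts": {"target": 100, "non_target": 900}},
    "right": {
        "feature": "monthly_spend", "threshold": 50,
        "left": {"counts": {"target": 150, "non_target": 350}},
        "right": {"counts": {"target": 450, "non_target": 50}},
    },
}

rules = extract_rules(tree)  # N = 3 rules, one per leaf
# Hypothetical "third preset rule": confidence above 85% and more than 400 users.
selected = [r for r in rules if r["confidence"] > 0.85 and r["users"] > 400]
for r in selected:
    print(r["label"], r["conditions"], r["confidence"])
```

Rules labelled `target` would play the role of rules A (first preset rule) and rules labelled `non_target` the role of rules B (second preset rule); here the middle leaf, with only 70% confidence, is discarded.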
4. The method according to claim 1 or 3, characterized in that determining, according to the first-class rule and the second-class rule, the target data that satisfy the first preset rule in the input third group of data comprises:
analyzing the input third group of data according to the first-class rule and the second-class rule, respectively, to obtain first suspected target data and second suspected target data;
determining, based on the first suspected target data and the second suspected target data, the target data that satisfy the first preset rule.
5. The method according to claim 4, characterized in that the second-class rule includes first-class sub-rules, and the first-class sub-rules satisfy the first preset rule;
correspondingly, analyzing the input third group of data according to the first-class rule and the second-class rule, respectively, to obtain the first suspected target data and the second suspected target data comprises:
analyzing the input third group of data according to the first-class rule to obtain the first suspected target data;
analyzing the input third group of data according to the first-class sub-rules to obtain the second suspected target data.
6. The method according to claim 5, characterized in that the second-class rule further includes second-class sub-rules, and the second-class sub-rules satisfy the second preset rule; the method further comprises:
analyzing the first suspected target data and the second suspected target data according to the second-class sub-rules to obtain suspected non-target data;
correspondingly, determining the target data based on the first suspected target data and the second suspected target data comprises:
determining the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
7. A big data analysis device, characterized in that the device comprises:
an acquisition unit, configured to obtain, based on an input first group of data and second group of data, at least two pieces of characteristic information that satisfy a preset condition, wherein the first group of data and the second group of data are data in a first communication network, the first group of data satisfies a first preset rule, and the second group of data satisfies a second preset rule;
an analysis unit, configured to analyze the first group of data and the second group of data according to the at least two pieces of characteristic information to determine a first-class rule and a second-class rule;
a determination unit, configured to determine, according to the first-class rule and the second-class rule, target data that satisfy the first preset rule in an input third group of data, wherein the third group of data is data in a communication network other than the first communication network.
8. The device according to claim 7, characterized in that the analysis unit comprises:
a first analysis subunit, configured to analyze the first group of data and the second group of data using a logistic regression algorithm according to the at least two pieces of characteristic information to determine the first-class rule;
a second analysis subunit, configured to analyze the first group of data and the second group of data using a decision tree algorithm according to the at least two pieces of characteristic information to determine the second-class rule.
9. The device according to claim 8, characterized in that the second analysis subunit is further configured to analyze the first group of data and the second group of data using the decision tree algorithm according to the at least two pieces of characteristic information to determine N rules, N being a positive integer greater than or equal to 2;
and is further configured to determine, among the N rules, the second-class rule that satisfies a third preset rule.
10. The device according to any one of claims 7 to 9, characterized in that the determination unit comprises:
a first determination subunit, configured to analyze the input third group of data according to the first-class rule and the second-class rule, respectively, to obtain first suspected target data and second suspected target data;
a second determination subunit, configured to determine, based on the first suspected target data and the second suspected target data, the target data that satisfy the first preset rule.
11. The device according to claim 10, characterized in that the second-class rule includes first-class sub-rules, and the first-class sub-rules satisfy the first preset rule; correspondingly,
the first determination subunit is further configured to analyze the input third group of data according to the first-class rule to obtain the first suspected target data;
and is further configured to analyze the input third group of data according to the first-class sub-rules to obtain the second suspected target data.
12. The device according to claim 11, characterized in that the second-class rule further includes second-class sub-rules, and the second-class sub-rules satisfy the second preset rule;
the first determination subunit is further configured to analyze the first suspected target data and the second suspected target data according to the second-class sub-rules to obtain suspected non-target data;
correspondingly, the second determination subunit is further configured to determine the target data based on the first suspected target data, the second suspected target data, and the suspected non-target data.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510001942.6A CN105824811B (en) 2015-01-04 2015-01-04 A kind of big data analysis method and device thereof

Publications (2)

Publication Number Publication Date
CN105824811A CN105824811A (en) 2016-08-03
CN105824811B CN105824811B (en) 2019-07-02

Family

ID=56513287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510001942.6A Active CN105824811B (en) 2015-01-04 2015-01-04 A kind of big data analysis method and device thereof

Country Status (1)

Country Link
CN (1) CN105824811B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1333612A (en) * 2000-06-19 2002-01-30 Alcatel Method for rebooting terminal connected with local area network
CN1647052A (en) * 2002-04-12 2005-07-27 Vodafone Group PLC Method and system for distribution of encrypted data in a mobile network
CN1698311A (en) * 2003-01-16 2005-11-16 Sony United Kingdom Limited Video/audio network
CN103327063A (en) * 2012-02-14 2013-09-25 Google Inc. User presence detection and event discovery

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006171796A (en) * 2000-06-02 2006-06-29 Bld Oriental Kk Content distribution system and competition implementation system using network
JP4641848B2 (en) * 2005-03-30 2011-03-02 Fujitsu Limited Unauthorized access search method and apparatus
WO2008046130A1 (en) * 2006-10-17 2008-04-24 Silverbrook Research Pty Ltd Method of delivering an advertisement from a computer system
US20090282023A1 (en) * 2008-05-12 2009-11-12 Bennett James D Search engine using prior search terms, results and prior interaction to construct current search term results

Also Published As

Publication number Publication date
CN105824811A (en) 2016-08-03

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant