CN110442762A - Big data processing method based on cloud platform big data - Google Patents
Big data processing method based on cloud platform big data
- Publication number
- CN110442762A (application CN201910728420.4A)
- Authority
- CN
- China
- Prior art keywords
- service
- candidate item
- excavated
- data
- excavation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Embodiments of the present application provide a big data processing method and a big data processing server. By jointly considering different interestingness measure dimensions and clustering all business big data of each dimension, the method makes the behavior of different interestingness measure dimensions more consistent across the application scenarios of different data mining projects, which improves big data mining capability. In addition, the multiple data mining projects of the service to be mined, and the data dimension to be mined for each data mining project, are determined dynamically from the feature information of each cluster, and subsequent data mining is carried out on that basis. This avoids the poor mining effect or low accuracy of mining results that can arise in the prior art from using fixed mining data dimensions.
Description
Technical field
This application relates to the field of big data technology, and in particular to a big data processing method based on cloud platform big data.
Background
At present, big data mining schemes for online services (for example, order behavior services and browsing behavior services) are mostly limited to a single interestingness measure dimension. Although some work studies the attributes and behavior of different interestingness measures, for a given service to be mined the different interestingness measure dimensions behave differently across the application scenarios of different data mining projects, which limits their usefulness for big data mining. Moreover, the whole data mining process mostly uses fixed mining data dimensions. When many data dimensions need to be mined, the fixed dimensions may fail to achieve a good mining effect; when few data dimensions need to be mined, the fixed dimensions may exceed what is actually required, which wastes computing resources and also increases the probability that the data mining results are inaccurate.
Summary of the invention
To overcome at least the above deficiencies in the prior art, one object of the present application is to provide a big data processing method. By jointly considering different interestingness measure dimensions and clustering all business big data of each dimension, the method makes the behavior of different interestingness measure dimensions more consistent across the application scenarios of different data mining projects, which improves big data mining capability. The multiple data mining projects of the service to be mined, and the data dimension to be mined for each data mining project, can also be determined dynamically from the feature information of each cluster, and subsequent data mining is carried out on that basis. This avoids the poor mining effect or low accuracy of mining results that can arise in the prior art from using fixed mining data dimensions.
In a first aspect, the present application provides a big data processing method, applied to a big data processing server that is communicatively connected to each service server corresponding to a service to be mined. The method comprises:
obtaining business big data of multiple dimensions from each service server and, for each dimension, clustering all business big data of that dimension to obtain the clusters of each dimension;
extracting feature information of the clusters of each dimension and, according to that feature information, determining multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project;
according to the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project, obtaining, under each data mining project, the business process data corresponding to its data dimension to be mined;
obtaining the big data mining result of the service to be mined according to the business process data obtained under each data mining project;
dividing the big data mining result of the service to be mined into multiple data segments, judging whether each data segment needs to be encrypted, encrypting each data segment that needs encryption to generate random-key encrypted data, and finally sending all data segments to each corresponding service server.
In a possible design of the first aspect, the step of obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining project comprises:
obtaining multiple first mining service candidate sets according to the business process data corresponding to the data dimension to be mined under each data mining project, each first mining service candidate set including multiple mining service candidate items;
according to a preset mining service candidate item retrieval table, determining, in the multiple first mining service candidate sets, the mining service candidate items that are identical to preset mining service candidate items included in the retrieval table, and taking them as the target mining service candidate items of the multiple first mining service candidate sets, wherein the retrieval table includes multiple preset mining service candidate items, first association relationship identifiers that identify every two preset mining service candidate items having a first association relationship, and the frequent-item service level of each preset mining service candidate item having a second association relationship, the first association relationship and the second association relationship respectively characterizing a strong association relationship and a weak association relationship between frequent items;
according to the first association relationship identifiers included in the retrieval table, determining, among the first mining service candidate sets that contain preset mining service candidate items, second mining service candidate sets that contain preset mining service candidate items having the second association relationship;
for each second mining service candidate set, according to the frequent-item service level that each preset mining service candidate item in the set has in the retrieval table, selecting one preset mining service candidate item as the parent mining service candidate item and taking the other preset mining service candidate items as child mining service candidate items;
obtaining the big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate items.
In a possible design of the first aspect, the step of selecting one preset mining service candidate item as the parent mining service candidate item and taking the other preset mining service candidate items as child mining service candidate items, according to the frequent-item service level that each preset mining service candidate item in the second mining service candidate set has in the retrieval table, comprises:
according to the frequent-item service level that each preset mining service candidate item in the second mining service candidate set has in the retrieval table, selecting the preset mining service candidate item whose frequent-item service level is higher than that of the other preset mining service candidate items as the parent mining service candidate item, and taking the other preset mining service candidate items as child mining service candidate items.
In a possible design of the first aspect, the step of obtaining the big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate items comprises:
adding the parent mining service candidate item to a specified mining item set, the specified mining item set including a mining strategy that matches the parent mining service candidate item;
according to the mining strategy that matches the parent mining service candidate item and a child mining service candidate item ratio, randomly generating multiple target mining service candidate items from the multiple child mining service candidate items;
calculating the relevance between each target mining service candidate item and the parent mining service candidate item;
according to the calculated relevance between each target mining service candidate item and the parent mining service candidate item, taking the target mining service candidate item with the highest relevance as a comparison child mining service candidate item, and performing selection on the remaining target mining service candidate items to obtain a selected target mining service candidate item set;
performing the genetic operations of crossover and mutation on the selected target mining service candidate item set to obtain a new target mining service candidate item set;
calculating the relevance of each target mining service candidate item in the new set and, according to those relevance values and the relevance of the comparison child mining service candidate item, judging whether the new target mining service candidate item set meets a preset condition; if it does, outputting, according to the mining strategy, the big data mining result of the service to be mined that corresponds to the new target mining service candidate item set and the parent mining service candidate item.
In a possible design of the first aspect, the step of determining, according to the feature information of the clusters of each dimension, the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project comprises:
analyzing the feature information of the clusters of each dimension to obtain high-contribution-value features and low-contribution-value features;
calculating a first proportion, which is the share of the high-contribution-value features in the feature information of the clusters of each dimension, and a second proportion, which is the share of the low-contribution-value features in the feature information of the clusters of each dimension;
determining the multiple data mining projects of the service to be mined according to the first proportion and the second proportion;
according to the multiple data mining projects of the service to be mined and the contribution value of the service to be mined, determining the data dimension to be mined corresponding to each data mining project from a preset data dimension correspondence.
In a possible design of the first aspect, the step of determining the multiple data mining projects of the service to be mined according to the first proportion and the second proportion comprises:
according to a first difference between the first proportion and a first setting value, and a second difference between the second proportion and a second setting value, determining a first mining coefficient for the high-contribution-value features and a second mining coefficient for the low-contribution-value features, respectively;
according to the first mining coefficient and the second mining coefficient, determining a first ratio of data mining projects corresponding to the high-contribution-value features and a second ratio of data mining projects corresponding to the low-contribution-value features;
determining the multiple data mining projects of the service to be mined according to the first ratio and the second ratio.
In a possible design of the first aspect, the step of judging whether each data segment needs to be encrypted comprises:
judging whether the proportion of high-contribution-value features in the feature information of the clusters in each data segment exceeds a first threshold; if it does, the data segment needs to be encrypted; otherwise, the data segment is not encrypted.
In a second aspect, embodiments of the present application provide a big data processing server including a processor, a memory, and a network interface. The memory, the network interface, and the processor may be connected by a bus system. The network interface is used to receive messages, the memory is used to store programs, instructions, or code, and the processor is used to execute the programs, instructions, or code in the memory so as to perform the operations of the first aspect or of any possible design of the first aspect.
In a third aspect, embodiments of the present application provide a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to perform the method of the first aspect or of any possible design of the first aspect.
Based on any of the above aspects, the present application jointly considers different interestingness measure dimensions and clusters all business big data of each dimension, so that the behavior of different interestingness measure dimensions is more consistent across the application scenarios of different data mining projects and big data mining capability is improved. The multiple data mining projects of the service to be mined, and the data dimension to be mined for each data mining project, can be determined dynamically from the feature information of each cluster, and subsequent data mining is carried out on that basis, which avoids the poor mining effect or low accuracy of mining results that can arise in the prior art from using fixed mining data dimensions. In addition, dividing the big data mining result into multiple data segments avoids the heavy workload of encrypting everything while still protecting important data.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a schematic diagram of an application scenario of the big data processing method provided by the embodiments of the present application;
Fig. 2 is a schematic flowchart of the big data processing method provided by the embodiments of the present application;
Fig. 3 is a schematic flowchart of the sub-steps included in step S120 of Fig. 2 in one possible implementation;
Fig. 4 is a schematic flowchart of the sub-steps included in step S140 of Fig. 2 in one possible implementation;
Fig. 5 is a schematic structural block diagram of a big data processing server provided by the embodiments of the present application for performing the above method.
Detailed description of the embodiments
The present application is described in detail below with reference to the accompanying drawings. The specific operations in the method embodiments can also be applied to the apparatus embodiments or system embodiments. In the description of the present application, unless otherwise stated, "at least one" means one or more, and "multiple" means two or more. For example, "at least one of A, B, and C" covers: A alone, B alone, C alone, both A and B, both A and C, both B and C, and all of A, B, and C. In this application, "/" means "or"; for example, A/B can mean A or B. "And/or" herein merely describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone.
Fig. 1 is a schematic diagram of an application scenario of the big data processing method provided by the embodiments of the present application. In this embodiment, the application scenario may include a big data processing server 100 and multiple service servers 200 communicatively connected to the big data processing server 100. The big data processing server 100 can provide data mining services for the multiple service servers 200. Each service server 200 can be a server that independently runs an online service, such as an order service or a transaction service.
Fig. 2 is a schematic flowchart of the big data processing method provided by the embodiments of the present application. In this embodiment, the big data processing method can be performed by the big data processing server 100 shown in Fig. 1. The method is described in detail below.
Step S110: obtain business big data of multiple dimensions from each service server 200 and, for each dimension, cluster all business big data of that dimension to obtain the clusters of each dimension.
In this embodiment, the service to be mined can be a mining service determined by actual user demand. Specifically, the service servers 200 associated with it can be selected according to the user's settings; business big data of multiple dimensions is then obtained from these service servers 200 and, for each dimension, all business big data of that dimension is clustered to obtain the clusters of each dimension.
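Step S110 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not name a clustering algorithm, so a tiny one-dimensional k-means is assumed here, and the function names `kmeans_1d` and `cluster_by_dimension`, the `(dimension, value)` record shape, and the default `k=2` are all hypothetical.

```python
from collections import defaultdict

def kmeans_1d(values, k, iterations=20):
    """Tiny 1-D k-means (an assumed algorithm); returns a list of clusters."""
    ordered = sorted(values)
    # Spread the initial centroids evenly over the sorted values (deterministic).
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid when a cluster ends up empty.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

def cluster_by_dimension(records, k=2):
    """records: (dimension, value) pairs pooled from all service servers."""
    by_dimension = defaultdict(list)
    for dimension, value in records:
        by_dimension[dimension].append(value)
    # Cluster each dimension separately, never across dimensions.
    return {dim: kmeans_1d(vals, min(k, len(vals)))
            for dim, vals in by_dimension.items()}
```

The point of the sketch is the grouping step: records from every service server are pooled per dimension first, and clustering runs once per dimension.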
Step S120: extract the feature information of the clusters of each dimension and, according to that feature information, determine the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project.
In this embodiment, for example, the clusters of each dimension can first be windowed; the clusters of each dimension within a window are then input to the CCIPCA algorithm, which computes the feature information of the clusters of each dimension.
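The windowing and CCIPCA step can be illustrated with a simplified sketch. It assumes that the "feature information" is the leading principal direction of the samples in a window, estimates only the first component, uses an amnesic parameter of 0, and centers each window with its batch mean; none of these choices is stated in the source, and the names `sliding_windows` and `ccipca_first_component` are hypothetical.

```python
import math

def sliding_windows(samples, size, step):
    """Yield windows of `size` consecutive samples, advancing by `step`."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]

def ccipca_first_component(window):
    """Estimate the leading principal direction of one window of samples."""
    dim = len(window[0])
    mean = [sum(s[j] for s in window) / len(window) for j in range(dim)]
    v, n = None, 0
    for sample in window:
        u = [sample[j] - mean[j] for j in range(dim)]  # CCIPCA expects zero-mean input
        if v is None:
            if any(u):  # start from the first non-zero centered sample
                v, n = u, 1
            continue
        n += 1
        norm = math.sqrt(sum(x * x for x in v))
        projection = sum(a * b for a, b in zip(u, v)) / norm
        # v(n) = (n-1)/n * v(n-1) + 1/n * u u^T v(n-1) / ||v(n-1)||
        v = [(n - 1) / n * vj + projection / n * uj for vj, uj in zip(v, u)]
    if v is None:
        raise ValueError("window has no variance")
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]
```

Because the update never forms a covariance matrix, each window can be processed sample by sample, which is the usual reason for choosing an incremental method on streaming data.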
Step S130: according to the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project, obtain, under each data mining project, the business process data corresponding to its data dimension to be mined.
In this embodiment, the business process data can include, but is not limited to, the historical business data under the data dimension to be mined and the business data currently generated in real time.
Step S140: obtain the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining project.
Step S150: divide the big data mining result of the service to be mined into multiple data segments, judge whether each data segment needs to be encrypted, encrypt each data segment that needs encryption to generate random-key encrypted data, and finally send all data segments to each corresponding service server.
The data segments can be stored as files, that is, each file is treated as one data segment. In other embodiments, multiple data segments can also be stored in the same file. In addition, the encryption method is not limited; any existing encryption method can be used.
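A minimal sketch of step S150 follows. The source leaves the encryption method open, so `toy_stream_cipher` is only a placeholder that is NOT secure and stands in for whatever existing cipher is chosen. The function names, the packet dictionary layout, and the caller-supplied `needs_encryption` predicate are hypothetical, and how the random key reaches the service server is not specified by the source.

```python
import hashlib
import secrets

def split_segments(result: bytes, segment_size: int):
    """Divide the mining result into fixed-size data segments."""
    return [result[i:i + segment_size] for i in range(0, len(result), segment_size)]

def toy_stream_cipher(data: bytes, key: bytes) -> bytes:
    """Placeholder cipher (NOT secure): XOR with a SHA-256 counter keystream.
    Applying it twice with the same key restores the original bytes."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

def protect_result(result: bytes, segment_size: int, needs_encryption):
    """Encrypt only the segments flagged by the predicate, each with its own random key."""
    packets = []
    for index, segment in enumerate(split_segments(result, segment_size)):
        if needs_encryption(segment):
            key = secrets.token_bytes(32)  # fresh random key per segment
            packets.append({"index": index, "encrypted": True, "key": key,
                            "payload": toy_stream_cipher(segment, key)})
        else:
            packets.append({"index": index, "encrypted": False, "key": None,
                            "payload": segment})
    return packets
```

For example, `protect_result(data, 8, lambda s: b"SECRET" in s)` leaves ordinary segments untouched and encrypts only those the predicate flags, which is how segmenting avoids the cost of encrypting everything.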
Based on the above steps, this embodiment jointly considers different interestingness measure dimensions and clusters all business big data of each dimension, so that the behavior of different interestingness measure dimensions is more consistent across the application scenarios of different data mining projects and big data mining capability is improved. The multiple data mining projects of the service to be mined, and the data dimension to be mined for each data mining project, can be determined dynamically from the feature information of each cluster, and subsequent data mining is carried out on that basis, which avoids the poor mining effect or low accuracy of mining results that can arise in the prior art from using fixed mining data dimensions. In addition, dividing the big data mining result into multiple data segments avoids the heavy workload of encrypting all of the data while still protecting important data.
In a possible design, referring to Fig. 3, step S120 can specifically include the following sub-steps:
Sub-step S121: analyze the feature information of the clusters of each dimension to obtain high-contribution-value features and low-contribution-value features.
Sub-step S122: calculate a first proportion, which is the share of the high-contribution-value features in the feature information of the clusters of each dimension, and a second proportion, which is the share of the low-contribution-value features in the feature information of the clusters of each dimension.
Sub-step S123: determine the multiple data mining projects of the service to be mined according to the first proportion and the second proportion.
Sub-step S124: according to the multiple data mining projects of the service to be mined and the contribution value of the service to be mined, determine the data dimension to be mined corresponding to each data mining project from a preset data dimension correspondence.
For sub-step S123, in one possible implementation, a first mining coefficient for the high-contribution-value features and a second mining coefficient for the low-contribution-value features are first determined from a first difference between the first proportion and a first setting value and a second difference between the second proportion and a second setting value, respectively. Then, a first ratio of data mining projects corresponding to the high-contribution-value features and a second ratio of data mining projects corresponding to the low-contribution-value features are determined from the first mining coefficient and the second mining coefficient. Finally, the multiple data mining projects of the service to be mined are determined from the first ratio and the second ratio.
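The arithmetic of sub-steps S122 and S123 can be sketched as below. The source does not give the formulas that turn the differences into mining coefficients or the coefficients into ratios, so the mappings in `mining_coefficients` and `project_ratios` are purely hypothetical placeholders, as are the `high_cutoff` parameter and the assumption that every feature is either high-contribution or low-contribution.

```python
def contribution_shares(contributions, high_cutoff):
    """First and second proportion: shares of high- and low-contribution features."""
    if not contributions:
        raise ValueError("no features to analyze")
    high = sum(1 for c in contributions if c >= high_cutoff)
    total = len(contributions)
    return high / total, (total - high) / total

def mining_coefficients(first_share, second_share, first_setting, second_setting):
    """Hypothetical mapping: coefficient = 1 + (proportion - setting value), floored at 0."""
    first_diff = first_share - first_setting      # first difference
    second_diff = second_share - second_setting   # second difference
    return max(0.0, 1.0 + first_diff), max(0.0, 1.0 + second_diff)

def project_ratios(first_coeff, second_coeff):
    """Hypothetical mapping: normalize the two coefficients into project ratios."""
    total = first_coeff + second_coeff
    if total == 0:
        return 0.5, 0.5
    return first_coeff / total, second_coeff / total
```

With contribution values `[0.9, 0.8, 0.2, 0.1, 0.7]`, a cutoff of 0.5, and both setting values at 0.5, the proportions are 0.6 and 0.4, and under these assumed mappings roughly 55% of the data mining projects would be allotted to high-contribution-value features.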
Based on the above steps, this embodiment further considers the proportions of high-contribution-value features and low-contribution-value features in the feature information of the clusters of each dimension in order to determine the multiple data mining projects of the service to be mined. Compared with the fixed mining data dimensions used in the prior art, this embodiment can effectively improve the mining effect and accuracy and keeps excessive useless data out of the data mining process. It also reduces the subjective influence of choosing fixed mining data dimensions and lowers the mining error rate.
In a possible design, referring to Fig. 4, step S140 can specifically include the following sub-steps:
Sub-step S141: obtain multiple first mining service candidate sets according to the business process data corresponding to the data dimension to be mined under each data mining project.
In this embodiment, each first mining service candidate set may include multiple mining service candidate items. In this sub-step, the multiple first mining service candidate sets can be obtained by matching the business process data against the reference process data of each mining service candidate item.
Sub-step S142: according to a preset mining service candidate item retrieval table, determine, in the multiple first mining service candidate sets, the mining service candidate items that are identical to preset mining service candidate items included in the retrieval table, and take them as the target mining service candidate items of the multiple first mining service candidate sets.
In this embodiment, the preset mining service candidate item retrieval table may include multiple preset mining service candidate items, first association relationship identifiers that identify every two preset mining service candidate items having a first association relationship, and the frequent-item service level of each preset mining service candidate item having a second association relationship. The first association relationship and the second association relationship respectively characterize a strong association relationship and a weak association relationship between frequent items. Optionally, the strong association relationship can mean that the two preset mining service candidate items are associated by business sequence, and the weak association relationship can mean that the two preset mining service candidate items are not associated by business sequence.
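One possible in-memory shape for the retrieval table, together with the lookup of sub-step S142, is sketched below. The class name `RetrievalTable`, its field names, and the use of unordered pairs for the first association relationship identifiers are assumptions made for illustration; the source does not prescribe a data structure.

```python
from dataclasses import dataclass, field

@dataclass
class RetrievalTable:
    items: set = field(default_factory=set)         # preset mining service candidate items
    strong_pairs: set = field(default_factory=set)  # frozenset({a, b}): first (strong) association
    levels: dict = field(default_factory=dict)      # item -> frequent-item service level

    def is_strong(self, a, b):
        """True when the pair carries a first association relationship identifier."""
        return frozenset((a, b)) in self.strong_pairs

    def is_weak(self, a, b):
        """Assumed reading: two distinct preset items without a strong association."""
        return a != b and a in self.items and b in self.items and not self.is_strong(a, b)

def target_items(candidate_set, table):
    """Sub-step S142: candidate items that also appear in the retrieval table."""
    return [item for item in candidate_set if item in table.items]
```

Storing pairs as `frozenset` makes the association lookup independent of argument order, which suits a symmetric relationship.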
Sub-step S143: according to the first association relationship identifiers included in the preset mining service candidate item retrieval table, determine, among the first mining service candidate sets that contain preset mining service candidate items, second mining service candidate sets that contain preset mining service candidate items having the second association relationship.
Sub-step S144: for each second mining service candidate set, according to the frequent-item service level that each preset mining service candidate item in the set has in the retrieval table, select one preset mining service candidate item as the parent mining service candidate item and take the other preset mining service candidate items as child mining service candidate items.
Sub-step S145: obtain the big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate items.
Based on the above steps, this embodiment further considers the strong association relationships and weak association relationships between frequent items and performs retrieval-based mining accordingly. This avoids the situation in which wrongly associated data in the frequent-item mining process causes the mining result to deviate from the data dimension to be mined, and thereby further improves mining accuracy.
As an optional implementation of sub-step S144, according to the frequent-item service level that each preset mining service candidate item in the second mining service candidate set has in the retrieval table, the preset mining service candidate item whose frequent-item service level is higher than that of the other preset mining service candidate items is selected as the parent mining service candidate item, and the other preset mining service candidate items are taken as child mining service candidate items.
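The parent selection rule of sub-step S144 reduces to picking the maximum service level, as in this small sketch. The function name `split_parent_and_children` and the `levels` mapping are hypothetical, and the source does not say how ties are resolved; here the first item with the highest level wins.

```python
def split_parent_and_children(preset_items, levels):
    """Pick the item with the highest frequent-item service level as the parent."""
    parent = max(preset_items, key=lambda item: levels[item])  # ties: first one wins
    children = [item for item in preset_items if item != parent]
    return parent, children
```

For example, with levels `{"pay": 3, "review": 2, "ship": 1}`, `"pay"` becomes the parent and the other two items become children.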
As an optional implementation, for sub-step S145, the excavation may be adjusted adaptively for different excavation service candidate items, which facilitates intensified excavation of big data with a smaller data volume and improves excavation efficiency. To this end, this embodiment may add the father excavation service candidate item to a specified excavation item set, where the specified excavation item set includes a mining strategy matching the father excavation service candidate item. Then, according to the mining strategy matching the father excavation service candidate item and the sub-excavation service candidate item ratio, multiple target excavation service candidate items are randomly generated from the multiple sub-excavation service candidate items, and the degree of correlation between each of the multiple target excavation service candidate items and the father excavation service candidate item is calculated.
On this basis, according to the calculated degree of correlation between each target excavation service candidate item and the father excavation service candidate item, the target excavation service candidate item with the greatest degree of correlation is taken as the comparison sub-excavation service candidate item, and a selection is made among the remaining target excavation service candidate items to obtain a selected target excavation service candidate item set. Then, the genetic operations of crossover and mutation are performed on the selected target excavation service candidate item set to obtain a new target excavation service candidate item set. Next, the degree of correlation of each target excavation service candidate item in the new target excavation service candidate item set is calculated, and, based on these degrees of correlation and the degree of correlation of the comparison sub-excavation service candidate item, it is judged whether the new target excavation service candidate item set satisfies a preset condition. If it does, the big data excavation result of the service to be excavated corresponding to the new target excavation service candidate item set and the father excavation service candidate item is output according to the mining strategy.
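The select, crossover, mutate and compare loop described for sub-step S145 can be sketched as below. The text does not define the correlation measure, the selection rule, the mutation rate or the preset condition, so every one of these is an assumption here: a target candidate is modelled as a subset of sub-item identifiers, selection keeps the better half, mutation happens with probability 0.2, and the preset condition is taken to be "the best new candidate is more correlated than the comparison item". The name `evolve_candidates` and its parameters are hypothetical.

```python
import random
from typing import Callable, FrozenSet, List, Sequence

Candidate = FrozenSet[str]  # assumed model: a subset of sub-item identifiers


def evolve_candidates(
    sub_items: Sequence[str],
    correlation: Callable[[Candidate], float],  # assumed caller-supplied measure
    ratio: float,
    copies: int,
    rng: random.Random,
    max_rounds: int = 20,
) -> Candidate:
    if not sub_items or not 0 < ratio <= 1 or copies < 3:
        raise ValueError("need sub-items, 0 < ratio <= 1 and at least 3 copies")
    size = max(1, round(ratio * len(sub_items)))
    # Randomly generate several target candidates from the sub-items.
    population: List[Candidate] = [
        frozenset(rng.sample(list(sub_items), size)) for _ in range(copies)
    ]
    # The most correlated candidate becomes the comparison item.
    comparison = max(population, key=correlation)
    baseline = correlation(comparison)
    population.remove(comparison)

    for _ in range(max_rounds):
        # Selection (assumed rule): keep the better half, at least two.
        ranked = sorted(population, key=correlation, reverse=True)
        selected = ranked[: max(2, len(ranked) // 2)]

        new_population: List[Candidate] = []
        while len(new_population) < copies:
            a, b = rng.sample(selected, 2)
            pool = sorted(a | b)
            # Crossover: draw the child from the union of both parents.
            child = set(rng.sample(pool, min(size, len(pool))))
            if rng.random() < 0.2:
                # Mutation: replace one member with a random sub-item.
                child.discard(rng.choice(sorted(child)))
                child.add(rng.choice(list(sub_items)))
            new_population.append(frozenset(child))

        best = max(new_population, key=correlation)
        # Assumed preset condition: the new set beats the comparison item.
        if correlation(best) > baseline:
            return best
        population = new_population
    # No improvement found: fall back to the comparison item.
    return comparison
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when comparing mining strategies.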
As an optional implementation, in step S150, the step of judging whether each data segment needs to be encrypted includes:
S151: judging whether the proportion of high-contribution-value features in the feature information of the cluster of each data segment exceeds a first threshold; if so, the data segment is judged to need encryption; otherwise, it is not encrypted.
In step S151, whether the proportion of high-contribution-value features in the feature information of the cluster of each data segment exceeds the first threshold is used as the basis for deciding whether encryption is needed, which reduces the amount of computation and improves efficiency. Of course, other judgment methods may also be used. For example, additional keyword rules may be set, and whether the degree of confidentiality required for each file, as calculated by the keyword rules, exceeds a set threshold is used to judge whether encryption is needed.
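The threshold test of step S151 can be sketched as below. The representation is an assumption: each feature of a data segment's cluster is reduced to a boolean flag saying whether it is a high-contribution-value feature, and the names `needs_encryption` and `first_threshold` are chosen for the example. Treating a segment with no features as not needing encryption is likewise an assumption.

```python
from typing import Iterable


def needs_encryption(high_flags: Iterable[bool], first_threshold: float) -> bool:
    """Return True when the share of high-contribution features exceeds the threshold."""
    flags = list(high_flags)
    if not flags:
        # Assumption: a segment with no feature information is left unencrypted.
        return False
    # Proportion of high-contribution-value features in the feature information.
    share = sum(flags) / len(flags)
    return share > first_threshold
```

For example, a segment with two high-contribution features out of four has a share of 0.5 and would be encrypted under a first threshold of 0.4.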
On the basis of the foregoing, the big data processing server 100 may send the big data excavation result of the service to be excavated to each corresponding service server 200.
Fig. 5 is a schematic structural diagram of the big data processing server 100 provided by an embodiment of the present application for executing the above big data processing method. As shown in Fig. 5, the big data processing server 100 may include a network interface 110, a machine-readable storage medium 120, a processor 130 and a bus 140. There may be one or more processors 130; Fig. 5 takes one processor 130 as an example. The network interface 110, the machine-readable storage medium 120 and the processor 130 may be connected by the bus 140 or in other ways; Fig. 5 takes connection by the bus 140 as an example.
The machine-readable storage medium 120, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the method for establishing a robot automatic question answering knowledge base in the embodiments of the present application. By running the software programs, instructions and modules stored in the machine-readable storage medium 120, the processor 130 executes the various functional applications and data processing of the terminal device, that is, implements the above big data processing method, which is not repeated here.
The machine-readable storage medium 120 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and the application programs required by at least one function; the data storage area may store data created through use of the terminal, and the like. In addition, the machine-readable storage medium 120 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM) and direct Rambus random access memory (Direct Rambus RAM, DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory. In some examples, the machine-readable storage medium 120 may further include memory located remotely from the processor 130, and such remote memory may be connected to the terminal device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The processor 130 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method embodiments may be completed by an integrated logic circuit of hardware in the processor 130 or by instructions in the form of software. The processor 130 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor.
The big data processing server 100 may exchange information with other devices (such as the service server 200) through the communication interface 110. The communication interface 110 may be a circuit, a bus, a transceiver or any other device that can be used for information exchange. The processor 130 may send and receive information using the communication interface 110.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired means (such as coaxial cable, optical fiber or digital subscriber line (DSL)) or wireless means (such as infrared, radio or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.
Claims (7)
1. A big data processing method based on cloud platform big data, characterized in that it is applied to a big data processing server communicatively connected to each service server corresponding to a service to be excavated, the method comprising:
obtaining business big data of multiple dimensions from each service server, and, for each dimension, clustering all the business big data of that dimension to obtain the cluster of each dimension;
extracting the feature information of the cluster of each dimension, and determining, according to the feature information of the cluster of each dimension, multiple data mining projects of the service to be excavated and the data dimension to be excavated corresponding to each data mining project;
obtaining, according to the multiple data mining projects of the service to be excavated and the data dimension to be excavated corresponding to each data mining project, the business process data corresponding to the data dimension to be excavated under each data mining project;
obtaining the big data excavation result of the service to be excavated according to the business process data corresponding to the data dimension to be excavated obtained under each data mining project;
dividing the big data excavation result of the service to be excavated into multiple data segments, judging whether each data segment needs to be encrypted, encrypting the data segments that need to be encrypted to generate encrypted data with a random key, and finally sending all the data segments to each corresponding service server.
2. The big data processing method according to claim 1, characterized in that the step of obtaining the big data excavation result of the service to be excavated according to the business process data corresponding to the data dimension to be excavated obtained under each data mining project comprises:
obtaining multiple first excavation service candidate item sets according to the business process data corresponding to the data dimension to be excavated obtained under each data mining project, each first excavation service candidate item set including multiple excavation service candidate items;
determining, according to a preset excavation service candidate item retrieval table, the excavation service candidate items in the multiple first excavation service candidate item sets that are identical to preset excavation service candidate items included in the preset excavation service candidate item retrieval table, and taking them as the target excavation service candidate items of the multiple first excavation service candidate item sets, wherein the preset excavation service candidate item retrieval table includes multiple preset excavation service candidate items, a first association relationship identifier for identifying every two preset excavation service candidate items having a first association relationship, and the frequent-item business level of each preset excavation service candidate item having a second association relationship, the first association relationship and the second association relationship respectively characterizing the strong association relationship between frequent items and the weak association relationship between frequent items;
determining, according to the first association relationship identifiers included in the preset excavation service candidate item retrieval table, from the first excavation service candidate item sets in which preset excavation service candidate items exist, the second excavation service candidate item sets in which preset excavation service candidate items having the second association relationship exist;
for each second excavation service candidate item set, according to the frequent-item business level corresponding to each preset excavation service candidate item of the second excavation service candidate item set in the preset excavation service candidate item retrieval table, selecting one preset excavation service candidate item as the father excavation service candidate item, with the other preset excavation service candidate items as sub-excavation service candidate items;
obtaining the big data excavation result of the service to be excavated according to the father excavation service candidate item and the sub-excavation service candidate items.
3. The big data processing method according to claim 2, characterized in that the step of selecting, according to the frequent-item business level corresponding to each preset excavation service candidate item of the second excavation service candidate item set in the preset excavation service candidate item retrieval table, one preset excavation service candidate item as the father excavation service candidate item, with the other preset excavation service candidate items as sub-excavation service candidate items, comprises:
according to the frequent-item business level corresponding to each preset excavation service candidate item of the second excavation service candidate item set in the preset excavation service candidate item retrieval table, selecting the preset excavation service candidate item whose frequent-item business level is greater than that of the other preset excavation service candidate items as the father excavation service candidate item, and taking the other preset excavation service candidate items as sub-excavation service candidate items.
4. The big data processing method according to claim 1, characterized in that the step of obtaining the big data excavation result of the service to be excavated according to the father excavation service candidate item and the sub-excavation service candidate items comprises:
adding the father excavation service candidate item to a specified excavation item set, the specified excavation item set including a mining strategy matching the father excavation service candidate item;
randomly generating, according to the mining strategy matching the father excavation service candidate item and the sub-excavation service candidate item ratio, multiple target excavation service candidate items from the multiple sub-excavation service candidate items;
calculating the degree of correlation between each of the multiple target excavation service candidate items and the father excavation service candidate item;
according to the calculated degree of correlation between each target excavation service candidate item and the father excavation service candidate item, taking the target excavation service candidate item with the greatest degree of correlation as the comparison sub-excavation service candidate item, and making a selection among the remaining target excavation service candidate items of the multiple target excavation service candidate items to obtain a selected target excavation service candidate item set;
performing the genetic operations of crossover and mutation on the selected target excavation service candidate item set to obtain a new target excavation service candidate item set;
calculating the degree of correlation of each target excavation service candidate item in the new target excavation service candidate item set, judging, according to the degree of correlation of each target excavation service candidate item in the new target excavation service candidate item set and the degree of correlation of the comparison sub-excavation service candidate item, whether the new target excavation service candidate item set satisfies a preset condition, and, if so, outputting, according to the mining strategy, the big data excavation result of the service to be excavated corresponding to the new target excavation service candidate item set and the father excavation service candidate item.
5. The big data processing method according to claim 1, characterized in that the step of determining, according to the feature information of the cluster of each dimension, multiple data mining projects of the service to be excavated and the data dimension to be excavated corresponding to each data mining project comprises:
obtaining, by analysis, high-contribution-value features and low-contribution-value features from the feature information of the cluster of each dimension;
calculating a first proportion of the high-contribution-value features in the feature information of the cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the cluster of each dimension;
determining the multiple data mining projects of the service to be excavated according to the first proportion and the second proportion;
determining, according to the multiple data mining projects of the service to be excavated and the contribution value of the service to be excavated, and according to a preset data dimension correspondence, the data dimension to be excavated corresponding to each data mining project.
6. The big data processing method according to claim 5, characterized in that the step of determining the multiple data mining projects of the service to be excavated according to the first proportion and the second proportion comprises:
determining a first excavation coefficient of the high-contribution-value features and a second excavation coefficient of the low-contribution-value features according to, respectively, a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value;
determining, according to the first excavation coefficient and the second excavation coefficient, a first ratio of the data mining projects corresponding to the high-contribution-value features and a second ratio of the data mining projects corresponding to the low-contribution-value features;
determining the multiple data mining projects of the service to be excavated according to the first ratio and the second ratio.
7. The big data processing method according to claim 1, characterized in that the step of judging whether each data segment needs to be encrypted comprises:
judging whether the proportion of high-contribution-value features in the feature information of the cluster of each data segment exceeds a first threshold; if so, judging that the data segment needs to be encrypted; otherwise, not encrypting it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910728420.4A CN110442762B (en) | 2019-08-08 | 2019-08-08 | Big data processing method based on cloud platform big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110442762A (en) | 2019-11-12 |
CN110442762B (en) | 2022-02-08 |
Family
ID=68433720
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910728420.4A Active CN110442762B (en) | 2019-08-08 | 2019-08-08 | Big data processing method based on cloud platform big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110442762B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5794209A (en) * | 1995-03-31 | 1998-08-11 | International Business Machines Corporation | System and method for quickly mining association rules in databases |
US6324533B1 (en) * | 1998-05-29 | 2001-11-27 | International Business Machines Corporation | Integrated database and data-mining system |
CN105005570A (en) * | 2014-04-23 | 2015-10-28 | 国家电网公司 | Method and apparatus for mining massive intelligent power consumption data based on cloud computing |
CN107870990A (en) * | 2017-10-17 | 2018-04-03 | 北京德塔精要信息技术有限公司 | A kind of automobile recommends method and device |
CN108073701A (en) * | 2017-12-13 | 2018-05-25 | 北京工业大学 | A kind of method of the rare pattern of Mining Multidimensional time series data |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111159506A (en) * | 2019-12-26 | 2020-05-15 | 广州信天翁信息科技有限公司 | Data validity identification method, device and equipment and readable storage medium |
CN111159506B (en) * | 2019-12-26 | 2023-11-14 | 广州信天翁信息科技有限公司 | Data validity identification method, device, equipment and readable storage medium |
CN111258968A (en) * | 2019-12-30 | 2020-06-09 | 广州博士信息技术研究院有限公司 | Enterprise redundant data cleaning method and device and big data platform |
CN112163156A (en) * | 2020-10-06 | 2021-01-01 | 翁海坤 | Big data processing method based on artificial intelligence and cloud computing and cloud service center |
CN112163625A (en) * | 2020-10-06 | 2021-01-01 | 翁海坤 | Big data mining method based on artificial intelligence and cloud computing and cloud service center |
CN113537271A (en) * | 2020-10-06 | 2021-10-22 | 翁海坤 | Big data mining method and system based on artificial intelligence and cloud service center |
CN113536107A (en) * | 2020-10-06 | 2021-10-22 | 翁海坤 | Big data decision method and system based on block chain and cloud service center |
CN113536107B (en) * | 2020-10-06 | 2022-07-29 | 西安创业天下网络科技有限公司 | Big data decision method and system based on block chain and cloud service center |
CN113537271B (en) * | 2020-10-06 | 2022-09-27 | 思玛特健康科技(苏州)有限公司 | Big data mining method and system based on artificial intelligence and cloud service center |
Also Published As
Publication number | Publication date |
---|---|
CN110442762B (en) | 2022-02-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||