CN110442762A - Big data processing method based on cloud platform big data - Google Patents

Big data processing method based on cloud platform big data

Info

Publication number
CN110442762A
Authority
CN
China
Prior art keywords
service
candidate item
excavated
data
excavation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910728420.4A
Other languages
Chinese (zh)
Other versions
CN110442762B (en)
Inventor
陈泉鑫
罗茂锐
陈少海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Jiu Ling Creative Technology Ltd
Original Assignee
Xiamen Jiu Ling Creative Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Jiu Ling Creative Technology Ltd
Priority to CN201910728420.4A
Publication of CN110442762A
Application granted
Publication of CN110442762B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a big data processing method and a big data processing server. By jointly considering different interestingness measure dimensions and clustering all business big data of each dimension, the method makes the performance of different interestingness measure dimensions more uniform across the application scenarios of different data mining projects and improves big data mining capability. It also dynamically determines, from the characteristic information of each cluster, the multiple data mining projects of the service to be mined and the data dimension to be mined for each data mining project, and performs subsequent data mining on that basis. This avoids the problems that can arise in the prior art when fixed mining data dimensions are used, namely a poor mining effect or mining results of low accuracy.

Description

Big data processing method based on cloud platform big data
Technical field
The present application relates to the field of big data technology, and in particular to a big data processing method based on cloud platform big data.
Background
At present, big data mining schemes for online businesses (for example, order behavior or browsing behavior) are mostly limited to a single interestingness measure dimension. Although some work studies the attributes and behavior of different interestingness measures, for a given service to be mined, different interestingness measure dimensions perform differently across the application scenarios of different data mining projects, which limits their usefulness for big data mining. Moreover, most schemes use fixed mining data dimensions throughout the mining process. When many data dimensions need to be mined, the fixed dimensions may fail to achieve a good mining effect; when few need to be mined, the fixed dimensions may exceed what is actually required, which wastes computing resources and increases the probability that the mining results are inaccurate.
Summary of the invention
To overcome at least the above deficiencies of the prior art, one purpose of the present application is to provide a big data processing method. By jointly considering different interestingness measure dimensions and clustering all business big data of each dimension, the method makes the performance of different interestingness measure dimensions more uniform across the application scenarios of different data mining projects and improves big data mining capability. It also dynamically determines, from the characteristic information of each cluster, the multiple data mining projects of the service to be mined and the data dimension to be mined for each data mining project, and performs subsequent data mining on that basis. This avoids the poor mining effect or low accuracy of mining results that can occur in the prior art when fixed mining data dimensions are used.
In a first aspect, the present application provides a big data processing method applied to a big data processing server that is communicatively connected to each service server corresponding to a service to be mined, the method comprising:
obtaining business big data of multiple dimensions from each service server and, for each dimension, clustering all business big data of that dimension to obtain the clusters of each dimension;
extracting the characteristic information of the clusters of each dimension and, according to that characteristic information, determining the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project;
according to the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project, obtaining, under each data mining project, the business process data corresponding to the data dimension to be mined;
obtaining the big data mining result of the service to be mined according to the business process data obtained under each data mining project for the corresponding data dimension to be mined;
dividing the big data mining result of the service to be mined into multiple data segments, judging whether each data segment needs to be encrypted, encrypting each data segment that needs encryption to produce encrypted data with a randomly generated key, and finally sending all data segments to each corresponding service server.
In a possible design of the first aspect, the step of obtaining the big data mining result of the service to be mined according to the business process data obtained under each data mining project for the corresponding data dimension to be mined comprises:
obtaining multiple first mining service candidate item sets according to the business process data obtained under each data mining project for the corresponding data dimension to be mined, each first mining service candidate item set including multiple mining service candidate items;
according to a preset mining service candidate item retrieval table, determining, in the multiple first mining service candidate item sets, the mining service candidate items that are identical to preset mining service candidate items included in the retrieval table, and taking them as the target mining service candidate items of the multiple first mining service candidate item sets, wherein the retrieval table includes multiple preset mining service candidate items, first association relationship identifiers that identify every two preset mining service candidate items having a first association relationship, and a frequent item service level for each preset mining service candidate item having a second association relationship, the first association relationship and the second association relationship characterizing, respectively, a strong association and a weak association between frequent items;
according to the first association relationship identifiers included in the retrieval table, determining, among the first mining service candidate item sets that contain preset mining service candidate items, second mining service candidate item sets that contain preset mining service candidate items having the second association relationship;
for each second mining service candidate item set, according to the frequent item service level that each preset mining service candidate item in the set has in the retrieval table, selecting one preset mining service candidate item as the parent mining service candidate item and taking the other preset mining service candidate items as child mining service candidate items;
obtaining the big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate items.
In a possible design of the first aspect, the step of selecting one preset mining service candidate item as the parent mining service candidate item and taking the other preset mining service candidate items as child mining service candidate items, according to the frequent item service level that each preset mining service candidate item in the second mining service candidate item set has in the retrieval table, comprises:
according to the frequent item service level that each preset mining service candidate item in the second mining service candidate item set has in the retrieval table, selecting the preset mining service candidate item whose frequent item service level is higher than that of all the others as the parent mining service candidate item, and taking the other preset mining service candidate items as child mining service candidate items.
In a possible design of the first aspect, the step of obtaining the big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate items comprises:
adding the parent mining service candidate item to a specified mining item set, the specified mining item set including a mining strategy that matches the parent mining service candidate item;
according to the mining strategy that matches the parent mining service candidate item and a child mining service candidate item ratio, randomly generating multiple shares of target mining service candidate items from the multiple child mining service candidate items;
calculating the degree of correlation between each share of target mining service candidate items and the parent mining service candidate item;
according to the calculated degrees of correlation, taking the share with the highest degree of correlation as the comparison child mining service candidate items, and performing selection on the remaining shares to obtain a selected target mining service candidate item set;
performing the genetic operations of crossover and mutation on the selected target mining service candidate item set to obtain a new target mining service candidate item set;
calculating the degree of correlation of each share of target mining service candidate items in the new target mining service candidate item set, and judging, according to these degrees of correlation and the degree of correlation of the comparison child mining service candidate items, whether the new target mining service candidate item set satisfies a preset condition; if so, outputting, according to the mining strategy, the big data mining result of the service to be mined corresponding to the new target mining service candidate item set and the parent mining service candidate item.
In a possible design of the first aspect, the step of determining, according to the characteristic information of the clusters of each dimension, the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project comprises:
analyzing the characteristic information of the clusters of each dimension to obtain high-contribution-value features and low-contribution-value features;
calculating a first proportion of the high-contribution-value features in the characteristic information of the clusters of each dimension and a second proportion of the low-contribution-value features in the characteristic information of the clusters of each dimension;
determining the multiple data mining projects of the service to be mined according to the first proportion and the second proportion;
determining the data dimension to be mined corresponding to each data mining project according to the multiple data mining projects of the service to be mined, the contribution value of the service to be mined, and a preset data dimension correspondence.
In a possible design of the first aspect, the step of determining the multiple data mining projects of the service to be mined according to the first proportion and the second proportion comprises:
determining a first mining coefficient of the high-contribution-value features and a second mining coefficient of the low-contribution-value features according to, respectively, a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value;
determining, according to the first mining coefficient and the second mining coefficient, a first ratio of data mining projects corresponding to the high-contribution-value features and a second ratio of data mining projects corresponding to the low-contribution-value features;
determining the multiple data mining projects of the service to be mined according to the first ratio and the second ratio.
In a possible design of the first aspect, the step of judging whether each data segment needs to be encrypted comprises:
judging that a data segment needs to be encrypted if the proportion of high-contribution-value features in the characteristic information of the clusters in that data segment exceeds a first threshold, and otherwise leaving the data segment unencrypted.
In a second aspect, an embodiment of the present application provides a big data processing server including a processor, a memory, and a network interface. The memory, the network interface, and the processor may be connected through a bus system. The network interface is used to receive messages, the memory is used to store programs, instructions, or code, and the processor is used to execute the programs, instructions, or code in the memory to carry out the operations of the first aspect or of any possible design of the first aspect.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium that stores instructions which, when run on a computer, cause the computer to execute the method of the first aspect or of any possible design of the first aspect.
Based on any of the above aspects, the present application jointly considers different interestingness measure dimensions and clusters all business big data of each dimension, which makes the performance of different interestingness measure dimensions more uniform across the application scenarios of different data mining projects and improves big data mining capability. It also dynamically determines, from the characteristic information of each cluster, the multiple data mining projects of the service to be mined and the data dimension to be mined for each data mining project, and performs subsequent data mining on that basis, which avoids the poor mining effect or low accuracy of mining results that can occur in the prior art when fixed mining data dimensions are used. In addition, dividing the big data mining result into multiple data segments avoids the heavy workload of encrypting all of the data while still protecting the important data.
Brief description of the drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a schematic diagram of an application scenario of the big data processing method provided by the embodiments of the present application;
Fig. 2 is a schematic flowchart of the big data processing method provided by the embodiments of the present application;
Fig. 3 is a schematic flowchart of the sub-steps included in step S120 of Fig. 2 in a possible implementation;
Fig. 4 is a schematic flowchart of the sub-steps included in step S140 of Fig. 2 in a possible implementation;
Fig. 5 is a schematic structural block diagram of the big data processing server provided by the embodiments of the present application for executing the above method.
Detailed description
The present application is described in detail below with reference to the accompanying drawings. The specific operations described in the method embodiments may also be applied to the apparatus embodiments or system embodiments. In the description of the present application, unless otherwise stated, "at least one" means one or more, and "multiple" means two or more. For example, "at least one of A, B, and C" covers: A alone, B alone, C alone, A and B together, A and C together, B and C together, and A, B, and C together. In this application, "/" means "or"; for example, A/B may mean A or B. "And/or" merely describes an association between associated objects and indicates three possible relationships; for example, "A and/or B" may mean A alone, both A and B, or B alone.
Referring to Fig. 1, a schematic diagram of an application scenario of the big data processing method provided by the embodiments of the present application is shown. In this embodiment, the application scenario may include a big data processing server 100 and multiple service servers 200 communicatively connected to the big data processing server 100. The big data processing server 100 can provide data mining services for the multiple service servers 200. Each service server 200 may be a server that independently runs an online business, such as an order business or a transaction business.
Fig. 2 is a schematic flowchart of the big data processing method provided by the embodiments of the present application. In this embodiment, the big data processing method may be executed by the big data processing server 100 shown in Fig. 1. The method is described in detail below.
Step S110: obtain business big data of multiple dimensions from each service server 200 and, for each dimension, cluster all business big data of that dimension to obtain the clusters of each dimension.
In this embodiment, the service to be mined may be a mining service determined from actual user demand. Specifically, the service servers 200 associated with it can be selected according to the user's settings; business big data of multiple dimensions is then obtained from these service servers 200 and, for each dimension, all business big data of that dimension is clustered to obtain the clusters of each dimension.
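Step S110 can be illustrated with a minimal sketch. The source does not name a clustering algorithm, so the one-dimensional k-means below is only an assumption, as are the dimension names `order_amount` and `browse_seconds`, the cluster count `k`, and the helper names.

```python
from statistics import mean

def kmeans_1d(values, k, iterations=20):
    """Tiny Lloyd's k-means for scalar values; returns a list of k clusters."""
    ordered = sorted(values)
    # Deterministic start: spread the k initial centres over the sorted data.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centres[c]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centre.
        new_centres = [mean(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:
            break
        centres = new_centres
    return clusters

def cluster_by_dimension(business_data, k=2):
    """business_data maps a dimension name to every value collected for it."""
    return {dim: kmeans_1d(vals, k) for dim, vals in business_data.items()}

# Hypothetical data gathered from the service servers, one list per dimension.
business_data = {
    "order_amount": [10.0, 12.0, 11.0, 95.0, 102.0, 99.0],
    "browse_seconds": [3.0, 5.0, 4.0, 60.0, 65.0, 58.0],
}
clusters = cluster_by_dimension(business_data)
```

Each dimension is clustered independently, which mirrors the "for each dimension" wording of the step; a production system would more likely cluster multi-attribute records with a library implementation.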
Step S120: extract the characteristic information of the clusters of each dimension and, according to that characteristic information, determine the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project.
In this embodiment, for example, the clusters of each dimension may first be windowed; the clusters of each dimension within a window are then input to the CCIPCA (candid covariance-free incremental principal component analysis) algorithm to compute the characteristic information of the clusters of each dimension.
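The windowing and CCIPCA step can be sketched as follows. The source gives no window size, step, or CCIPCA parameters, so the sketch assumes a sliding window, centres each window, estimates only the first principal direction, and uses the CCIPCA update without an amnesic parameter; all function names and sample values are hypothetical.

```python
import math

def sliding_windows(samples, size, step):
    """Yield consecutive windows of `size` samples, advancing by `step`."""
    for start in range(0, len(samples) - size + 1, step):
        yield samples[start:start + size]

def centre(window):
    """Subtract the per-coordinate mean so the window is zero-mean."""
    dims = len(window[0])
    means = [sum(s[d] for s in window) / len(window) for d in range(dims)]
    return [[s[d] - means[d] for d in range(dims)] for s in window]

def ccipca_first_component(samples):
    """Incrementally estimate the first principal direction (unit vector)."""
    v, n = None, 0
    for u in samples:
        if v is None:
            if any(u):  # initialise with the first non-zero sample
                v, n = list(u), 1
            continue
        n += 1
        norm = math.sqrt(sum(x * x for x in v))
        projection = sum(a * b for a, b in zip(u, v)) / norm
        # v(n) = (n-1)/n * v(n-1) + 1/n * u(n) * (u(n)^T v(n-1) / ||v(n-1)||)
        v = [((n - 1) / n) * vi + (1 / n) * ui * projection for vi, ui in zip(v, u)]
    if v is None:
        raise ValueError("all samples are zero vectors")
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical 2-D descriptors of clusters, processed window by window.
samples = [(1.0, 1.0), (2.0, 2.0), (-1.0, -1.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
directions = [ccipca_first_component(centre(w)) for w in sliding_windows(samples, 4, 2)]
```

The sign of each estimated direction depends on the first sample of the window, so downstream code should compare directions up to sign.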
Step S130: according to the multiple data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project, obtain, under each data mining project, the business process data corresponding to the data dimension to be mined.
In this embodiment, the business process data may include, but is not limited to, the historical business data under the data dimension to be mined and the business data currently being generated in real time.
Step S140: obtain the big data mining result of the service to be mined according to the business process data obtained under each data mining project for the corresponding data dimension to be mined.
Step S150: divide the big data mining result of the service to be mined into multiple data segments, judge whether each data segment needs to be encrypted, encrypt each data segment that needs encryption to produce encrypted data with a randomly generated key, and finally send all data segments to each corresponding service server.
The data segments may be stored as files, with each file treated as one data segment. In other embodiments, multiple data segments may also be stored in the same file. The encryption method is likewise not limited; any existing encryption method may be used.
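A minimal sketch of step S150 follows. The source leaves the cipher, the threshold value, and the record layout open, so everything here is an assumption: the `high_contribution` flag on each record, the threshold of 0.5, the JSON serialisation, and in particular the XOR pad, which is only a stand-in for whatever existing encryption method a deployment would choose.

```python
import json
import secrets

FIRST_THRESHOLD = 0.5  # hypothetical; the source only speaks of a "first threshold"

def split_into_segments(records, segment_size):
    """Cut the mining result into consecutive data segments."""
    return [records[i:i + segment_size] for i in range(0, len(records), segment_size)]

def needs_encryption(segment, threshold=FIRST_THRESHOLD):
    """Encrypt when the proportion of high-contribution records exceeds the threshold."""
    high = sum(1 for record in segment if record["high_contribution"])
    return high / len(segment) > threshold

def xor_encrypt(plaintext):
    """Placeholder cipher: XOR with a fresh random key as long as the data."""
    key = secrets.token_bytes(len(plaintext))
    return key, bytes(p ^ k for p, k in zip(plaintext, key))

def protect(records, segment_size):
    """Return one dict per segment, encrypted only where the rule requires it."""
    out = []
    for segment in split_into_segments(records, segment_size):
        payload = json.dumps(segment).encode("utf-8")
        if needs_encryption(segment):
            key, payload = xor_encrypt(payload)
            out.append({"encrypted": True, "key": key, "payload": payload})
        else:
            out.append({"encrypted": False, "key": None, "payload": payload})
    return out

# Hypothetical mining result: four records, two per segment.
records = [
    {"item": "order->pay", "high_contribution": True},
    {"item": "order->refund", "high_contribution": True},
    {"item": "browse->exit", "high_contribution": False},
    {"item": "browse->favorite", "high_contribution": True},
]
segments = protect(records, segment_size=2)
```

Only the first segment is encrypted here, because the second sits exactly at the threshold rather than above it. How the random key reaches the service server is not described in the source and is left out of the sketch.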
Based on the above steps, this embodiment jointly considers different interestingness measure dimensions and clusters all business big data of each dimension, which makes the performance of different interestingness measure dimensions more uniform across the application scenarios of different data mining projects and improves big data mining capability. It also dynamically determines, from the characteristic information of each cluster, the multiple data mining projects of the service to be mined and the data dimension to be mined for each data mining project, and performs subsequent data mining on that basis, which avoids the poor mining effect or low accuracy of mining results that can occur in the prior art when fixed mining data dimensions are used. In addition, dividing the big data mining result into multiple data segments avoids the heavy workload that encrypting all of the data would generate while still protecting the important data.
In a possible design, referring to Fig. 3, step S120 may specifically include the following sub-steps:
Sub-step S121: analyze the characteristic information of the clusters of each dimension to obtain high-contribution-value features and low-contribution-value features.
Sub-step S122: calculate a first proportion of the high-contribution-value features in the characteristic information of the clusters of each dimension and a second proportion of the low-contribution-value features in the characteristic information of the clusters of each dimension.
Sub-step S123: determine the multiple data mining projects of the service to be mined according to the first proportion and the second proportion.
Sub-step S124: determine the data dimension to be mined corresponding to each data mining project according to the multiple data mining projects of the service to be mined, the contribution value of the service to be mined, and a preset data dimension correspondence.
For sub-step S123, in a possible implementation, a first mining coefficient of the high-contribution-value features and a second mining coefficient of the low-contribution-value features are first determined according to, respectively, a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value. Then, a first ratio of data mining projects corresponding to the high-contribution-value features and a second ratio of data mining projects corresponding to the low-contribution-value features are determined according to the first mining coefficient and the second mining coefficient. Finally, the multiple data mining projects of the service to be mined are determined according to the first ratio and the second ratio.
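Sub-steps S121 to S123 can be sketched as a small calculation. The source does not say how a difference becomes a mining coefficient or how coefficients become ratios, so the mappings below (coefficient = 1 + difference, floored at 0; ratios = normalised coefficients) are purely illustrative assumptions, as are the cutoff, the set values, and the idea of splitting a fixed project budget.

```python
def proportions(contributions, high_cutoff):
    """S121/S122: split features by contribution value; return (first, second) proportions."""
    high = [name for name, value in contributions.items() if value >= high_cutoff]
    first = len(high) / len(contributions)
    return first, 1.0 - first

def mining_coefficients(first, second, first_set_value, second_set_value):
    """Assumed mapping: coefficient = 1 + (proportion - set value), never below 0."""
    return (max(0.0, 1.0 + (first - first_set_value)),
            max(0.0, 1.0 + (second - second_set_value)))

def project_ratios(first_coeff, second_coeff):
    """Assumed mapping: normalise the coefficients so the two ratios sum to 1."""
    total = first_coeff + second_coeff
    return first_coeff / total, second_coeff / total

def allocate_projects(total_projects, first_ratio):
    """Split a project budget between high- and low-contribution features."""
    high_projects = round(total_projects * first_ratio)
    return high_projects, total_projects - high_projects

# Hypothetical contribution values of four cluster features.
contributions = {"f1": 0.9, "f2": 0.8, "f3": 0.7, "f4": 0.2}
first, second = proportions(contributions, high_cutoff=0.5)        # 0.75, 0.25
c1, c2 = mining_coefficients(first, second, 0.5, 0.5)              # 1.25, 0.75
r1, r2 = project_ratios(c1, c2)                                    # 0.625, 0.375
high_projects, low_projects = allocate_projects(8, r1)             # 5, 3
```

With set values between 0 and 1 the two coefficients cannot both be zero, so the normalisation is well defined under these assumptions.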
Based on the above steps, this embodiment further considers the proportions of high-contribution-value features and low-contribution-value features in the characteristic information of the clusters of each dimension in order to determine the multiple data mining projects of the service to be mined. Compared with the fixed mining data dimensions used in the prior art, this embodiment can effectively improve the data mining effect and accuracy and keep excessive useless data out of the data mining process. It also reduces the subjective influence of choosing fixed mining data dimensions and lowers the mining error rate.
In a possible design, referring to Fig. 4, step S140 may specifically include the following sub-steps:
Sub-step S141, according to obtaining the corresponding business procedure number of data dimension to be excavated under each Data Mining Project According to, obtain it is multiple first excavate service candidates.
It may include multiple excavations service candidate items in each first excavation service candidate in the present embodiment.In book It, can be by being matched to business procedure data with each reference process data for excavating service candidate item, to obtain in step Service candidate is excavated to multiple first.
Sub-step S142 services candidate item according to default excavation and retrieves table, excavates service candidate item the multiple first Collection, which determines, has excavation identical with the default excavation service candidate item for including in the default excavation service candidate item retrieval table Candidate item is serviced, and the target for excavating service candidate as the multiple first excavates service candidate item.
It may include that multiple default excavation services are candidate in the present embodiment, in the default excavation service candidate item retrieval table Item, the first incidence relation that candidate item is serviced for identifying the default excavation of the every two with the first incidence relation identify and tool There are each default frequent episode business-level for excavating service candidate item of the second incidence relation, first incidence relation and described Second incidence relation is respectively used to the weak rigidity relationship between the strong incidence relation and frequent episode between characterization frequent episode.It is optional Ground, above-mentioned strong incidence relation can refer to that this two default service candidate items of excavating are above-mentioned there are the association of business tandem Weak rigidity relationship can refer to that this two default excavate service the associations that business tandem is not present in candidate item.
Sub-step S143, according to default the first incidence relation mark excavated service candidate item retrieval table and include, In Presetting there are the second incidence relation is determined in each the first excavation service candidate that there is default excavation service candidate item It excavates the second of service candidate item and excavates service candidate.
Sub-step S144 excavates service candidate for each second, according in the second excavation service candidate Each default excavation service candidate item retrieve corresponding frequent episode business-level in table in default the excavations service candidate item, Default excavates is selected to service candidate item as father's excavation service candidate item, other default excavation service candidate items as sub- digging Pick service candidate item.
Sub-step S145 is obtained described wait dig according to father excavation service candidate item and the sub- excavation service candidate item Dig the big data Result of service.
Based on above-mentioned steps, the present embodiment further contemplates the strong incidence relation between frequent episode and between frequent episode Weak rigidity relationship, and retrieval excavation is carried out with this, can to avoid Mining Frequent item process always in the mistake association minings of data lead The Result of cause deviates the case where data dimension to be excavated, to further increase excavation accuracy.
As an optional implementation of sub-step S144, according to the frequent-item service level that each preset mining service candidate item in the second mining service candidate item set has in the preset mining service candidate item retrieval table, the preset mining service candidate item whose frequent-item service level is greater than that of the other preset mining service candidate items may be selected as the parent mining service candidate item, and the other preset mining service candidate items may be taken as child mining service candidate items.
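By way of non-limiting illustration, the selection rule of sub-step S144 can be sketched as follows. The function name `split_parent_children` and the dictionary of service levels are hypothetical; the application does not say how ties between equal service levels are resolved, so this sketch simply keeps the first item with the maximum level.

```python
from typing import Dict, List, Tuple


def split_parent_children(
    candidate_set: List[str], service_level: Dict[str, int]
) -> Tuple[str, List[str]]:
    """Pick the item with the highest frequent-item service level as the parent."""
    # Only preset items that carry a service level in the table are considered.
    preset = [item for item in candidate_set if item in service_level]
    if not preset:
        raise ValueError("candidate set contains no preset item with a service level")
    # max() returns the first maximal item, which is the assumed tie-breaking rule.
    parent = max(preset, key=lambda item: service_level[item])
    children = [item for item in preset if item != parent]
    return parent, children
```

For a set such as `["browse", "review", "cart"]` with levels 1, 3, and 2, the item with level 3 becomes the parent and the other two become child items.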
As an optional implementation of sub-step S145, in order to adaptively adjust the mining for different mining service candidate items, which facilitates intensified mining of big data with a smaller data volume and improves mining efficiency, this embodiment may add the parent mining service candidate item to a specified mining item set, where the specified mining item set includes a mining strategy that matches the parent mining service candidate item. Then, according to the mining strategy that matches the parent mining service candidate item and a child mining service candidate item ratio, a plurality of target mining service candidate items are randomly generated from the plurality of child mining service candidate items, and the correlation between each of the target mining service candidate items and the parent mining service candidate item is calculated.
On this basis, according to the calculated correlation between each target mining service candidate item and the parent mining service candidate item, the target mining service candidate item with the greatest correlation may be taken as a comparison child mining service candidate item, and a selection is performed on the remaining target mining service candidate items to obtain a selected target mining service candidate item set. Genetic operations of crossover and mutation are then performed on the selected target mining service candidate item set to obtain a new target mining service candidate item set. Next, the correlation of each target mining service candidate item in the new set is calculated, and, according to these correlations and the correlation of the comparison child mining service candidate item, it is judged whether the new target mining service candidate item set satisfies a preset condition. If it does, the big data mining result of the service to be mined that corresponds to the new target mining service candidate item set and the parent mining service candidate item is output according to the mining strategy.
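By way of non-limiting illustration, the selection, crossover, and mutation procedure described above can be sketched as a small genetic loop. The application does not define the correlation measure, the selection rule, or the preset condition, so everything below is an assumption: the function name `evolve_targets`, the representation of a target as a tuple of child items, keeping the better half during selection, and stopping once a new target is at least as correlated as the comparison item. The `WEIGHTS` table and `mean_affinity` function are made-up stand-ins for a real correlation measure.

```python
import random
from typing import Callable, Dict, Sequence, Tuple

Target = Tuple[str, ...]


def evolve_targets(
    parent: str,
    children: Sequence[str],
    child_ratio: float,
    correlation: Callable[[Target, str], float],
    population: int = 8,
    generations: int = 20,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> Target:
    if population < 3 or not children or not 0 < child_ratio <= 1:
        raise ValueError("need population >= 3, children, and 0 < child_ratio <= 1")
    rng = random.Random(seed)
    items = list(children)
    size = max(1, round(child_ratio * len(items)))

    # Randomly generate target candidates from the child items.
    targets = [tuple(rng.sample(items, size)) for _ in range(population)]

    # The most correlated target becomes the comparison child item.
    targets.sort(key=lambda t: correlation(t, parent), reverse=True)
    comparison, rest = targets[0], targets[1:]
    baseline = correlation(comparison, parent)

    # Selection (assumed rule): keep the better half of the remaining targets.
    selected = rest[: max(2, len(rest) // 2)]

    for _ in range(generations):
        offspring = []
        for _ in range(len(selected)):
            a, b = rng.sample(selected, 2)
            # Crossover: draw the new target from the union of both source targets.
            pool = sorted(set(a) | set(b))
            child = rng.sample(pool, size)
            # Mutation: occasionally swap one item for an unused child item.
            unused = [c for c in items if c not in child]
            if unused and rng.random() < mutation_rate:
                child[rng.randrange(size)] = rng.choice(unused)
            offspring.append(tuple(child))
        best = max(offspring, key=lambda t: correlation(t, parent))
        # Assumed preset condition: match or beat the comparison item.
        if correlation(best, parent) >= baseline:
            return best
        selected = offspring
    return comparison


# Hypothetical correlation: mean of made-up per-item affinity weights.
WEIGHTS: Dict[str, float] = {"browse": 0.2, "cart": 0.6, "review": 0.4, "share": 0.1}


def mean_affinity(target: Target, parent: str) -> float:
    return sum(WEIGHTS[item] for item in target) / len(target)


best = evolve_targets("pay", list(WEIGHTS), child_ratio=0.5, correlation=mean_affinity)
```

A fixed seed keeps the run reproducible; a real deployment would substitute its own correlation measure and stopping condition.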
As an optional implementation, in step S150, the step of judging whether each data segment needs to be encrypted includes:
S151: judging whether the proportion of high-contribution-value features in the feature information of the clusters in each data segment exceeds a first threshold; if so, determining that the data segment needs to be encrypted; otherwise, the data segment is not encrypted.
In step S151, using whether the proportion of high-contribution-value features in the feature information of the clusters of each data segment exceeds the first threshold as the basis for deciding whether encryption is needed reduces the amount of computation and improves efficiency. Of course, other judgment methods may also be used. For example, additional keyword rules may be set, and the keyword rules may be used to calculate whether the degree of confidentiality required by each file exceeds a set threshold, thereby judging whether encryption is needed.
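By way of non-limiting illustration, the threshold test of step S151 can be sketched as follows. The function name `needs_encryption`, the representation of a data segment as a list of feature labels, and the example threshold are hypothetical; the application does not state how the proportion is counted, so the sketch assumes it is the share of feature entries that belong to the high-contribution-value set.

```python
from typing import Iterable, Set


def needs_encryption(
    segment_features: Iterable[str],
    high_contribution: Set[str],
    first_threshold: float,
) -> bool:
    """Return True when high-contribution-value features exceed the first threshold."""
    features = list(segment_features)
    if not features:
        # Assumption: an empty segment has nothing to protect.
        return False
    share = sum(1 for f in features if f in high_contribution) / len(features)
    # "Exceeds" is read strictly, so a share equal to the threshold is not encrypted.
    return share > first_threshold
```

The test is a single pass over the features of each segment, which is consistent with the stated aim of keeping the computation light.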
On the basis of the foregoing, the big data processing server 100 may send the big data mining result of the service to be mined to each corresponding service server 200.
Fig. 5 is a schematic structural diagram of the big data processing server 100 provided by the embodiments of the present application for executing the above big data processing method. As shown in Fig. 5, the big data processing server 100 may include a network interface 110, a machine-readable storage medium 120, a processor 130, and a bus 140. There may be one or more processors 130; one processor 130 is taken as an example in Fig. 5. The network interface 110, the machine-readable storage medium 120, and the processor 130 may be connected by the bus 140 or in other ways; connection by the bus 140 is taken as an example in Fig. 5.
As a computer-readable storage medium, the machine-readable storage medium 120 may be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the big data processing method in the embodiments of the present application. By running the software programs, instructions, and modules stored in the machine-readable storage medium 120, the processor 130 executes the various functional applications and data processing of the terminal device, that is, implements the above big data processing method, which is not described again here.
The machine-readable storage medium 120 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and the application programs required by at least one function, and the data storage area may store data created through use of the terminal, and the like. In addition, the machine-readable storage medium 120 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but not be limited to, these and any other suitable types of memory. In some examples, the machine-readable storage medium 120 may further include memory located remotely from the processor 130, and such remote memory may be connected to the terminal device through a network. Examples of the network include, but are not limited to, the Internet, an intranet, a local area network, a mobile communication network, and combinations thereof.
The processor 130 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method embodiments may be completed by an integrated logic circuit of hardware in the processor 130 or by instructions in the form of software. The processor 130 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor.
The big data processing server 100 may exchange information with other devices (for example, the service server 200) through the network interface 110. The network interface 110 may be a circuit, a bus, a transceiver, or any other apparatus that can be used for information exchange. The processor 130 may send and receive information through the network interface 110.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
The embodiments of the present application are described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, where the instruction means implement the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various modifications and variations to the embodiments of the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalent technologies, the present application is also intended to include them.

Claims (7)

1. A big data processing method based on cloud platform big data, characterized in that the method is applied to a big data processing server communicatively connected to each service server corresponding to a service to be mined, the method comprising:
obtaining business big data of multiple dimensions from each service server and, for each dimension, clustering all the business big data of that dimension to obtain the clusters of each dimension;
extracting the feature information of the clusters of each dimension and, according to the feature information of the clusters of each dimension, determining a plurality of data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project;
according to the plurality of data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project, respectively obtaining, under each data mining project, the business process data corresponding to the data dimension to be mined;
obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined that is obtained under each data mining project;
dividing the big data mining result of the service to be mined into a plurality of data segments, judging whether each data segment needs to be encrypted, encrypting the data segments that need to be encrypted to generate encrypted data with a random key, and finally sending all the data segments to each corresponding service server.
2. The big data processing method according to claim 1, characterized in that the step of obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined that is obtained under each data mining project comprises:
obtaining a plurality of first mining service candidate item sets according to the business process data corresponding to the data dimension to be mined that is obtained under each data mining project, each first mining service candidate item set including a plurality of mining service candidate items;
according to a preset mining service candidate item retrieval table, determining, in the plurality of first mining service candidate item sets, the mining service candidate items that are identical to the preset mining service candidate items included in the preset mining service candidate item retrieval table, and taking them as the target mining service candidate items of the plurality of first mining service candidate item sets, wherein the preset mining service candidate item retrieval table includes a plurality of preset mining service candidate items, first association relationship identifiers each identifying two preset mining service candidate items that have a first association relationship, and the frequent-item service level of each preset mining service candidate item that has a second association relationship, the first association relationship and the second association relationship respectively characterizing a strong association relationship between frequent items and a weak association relationship between frequent items;
according to the first association relationship identifiers included in the preset mining service candidate item retrieval table, determining, among the first mining service candidate item sets that contain preset mining service candidate items, second mining service candidate item sets containing preset mining service candidate items that have the second association relationship;
for each second mining service candidate item set, according to the frequent-item service level that each preset mining service candidate item in the second mining service candidate item set has in the preset mining service candidate item retrieval table, selecting one preset mining service candidate item as a parent mining service candidate item and taking the other preset mining service candidate items as child mining service candidate items;
obtaining the big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate items.
3. The big data processing method according to claim 2, characterized in that the step of selecting, according to the frequent-item service level that each preset mining service candidate item in the second mining service candidate item set has in the preset mining service candidate item retrieval table, one preset mining service candidate item as the parent mining service candidate item and taking the other preset mining service candidate items as child mining service candidate items comprises:
according to the frequent-item service level that each preset mining service candidate item in the second mining service candidate item set has in the preset mining service candidate item retrieval table, selecting the preset mining service candidate item whose frequent-item service level is greater than that of the other preset mining service candidate items as the parent mining service candidate item, and taking the other preset mining service candidate items as child mining service candidate items.
4. The big data processing method according to claim 1, characterized in that the step of obtaining the big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate items comprises:
adding the parent mining service candidate item to a specified mining item set, the specified mining item set including a mining strategy that matches the parent mining service candidate item;
according to the mining strategy that matches the parent mining service candidate item and a child mining service candidate item ratio, randomly generating a plurality of target mining service candidate items from the plurality of child mining service candidate items;
calculating the correlation between each of the plurality of target mining service candidate items and the parent mining service candidate item;
according to the calculated correlation between each target mining service candidate item and the parent mining service candidate item, taking the target mining service candidate item with the greatest correlation as a comparison child mining service candidate item, and performing a selection on the remaining target mining service candidate items among the plurality of target mining service candidate items to obtain a selected target mining service candidate item set;
performing genetic operations of crossover and mutation on the selected target mining service candidate item set to obtain a new target mining service candidate item set;
calculating the correlation of each target mining service candidate item in the new target mining service candidate item set, and, according to the correlation of each target mining service candidate item in the new target mining service candidate item set and the correlation of the comparison child mining service candidate item, judging whether the new target mining service candidate item set satisfies a preset condition, and if so, outputting, according to the mining strategy, the big data mining result of the service to be mined that corresponds to the new target mining service candidate item set and the parent mining service candidate item.
5. The big data processing method according to claim 1, characterized in that the step of determining, according to the feature information of the clusters of each dimension, the plurality of data mining projects of the service to be mined and the data dimension to be mined corresponding to each data mining project comprises:
analyzing the feature information of the clusters of each dimension to obtain high-contribution-value features and low-contribution-value features;
calculating a first proportion of the high-contribution-value features in the feature information of the clusters of each dimension and a second proportion of the low-contribution-value features in the feature information of the clusters of each dimension;
determining the plurality of data mining projects of the service to be mined according to the first proportion and the second proportion;
according to the plurality of data mining projects of the service to be mined and the contribution value of the service to be mined, and according to a preset data dimension correspondence, determining the data dimension to be mined corresponding to each data mining project.
6. The big data processing method according to claim 5, characterized in that the step of determining the plurality of data mining projects of the service to be mined according to the first proportion and the second proportion comprises:
according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value, respectively determining a first mining coefficient of the high-contribution-value features and a second mining coefficient of the low-contribution-value features;
according to the first mining coefficient and the second mining coefficient, determining a first ratio of the data mining projects corresponding to the high-contribution-value features and a second ratio of the data mining projects corresponding to the low-contribution-value features;
determining the plurality of data mining projects of the service to be mined according to the first ratio and the second ratio.
7. The big data processing method according to claim 1, characterized in that the step of judging whether each data segment needs to be encrypted comprises:
judging whether the proportion of high-contribution-value features in the feature information of the clusters in each data segment exceeds a first threshold; if so, determining that the data segment needs to be encrypted; otherwise, not encrypting the data segment.
CN201910728420.4A 2019-08-08 2019-08-08 Big data processing method based on cloud platform big data Active CN110442762B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910728420.4A CN110442762B (en) 2019-08-08 2019-08-08 Big data processing method based on cloud platform big data

Publications (2)

Publication Number Publication Date
CN110442762A true CN110442762A (en) 2019-11-12
CN110442762B CN110442762B (en) 2022-02-08

Family

ID=68433720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910728420.4A Active CN110442762B (en) 2019-08-08 2019-08-08 Big data processing method based on cloud platform big data

Country Status (1)

Country Link
CN (1) CN110442762B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5794209A (en) * 1995-03-31 1998-08-11 International Business Machines Corporation System and method for quickly mining association rules in databases
US6324533B1 (en) * 1998-05-29 2001-11-27 International Business Machines Corporation Integrated database and data-mining system
CN105005570A (en) * 2014-04-23 2015-10-28 国家电网公司 Method and apparatus for mining massive intelligent power consumption data based on cloud computing
CN107870990A (en) * 2017-10-17 2018-04-03 北京德塔精要信息技术有限公司 A kind of automobile recommends method and device
CN108073701A (en) * 2017-12-13 2018-05-25 北京工业大学 A kind of method of the rare pattern of Mining Multidimensional time series data

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159506A (en) * 2019-12-26 2020-05-15 广州信天翁信息科技有限公司 Data validity identification method, device and equipment and readable storage medium
CN111159506B (en) * 2019-12-26 2023-11-14 广州信天翁信息科技有限公司 Data validity identification method, device, equipment and readable storage medium
CN111258968A (en) * 2019-12-30 2020-06-09 广州博士信息技术研究院有限公司 Enterprise redundant data cleaning method and device and big data platform
CN112163156A (en) * 2020-10-06 2021-01-01 翁海坤 Big data processing method based on artificial intelligence and cloud computing and cloud service center
CN112163625A (en) * 2020-10-06 2021-01-01 翁海坤 Big data mining method based on artificial intelligence and cloud computing and cloud service center
CN113537271A (en) * 2020-10-06 2021-10-22 翁海坤 Big data mining method and system based on artificial intelligence and cloud service center
CN113536107A (en) * 2020-10-06 2021-10-22 翁海坤 Big data decision method and system based on block chain and cloud service center
CN113536107B (en) * 2020-10-06 2022-07-29 西安创业天下网络科技有限公司 Big data decision method and system based on block chain and cloud service center
CN113537271B (en) * 2020-10-06 2022-09-27 思玛特健康科技(苏州)有限公司 Big data mining method and system based on artificial intelligence and cloud service center

Also Published As

Publication number Publication date
CN110442762B (en) 2022-02-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant