CN110442762B - Big data processing method based on cloud platform big data - Google Patents

Publication number
CN110442762B
Authority
CN
China
Prior art keywords
mining
data
service
service candidate
mining service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910728420.4A
Other languages
Chinese (zh)
Other versions
CN110442762A (en)
Inventor
陈泉鑫
罗茂锐
陈少海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Jiu Ling Creative Technology Ltd
Original Assignee
Xiamen Jiu Ling Creative Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Jiu Ling Creative Technology Ltd
Priority to CN201910728420.4A
Publication of CN110442762A
Application granted
Publication of CN110442762B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a big data processing method and a big data processing server. Different interestingness measurement dimensions are considered together: after all service big data of each dimension are clustered, the performance of the different interestingness measurement dimensions is more uniform across the application scenarios of different data mining items, which improves big data mining capability. A plurality of data mining items of the service to be mined, and the data dimension to be mined corresponding to each data mining item, are determined dynamically from the feature information of each cluster, and subsequent data mining is performed according to those data mining items. This avoids the poor mining effect or low accuracy of data mining results caused by the fixed mining data dimensions adopted in the prior art.

Description

Big data processing method based on cloud platform big data
Technical Field
The application relates to the technical field of big data, in particular to a big data processing method based on cloud platform big data.
Background
At present, most big data mining schemes for online services (for example, order behavior services and browsing behavior services) use only one interestingness measurement dimension. Although some schemes study the attributes and behaviors of different interestingness measurement dimensions, for a given service to be mined the different dimensions perform differently in the application scenarios of different data mining items, and this limitation restricts big data mining capability. Moreover, most schemes mine with fixed mining data dimensions throughout the data mining process. When the data to be mined has many dimensions, the fixed mining data dimensions may fail to achieve a good mining effect; when it has few, the fixed mining data dimensions may exceed what is actually needed, which wastes computing resources and makes an inaccurate data mining result more likely.
Disclosure of Invention
To overcome at least the above disadvantages of the prior art, one objective of the present application is to provide a big data processing method. By considering different interestingness measurement dimensions together and clustering all service big data of each dimension, the method makes the performance of the different interestingness measurement dimensions more uniform across the application scenarios of different data mining items and improves big data mining capability. It dynamically determines a plurality of data mining items of the service to be mined, and the data dimension to be mined corresponding to each data mining item, according to the feature information of each cluster, and performs subsequent data mining according to the determined data mining items. This avoids the poor mining effect or low accuracy of data mining results that arise when fixed mining data dimensions are adopted, as in the prior art.
In a first aspect, the present application provides a big data processing method, which is applied to a big data processing server in communication connection with each service server corresponding to a service to be mined, where the method includes:
acquiring service big data of multiple dimensions from each service server, and, for each dimension, clustering all the service big data of that dimension to obtain a cluster of each dimension;
extracting feature information of the clustering cluster of each dimension, and determining a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item according to the feature information of the clustering cluster of each dimension;
according to the plurality of data mining items of the service to be mined and the dimensionality of the data to be mined corresponding to each data mining item, respectively acquiring business process data corresponding to the dimensionality of the data to be mined under each data mining item;
obtaining a big data mining result of the service to be mined according to the business process data corresponding to the dimensionality of the data to be mined acquired under each data mining item;
and dividing the big data mining result of the service to be mined into a plurality of data segments, judging whether each data segment needs to be encrypted, encrypting the data segments that need to be encrypted with a random key to generate encrypted data, and finally sending all the data segments to each corresponding service server.
In a possible design of the first aspect, the step of obtaining a big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined obtained under each data mining item includes:
obtaining a plurality of first mining service candidate item sets according to business process data corresponding to the data dimension to be mined under each data mining item, wherein each first mining service candidate item set comprises a plurality of mining service candidate items;
determining, according to a preset mining service candidate retrieval table, the mining service candidates in the plurality of first mining service candidate sets that are identical to preset mining service candidates contained in the preset mining service candidate retrieval table, and taking them as target mining service candidates of the plurality of first mining service candidate sets, wherein the preset mining service candidate retrieval table comprises a plurality of preset mining service candidates, a first association relation identifier for identifying every two preset mining service candidates that have a first association relation, and a frequent item service level of each preset mining service candidate that has a second association relation, and the first association relation and the second association relation respectively represent a strong association relation between frequent items and a weak association relation between frequent items;
determining, according to the first association relation identifier contained in the preset mining service candidate retrieval table, a second mining service candidate set of the preset mining service candidates that have the second association relation in each first mining service candidate set that contains preset mining service candidates;
for each second mining service candidate item set, selecting one preset mining service candidate item as a parent mining service candidate item and other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table;
and obtaining a big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate item.
In a possible design of the first aspect, the step of selecting one preset mining service candidate as a parent mining service candidate and the other preset mining service candidates as child mining service candidates according to a frequent item service level of each preset mining service candidate in the second mining service candidate set in the preset mining service candidate retrieval table includes:
and selecting the preset mining service candidate item with the frequent item service level greater than other preset mining service candidate items as a parent mining service candidate item and taking other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table.
In a possible design of the first aspect, the step of obtaining a big data mining result of the service to be mined according to the parent mining service candidate and the child mining service candidate includes:
adding the parent mining service candidate into a specified mining item set, wherein the specified mining item set comprises mining strategies matched with the parent mining service candidate;
randomly generating a plurality of target sub mining service candidates from the plurality of sub mining service candidates according to the mining strategy matched with the parent mining service candidate and the proportion of the sub mining service candidates;
calculating the relevancy of each target sub mining service candidate item in the plurality of target sub mining service candidate items and the parent mining service candidate item;
according to the calculated degree of correlation between each target sub-mining service candidate item and the parent mining service candidate item, taking the target sub-mining service candidate item with the largest degree of correlation as a comparison sub-mining service candidate item, and selecting the remaining target sub-mining service candidate items in the plurality of target sub-mining service candidate items to obtain a selected target sub-mining service candidate item set;
carrying out crossover and mutation genetic operations on the selected target sub-mining service candidate item set to obtain a new target sub-mining service candidate item set;
calculating the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set, judging whether the new target sub mining service candidate item set meets a preset condition according to the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set and the relevance of the comparison sub mining service candidate items, and if so, outputting a big data mining result of the service to be mined, which corresponds to the new target sub mining service candidate item set and the parent mining service candidate item, according to the mining strategy.
In a possible design of the first aspect, the step of determining, according to feature information of a cluster of each dimension, a plurality of data mining items of the service to be mined and a dimension of the data to be mined corresponding to each data mining item includes:
analyzing the feature information of the clustering cluster of each dimension to obtain a high contribution value feature and a low contribution value feature;
calculating a first proportion of the high-contribution-value features in the feature information of the clustering cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the clustering cluster of each dimension;
determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion;
and determining the data dimension to be mined corresponding to each data mining item according to the plurality of data mining items of the service to be mined, the contribution value of the service to be mined, and a preset data dimension correspondence.
In one possible design of the first aspect, the step of determining the plurality of data mining items of the service to be mined according to the first proportion and the second proportion includes:
respectively determining a first mining coefficient of a high-contribution-value feature and a second mining coefficient of a low-contribution-value feature according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value;
determining a first proportion of data mining items corresponding to the high-contribution-value features and a second proportion of data mining items corresponding to the low-contribution-value features according to the first mining coefficient and the second mining coefficient;
and determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion.
In a possible design of the first aspect, the step of determining whether each data segment needs to be encrypted includes:
and judging whether the ratio of the high contribution value features in the feature information of the clustering cluster of each data segment exceeds a first threshold value, if so, judging that the data segment needs to be encrypted, and otherwise, not encrypting.
In a second aspect, an embodiment of the present application provides a big data processing server, including a processor, a memory, and a network interface. The processor, the memory, and the network interface may be connected through a bus system. The network interface is configured to receive messages, the memory is configured to store a program, instructions, or code, and the processor is configured to execute the program, instructions, or code in the memory to perform the operations of the first aspect or any possible design of the first aspect.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where instructions are stored, and when the instructions are executed on a computer, the computer is caused to execute the method in the first aspect or any possible design manner of the first aspect.
Based on any one of the above aspects, the present application considers different interestingness measurement dimensions together, so that after all the service big data of each dimension are clustered, the performance of the different interestingness measurement dimensions is more uniform across the application scenarios of different data mining items and big data mining capability is improved. A plurality of data mining items of the service to be mined, and the data dimension to be mined corresponding to each data mining item, can be determined dynamically according to the feature information of each cluster, and subsequent data mining is performed on that basis, which avoids the poor mining effect or low accuracy of data mining results caused by the fixed mining data dimensions adopted in the prior art. In addition, because the big data mining result is divided into a plurality of data segments and only some of them are encrypted, the heavy workload of encrypting all the data is avoided while important data is still protected.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic view of an application scenario of a big data processing method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating a big data processing method according to an embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating various sub-steps included in step S120 in one possible implementation shown in FIG. 2;
FIG. 4 is a flow chart illustrating various sub-steps included in step S140 in one possible implementation shown in FIG. 2;
FIG. 5 is a schematic block diagram of the structure of a big data processing server for executing the above method according to an embodiment of the present application.
Detailed Description
The present application will now be described in detail with reference to the drawings, and the specific operations in the method embodiments may also be applied to the apparatus embodiments or the system embodiments. In the description of the present application, "at least one" includes one or more unless otherwise specified. "Plurality" means two or more. For example, "at least one of A, B and C" includes: A alone, B alone, C alone, A and B in combination, A and C in combination, B and C in combination, and A, B and C in combination. In this application, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association between associated objects and means that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
Please refer to FIG. 1, which is a schematic view of an application scenario of a big data processing method according to an embodiment of the present application. In this embodiment, the application scenario may include a big data processing server 100 and a plurality of service servers 200 communicatively connected to the big data processing server 100. The big data processing server 100 can provide data mining services for the plurality of service servers 200. Each service server 200 may be a server that individually performs an online service, such as an order service or a transaction service.
FIG. 2 is a schematic flow chart of a big data processing method according to an embodiment of the present application. In this embodiment, the big data processing method may be executed by the big data processing server 100 shown in FIG. 1, and the details of the big data processing method are described below.
Step S110, obtaining the service big data of multiple dimensions from each service server 200, and, for each dimension, clustering all the service big data of that dimension to obtain a cluster of each dimension.
In this embodiment, the service to be mined may be a mining service determined according to actual user requirements. Specifically, the service servers 200 associated with the service to be mined may be selected according to a user setting. The service big data of a plurality of dimensions are then obtained from these service servers 200, and for each dimension, all the service big data of that dimension are clustered to obtain a cluster of each dimension.
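The per-dimension clustering of step S110 can be sketched as follows. This is only an illustration under stated assumptions: the description does not name a clustering algorithm, so a plain one-dimensional k-means stands in for it, and the record layout, the dimension names, and the functions `cluster_dimension` and `cluster_by_dimension` are hypothetical.

```python
from collections import defaultdict


def cluster_dimension(values, k, iterations=20):
    """Plain 1-D k-means (an assumed stand-in for the unspecified clustering)."""
    ordered = sorted(values)
    # Deterministic initial centers: evenly spaced picks from the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the previous center when a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters


def cluster_by_dimension(records, k=2):
    """Group values by dimension, then cluster each dimension on its own."""
    per_dimension = defaultdict(list)
    for record in records:
        for dimension, value in record.items():
            per_dimension[dimension].append(value)
    return {d: cluster_dimension(vals, k) for d, vals in per_dimension.items()}


# Hypothetical service big data gathered from the service servers.
records = [
    {"order_amount": 10.0, "browse_seconds": 5.0},
    {"order_amount": 12.0, "browse_seconds": 7.0},
    {"order_amount": 95.0, "browse_seconds": 300.0},
    {"order_amount": 99.0, "browse_seconds": 320.0},
]
clusters = cluster_by_dimension(records)
```

Each dimension is clustered independently, so a record can fall into unrelated clusters in different dimensions, which matches the "cluster of each dimension" wording of the step.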
Step S120, extracting the feature information of the cluster of each dimension, and determining a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item according to the feature information of the cluster of each dimension.
In this embodiment, for example, the cluster of each dimension may be windowed, and the portion of each cluster inside the window may be input into the CCIPCA (candid covariance-free incremental principal component analysis) algorithm to calculate the feature information of the cluster of each dimension.
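The windowing and CCIPCA step can be illustrated with a minimal sketch. It assumes the samples are already mean-centered, that the first sample of each window is non-zero, that windows are consecutive and non-overlapping, and that only the first principal direction is needed; the amnesic parameter of the commonly cited CCIPCA update rule is set to zero. The function names are hypothetical, and the description does not say which CCIPCA outputs it treats as feature information.

```python
import math


def windows(samples, size):
    """Split a cluster's samples into consecutive, non-overlapping windows."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def ccipca_first_component(samples):
    """Incrementally estimate the first principal direction as a unit vector.

    Assumes mean-centered samples and a non-zero first sample.
    """
    v = None
    for n, u in enumerate(samples, start=1):
        if v is None:
            v = list(u)  # the first sample initializes the estimate
            continue
        norm = math.sqrt(sum(x * x for x in v))
        projection = sum(a * b for a, b in zip(u, v)) / norm
        # v(n) = (n-1)/n * v(n-1) + 1/n * u(n) * (u(n) . v(n-1) / ||v(n-1)||)
        v = [(n - 1) / n * vi + projection / n * ui for vi, ui in zip(v, u)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


# Hypothetical two-dimensional cluster whose points lie close to the diagonal.
cluster = [(2.0, 2.1), (-1.0, -0.9), (3.0, 2.8),
           (-2.0, -2.2), (1.0, 1.1), (-3.0, -2.9)]
features = [ccipca_first_component(w) for w in windows(cluster, 3)]
```

The sign of each estimated direction depends on the first sample of its window, so only the axis, not the orientation, should be compared across windows.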
Step S130, according to the plurality of data mining items of the service to be mined and the dimensionality of the data to be mined corresponding to each data mining item, business process data corresponding to the dimensionality of the data to be mined is acquired under each data mining item.
In this embodiment, the business process data may include, but is not limited to, historical business data in the data dimension to be mined and business data currently being generated in real time.
Step S140, obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined acquired under each data mining item.
Step S150, dividing the big data mining result of the service to be mined into a plurality of data segments, judging whether each data segment needs to be encrypted, encrypting the data segments that need to be encrypted with a random key to generate encrypted data, and finally sending all the data segments to each corresponding service server.
The data segments may be stored in files, i.e., each file is treated as a data segment. In other embodiments, multiple data segments may be stored in the same file. The encryption method is not limited, and an existing encryption method may be used.
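A minimal sketch of the segment-and-encrypt flow of step S150 follows. Because the description leaves the encryption method open, a one-time-pad XOR with a fresh random key per segment is used here purely as a stand-in built from the standard library; the function names, the `needs_encryption` predicate, the sample payload, and the way the key is kept next to the ciphertext are all assumptions of this sketch, and key distribution is not covered by the source.

```python
import secrets


def split_into_segments(data: bytes, segment_size: int):
    """Divide the mining result into fixed-size data segments."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of the same length (stand-in cipher only)."""
    return bytes(d ^ k for d, k in zip(data, key))


def protect_segments(segments, needs_encryption):
    """Encrypt only the flagged segments, each under its own random key."""
    protected = []
    for segment in segments:
        if needs_encryption(segment):
            key = secrets.token_bytes(len(segment))  # fresh random key
            protected.append({"encrypted": True,
                              "payload": xor_bytes(segment, key),
                              "key": key})
        else:
            protected.append({"encrypted": False, "payload": segment, "key": None})
    return protected


# Hypothetical mining result and a hypothetical sensitivity rule.
result = b"order:1001;vip:yes|order:1002;vip:no"
segments = split_into_segments(result, 18)
protected = protect_segments(segments, lambda s: b"vip:yes" in s)
```

Only the first segment is encrypted in this example; the second is forwarded unchanged, which is the workload saving the step describes.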
Based on the above steps, this embodiment considers different interestingness measurement dimensions together, so that after all the service big data of each dimension are clustered, the performance of the different interestingness measurement dimensions is more uniform across the application scenarios of different data mining items and big data mining capability is improved. A plurality of data mining items of the service to be mined, and the data dimension to be mined corresponding to each data mining item, can be determined dynamically according to the feature information of each cluster, and subsequent data mining is performed on that basis, which avoids the poor mining effect or low accuracy of data mining results caused by the fixed mining data dimensions adopted in the prior art. In addition, by dividing the big data mining result into a plurality of data segments, the heavy workload of encrypting all the data can be avoided while important data is still protected.
In one possible design, referring to FIG. 3, step S120 may specifically include the following sub-steps:
Sub-step S121, analyzing the feature information of the cluster of each dimension to obtain high-contribution-value features and low-contribution-value features.
Sub-step S122, calculating a first proportion of the high-contribution-value features in the feature information of the cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the cluster of each dimension.
Sub-step S123, determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion.
Sub-step S124, determining the data dimension to be mined corresponding to each data mining item according to the plurality of data mining items of the service to be mined, the contribution value of the service to be mined, and a preset data dimension correspondence.
In a possible implementation of sub-step S123, a first mining coefficient of the high-contribution-value features and a second mining coefficient of the low-contribution-value features are first determined according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value. Then, a first proportion of data mining items corresponding to the high-contribution-value features and a second proportion of data mining items corresponding to the low-contribution-value features are determined according to the first mining coefficient and the second mining coefficient. Finally, a plurality of data mining items of the service to be mined are determined according to these two proportions of data mining items.
Based on the above steps, the present embodiment further considers the ratio of the high contribution value feature and the low contribution value feature in the feature information of the cluster of each dimension, so as to determine a plurality of data mining items of the service to be mined. Moreover, the subjective influence of fixed mining data dimension selection can be effectively reduced, and the mining error rate is reduced.
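Sub-steps S121 to S123 can be sketched as below. The description gives no formulas, so every numeric rule here is an assumption made only for illustration: features count as high-contribution when their contribution is at least `high_threshold`, each mining coefficient is taken as one plus the corresponding difference, the set values are assumed to lie in [0, 1], and the item proportions are the normalized coefficients. The function name and parameters are hypothetical.

```python
def plan_mining_items(contributions, high_threshold,
                      first_set_value, second_set_value, total_items):
    """Split a budget of data mining items between high and low value features."""
    high = [c for c in contributions if c >= high_threshold]
    low = [c for c in contributions if c < high_threshold]

    # Sub-step S122: proportions of high and low contribution features.
    first_proportion = len(high) / len(contributions)
    second_proportion = len(low) / len(contributions)

    # Sub-step S123: differences from the set values drive the coefficients.
    first_difference = first_proportion - first_set_value
    second_difference = second_proportion - second_set_value
    first_coefficient = 1.0 + first_difference    # assumed mapping
    second_coefficient = 1.0 + second_difference  # assumed mapping

    # Item proportions are the normalized coefficients (assumed rule).
    total = first_coefficient + second_coefficient
    high_items = round(total_items * first_coefficient / total)
    return {"high_value_items": high_items,
            "low_value_items": total_items - high_items}


# Hypothetical contribution values of the features of one cluster.
plan = plan_mining_items([0.9, 0.8, 0.7, 0.6, 0.65, 0.2, 0.1, 0.3],
                         high_threshold=0.5, first_set_value=0.5,
                         second_set_value=0.5, total_items=10)
```

With five of eight features above the threshold, the high-value side receives the larger share of the ten items, which shows how the item mix follows the data rather than a fixed setting.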
In one possible design, referring to FIG. 4, step S140 may specifically include the following sub-steps:
Sub-step S141, obtaining a plurality of first mining service candidate sets according to the business process data corresponding to the data dimension to be mined acquired under each data mining item.
In this embodiment, each first mining service candidate set may include a plurality of mining service candidates. In this sub-step, a plurality of first mining service candidate sets may be obtained by matching the business process data with the reference process data of each mining service candidate.
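The matching in sub-step S141 might look like the following sketch. The matching rule is not given in the description; here a candidate is assumed to match when all of its reference events appear in the business process data of a data mining item. The function name, the event names, and the candidate names are hypothetical.

```python
def first_candidate_sets(process_data_by_item, reference_data):
    """Build one first mining service candidate set per data mining item."""
    candidate_sets = {}
    for item, events in process_data_by_item.items():
        observed = set(events)
        # Assumed rule: a candidate matches if its reference events all occur.
        candidate_sets[item] = {
            name for name, reference in reference_data.items()
            if set(reference) <= observed
        }
    return candidate_sets


# Hypothetical reference process data of each mining service candidate.
reference_data = {
    "repeat_purchase": ["order_placed", "order_paid"],
    "cart_abandon": ["cart_add", "session_end"],
    "browse_only": ["page_view"],
}
# Hypothetical business process data acquired under two data mining items.
process_data_by_item = {
    "item_a": ["page_view", "order_placed", "order_paid"],
    "item_b": ["page_view", "cart_add", "session_end"],
}
sets_by_item = first_candidate_sets(process_data_by_item, reference_data)
```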
Sub-step S142, determining, according to a preset mining service candidate retrieval table, the mining service candidates in the plurality of first mining service candidate sets that are identical to preset mining service candidates included in the preset mining service candidate retrieval table, and using them as target mining service candidates of the plurality of first mining service candidate sets.
In this embodiment, the preset mining service candidate retrieval table may include a plurality of preset mining service candidates, a first association relationship identifier for identifying every two preset mining service candidates having a first association relationship, and a frequent item service level of each preset mining service candidate having a second association relationship, where the first association relationship and the second association relationship are respectively used to represent a strong association relationship between frequent items and a weak association relationship between frequent items. Optionally, the strong association relationship may indicate that the two preset mining service candidates have association in a business context order, and the weak association relationship may indicate that the two preset mining service candidates do not have association in a business context order.
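One possible in-memory shape for the retrieval table, together with the lookup of sub-step S142, is sketched below. The class `RetrievalTable`, its field names, and the sample candidates are hypothetical; the description only states what the table contains, not how it is stored.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalTable:
    # Preset mining service candidates.
    candidates: set
    # First association relation identifiers: unordered strongly associated pairs.
    strong_pairs: set = field(default_factory=set)
    # Frequent item service level of candidates with a weak association.
    service_levels: dict = field(default_factory=dict)

    def strongly_associated(self, a, b):
        """True if the two candidates carry the first association relation."""
        return frozenset((a, b)) in self.strong_pairs


def target_candidates(first_sets, table):
    """Sub-step S142: keep the candidates that also appear in the table."""
    return [candidate_set & table.candidates for candidate_set in first_sets]


table = RetrievalTable(
    candidates={"repeat_purchase", "cart_abandon", "browse_only"},
    strong_pairs={frozenset(("repeat_purchase", "cart_abandon"))},
    service_levels={"browse_only": 1, "cart_abandon": 2, "repeat_purchase": 3},
)
targets = target_candidates(
    [{"repeat_purchase", "coupon_use"}, {"browse_only", "cart_abandon"}], table)
```

Storing each strong pair as a `frozenset` makes the lookup independent of the order in which the two candidates are given.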
Sub-step S143, determining, according to the first association relation identifier included in the preset mining service candidate retrieval table, a second mining service candidate set of the preset mining service candidates that have the second association relation in each first mining service candidate set that contains preset mining service candidates.
Sub-step S144, for each second mining service candidate set, selecting one preset mining service candidate as a parent mining service candidate and the other preset mining service candidates as child mining service candidates according to the frequent item service level, in the preset mining service candidate retrieval table, of each preset mining service candidate in the second mining service candidate set.
Sub-step S145, obtaining a big data mining result of the service to be mined according to the parent mining service candidate and the child mining service candidates.
Based on the above steps, the embodiment further considers the strong association relationship between the frequent items and the weak association relationship between the frequent items, and performs retrieval and mining based on the strong association relationship and the weak association relationship, so that the situation that the mining result deviates from the dimension of the data to be mined due to the misassociation mining of the data in the process of mining the frequent items can be avoided, and the mining accuracy is further improved.
As an optional implementation of sub-step S144, according to the frequent item service level, in the preset mining service candidate retrieval table, of each preset mining service candidate in the second mining service candidate set, the preset mining service candidate whose frequent item service level is higher than that of the other preset mining service candidates is selected as the parent mining service candidate, and the other preset mining service candidates are used as child mining service candidates.
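The parent and child selection just described reduces to picking the candidate with the highest frequent item service level. A minimal sketch, with a hypothetical function name and sample levels, and with ties broken alphabetically as an assumption the description does not address:

```python
def split_parent_children(second_set, service_levels):
    """Pick the candidate with the highest service level as the parent."""
    ordered = sorted(second_set)  # fixed order so ties resolve predictably
    parent = max(ordered, key=lambda candidate: service_levels[candidate])
    children = [candidate for candidate in ordered if candidate != parent]
    return parent, children


# Hypothetical frequent item service levels from the retrieval table.
levels = {"browse_only": 1, "cart_abandon": 2, "repeat_purchase": 3}
parent, children = split_parent_children(
    {"browse_only", "cart_abandon", "repeat_purchase"}, levels)
```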
As an optional implementation, in sub-step S145, in order to adapt the mining to different mining service candidates, to strengthen mining for big data with a small data volume, and to improve mining efficiency, this embodiment may add the parent mining service candidate to a specified mining item set, where the specified mining item set includes a mining strategy matched with the parent mining service candidate. Then, according to the mining strategy matched with the parent mining service candidate and the proportion of the child mining service candidates, a plurality of target child mining service candidates are randomly generated from the child mining service candidates, and the correlation degree between each target child mining service candidate and the parent mining service candidate is calculated.
On this basis, according to the calculated correlation degree between each target child mining service candidate and the parent mining service candidate, the target child mining service candidate with the largest correlation degree may be used as a comparison child mining service candidate, and a selection operation may be performed on the remaining target child mining service candidates to obtain a selected target child mining service candidate set. Crossover and mutation genetic operations are then performed on the selected set to obtain a new target child mining service candidate set. Next, the correlation degree of each target child mining service candidate in the new set is calculated, and whether the new set meets a preset condition is judged according to these correlation degrees and the correlation degree of the comparison child mining service candidate. If the condition is met, the big data mining result of the service to be mined, corresponding to the new target child mining service candidate set and the parent mining service candidate, is output according to the mining strategy.
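One round of the genetic procedure described for sub-step S145 might be sketched as follows. The patent does not define how candidates are encoded, how the correlation degree is computed, or what the preset condition is, so everything concrete here is an assumption made for illustration: candidates are numeric tuples, the correlation degree is cosine similarity, crossover is single-point, mutation adds Gaussian noise to one position, and the preset condition is "the best new candidate is at least as correlated as the comparison candidate". The names `relevance` and `evolve_once` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def relevance(candidate: Sequence[float], parent: Sequence[float]) -> float:
    """Cosine similarity, used as an assumed stand-in for the correlation degree."""
    dot = sum(a * b for a, b in zip(candidate, parent))
    norm = math.sqrt(sum(a * a for a in candidate)) * math.sqrt(sum(b * b for b in parent))
    return dot / norm if norm else 0.0


def evolve_once(
    parent: Vector,
    children: List[Vector],
    sample_ratio: float,
    rng: random.Random,
    mutation_scale: float = 0.1,
) -> Tuple[List[Vector], float, bool]:
    if len(children) < 2:
        raise ValueError("need at least two child candidates")
    # Randomly draw target child candidates according to the given proportion.
    k = min(len(children), max(2, round(len(children) * sample_ratio)))
    targets = rng.sample(children, k)
    # The most correlated target is kept as the comparison candidate;
    # the remaining targets form the selected set.
    ranked = sorted(targets, key=lambda c: relevance(c, parent), reverse=True)
    comparison, selected = ranked[0], ranked[1:]
    new_set: List[Vector] = []
    for i, a in enumerate(selected):
        b = selected[(i + 1) % len(selected)]
        # Single-point crossover between neighbouring selected candidates.
        cut = rng.randrange(1, len(a)) if len(a) > 1 else 0
        offspring = list(a[:cut] + b[cut:])
        # Mutation: perturb one randomly chosen position.
        j = rng.randrange(len(offspring))
        offspring[j] += rng.gauss(0.0, mutation_scale)
        new_set.append(tuple(offspring))
    baseline = relevance(comparison, parent)
    # Assumed preset condition: the new set matches or beats the comparison candidate.
    accepted = max(relevance(c, parent) for c in new_set) >= baseline
    return new_set, baseline, accepted


rng = random.Random(0)  # fixed seed so the example is repeatable
parent_vec = (1.0, 0.0, 1.0)
child_vecs = [(1.0, 0.1, 0.9), (0.2, 1.0, 0.1), (0.9, 0.0, 1.1), (0.0, 1.0, 0.0)]
new_set, baseline, accepted = evolve_once(parent_vec, child_vecs, 0.75, rng)
```

In practice such a round would typically be repeated until the condition holds or an iteration limit is reached; the text above only describes a single pass, so the sketch stops there.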
As an optional implementation, in step S150, the step of determining whether each data segment needs to be encrypted includes:
S151: determining whether the ratio of high contribution value features in the feature information of the cluster of each data segment exceeds a first threshold; if so, determining that the data segment needs to be encrypted; otherwise, not encrypting it.
In step S151, using whether the ratio of high contribution value features in the feature information of the cluster of each data segment exceeds the first threshold as the basis for the encryption decision reduces the amount of calculation and improves efficiency. Other determination methods may of course be used, for example, setting an additional keyword rule and using it to calculate whether the required security level of each file exceeds a set threshold, thereby determining whether encryption is required.
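The threshold test of step S151 can be sketched as below. How a feature is classified as "high contribution value" is not specified in this passage, so the `high_cutoff` parameter, the function name `needs_encryption`, and the sample segment data are assumptions made purely for illustration.

```python
from typing import Dict, List, Sequence


def needs_encryption(
    contributions: Sequence[float], high_cutoff: float, first_threshold: float
) -> bool:
    """Return True when the share of high contribution value features
    in a segment's cluster feature information exceeds the first threshold."""
    if not contributions:
        return False  # assumed: a segment without features is not encrypted
    # A feature counts as "high contribution" when it reaches the assumed cutoff.
    high = sum(1 for value in contributions if value >= high_cutoff)
    return high / len(contributions) > first_threshold


# Hypothetical contribution values per data segment.
segments: Dict[str, List[float]] = {
    "seg-a": [0.9, 0.8, 0.2, 0.7],  # 3 of 4 features are high -> ratio 0.75
    "seg-b": [0.1, 0.3, 0.9, 0.2],  # 1 of 4 features is high  -> ratio 0.25
}
flags = {name: needs_encryption(vals, 0.6, 0.5) for name, vals in segments.items()}
```

With a first threshold of 0.5, only `seg-a` is flagged for encryption. The check is a single pass over the features of each segment, which is consistent with the stated aim of keeping the calculation cheap.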
Based on the foregoing description, the big data processing server 100 may send the big data mining result of the service to be mined to each corresponding service server 200.
Fig. 5 is a schematic structural diagram of a big data processing server 100 for executing the big data processing method according to an embodiment of the present application, and as shown in fig. 5, the big data processing server 100 may include a network interface 110, a machine-readable storage medium 120, a processor 130, and a bus 140. The number of the processors 130 may be one or more, and one processor 130 is taken as an example in fig. 5; the network interface 110, the machine-readable storage medium 120, and the processor 130 may be connected by a bus 140 or otherwise, as exemplified by the connection by the bus 140 in fig. 5.
The machine-readable storage medium 120 is a computer-readable storage medium and can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the big data processing method in the embodiments of the present application. The processor 130 executes various functional applications and data processing of the big data processing server 100 by running the software programs, instructions, and modules stored in the machine-readable storage medium 120, that is, it implements the above-mentioned big data processing method, which is not described herein again.
The machine-readable storage medium 120 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the machine-readable storage medium 120 may be either volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, but not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory. In some examples, the machine-readable storage medium 120 may further include memory located remotely from the processor 130, which may be connected to the terminal device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor 130 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 130. The processor 130 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 130 may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
The big data processing server 100 can exchange information with other devices (such as the service server 200) through the network interface 110. The network interface 110 may be a circuit, bus, transceiver, or any other device that may be used to exchange information. The processor 130 may send and receive information using the network interface 110.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium can be any usable medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the embodiments of the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to encompass such modifications and variations.

Claims (6)

1. A big data processing method based on cloud platform big data, characterized in that the method is applied to a big data processing server which is in communication connection with each service server corresponding to a service to be mined, and the method comprises the following steps:
acquiring service big data of multiple dimensions from each service server, and clustering all the service big data of the dimensions according to each dimension to obtain a cluster of each dimension;
extracting feature information of the clustering cluster of each dimension, and determining a plurality of data mining items of the service to be mined and the data dimension to be mined corresponding to each data mining item according to the feature information of the clustering cluster of each dimension;
according to the plurality of data mining items of the service to be mined and the dimensionality of the data to be mined corresponding to each data mining item, respectively acquiring business process data corresponding to the dimensionality of the data to be mined under each data mining item;
obtaining a big data mining result of the service to be mined according to the business process data corresponding to the dimensionality of the data to be mined under each data mining item;
dividing the big data mining result of the service to be mined into a plurality of data segments, judging whether each data segment needs to be encrypted, encrypting the data segments needing to be encrypted to generate encrypted data of a random key, and finally sending all the data segments to each corresponding service server;
the step of obtaining the big data mining result of the service to be mined according to the business process data corresponding to the data dimension to be mined under each data mining item comprises the following steps:
obtaining a plurality of first mining service candidate item sets according to business process data corresponding to the data dimension to be mined under each data mining item, wherein each first mining service candidate item set comprises a plurality of mining service candidate items;
determining that mining service candidates identical to the preset mining service candidates in the preset mining service candidate retrieval table exist in the plurality of first mining service candidate sets according to a preset mining service candidate retrieval table, and using the mining service candidates as target mining service candidates of the plurality of first mining service candidate sets, wherein the preset mining service candidate retrieval table comprises the plurality of preset mining service candidates, a first association relation identifier for identifying every two preset mining service candidates with a first association relation and frequent item business levels of the preset mining service candidates with a second association relation, and the first association relation and the second association relation are respectively used for representing a strong association relation between frequent items and a weak association relation between frequent items;
according to the first incidence relation identification contained in the preset mining service candidate item retrieval table, determining a second mining service candidate item set of the preset mining service candidate items with a second incidence relation in each first mining service candidate item set with the preset mining service candidate items;
for each second mining service candidate item set, selecting one preset mining service candidate item as a parent mining service candidate item and other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table;
and obtaining a big data mining result of the service to be mined according to the parent mining service candidate item and the child mining service candidate item.
2. The big data processing method according to claim 1, wherein the step of selecting one of the preset mining service candidates as a parent mining service candidate and the other preset mining service candidates as child mining service candidates according to the frequent item service level of each of the preset mining service candidates in the second mining service candidate set in the preset mining service candidate retrieval table comprises:
and selecting the preset mining service candidate item with the frequent item service level greater than other preset mining service candidate items as a parent mining service candidate item and taking other preset mining service candidate items as child mining service candidate items according to the corresponding frequent item service level of each preset mining service candidate item in the second mining service candidate item set in the preset mining service candidate item retrieval table.
3. The big data processing method according to claim 1, wherein the step of obtaining the big data mining result of the service to be mined according to the parent mining service candidate and the child mining service candidate comprises:
adding the parent mining service candidate into a specified mining item set, wherein the specified mining item set comprises mining strategies matched with the parent mining service candidate;
randomly generating a plurality of target sub mining service candidates from the plurality of sub mining service candidates according to the mining strategy matched with the parent mining service candidate and the proportion of the sub mining service candidates;
calculating the relevancy of each target sub mining service candidate item in the plurality of target sub mining service candidate items and the parent mining service candidate item;
according to the calculated degree of correlation between each target sub-mining service candidate item and the parent mining service candidate item, taking the target sub-mining service candidate item with the largest degree of correlation as a comparison sub-mining service candidate item, and selecting the remaining target sub-mining service candidate items in the plurality of target sub-mining service candidate items to obtain a selected target sub-mining service candidate item set;
carrying out crossover and mutation genetic operations on the selected target sub-mining service candidate item set to obtain a new target sub-mining service candidate item set;
calculating the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set, judging whether the new target sub mining service candidate item set meets a preset condition according to the relevance of each target sub mining service candidate item in the new target sub mining service candidate item set and the relevance of the comparison sub mining service candidate items, and if so, outputting a big data mining result of the service to be mined, which corresponds to the new target sub mining service candidate item set and the parent mining service candidate item, according to the mining strategy.
4. The big data processing method according to claim 1, wherein the step of determining the plurality of data mining items of the service to be mined and the dimension of the data to be mined corresponding to each data mining item according to the feature information of the cluster of each dimension comprises:
analyzing the feature information of the clustering cluster of each dimension to obtain a high contribution value feature and a low contribution value feature;
calculating a first proportion of the high-contribution-value features in the feature information of the clustering cluster of each dimension and a second proportion of the low-contribution-value features in the feature information of the clustering cluster of each dimension;
determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion;
and determining the data dimension to be mined corresponding to each data mining project according to the plurality of data mining projects of the service to be mined and the contribution value of the service to be mined and a preset data dimension corresponding relation.
5. The big data processing method according to claim 4, wherein the step of determining the plurality of data mining items of the service to be mined according to the first and second ratios comprises:
respectively determining a first mining coefficient of a high-contribution-value feature and a second mining coefficient of a low-contribution-value feature according to a first difference between the first proportion and a first set value and a second difference between the second proportion and a second set value;
determining a first proportion of data mining items corresponding to the high-contribution-value features and a second proportion of data mining items corresponding to the low-contribution-value features according to the first mining coefficient and the second mining coefficient;
and determining a plurality of data mining items of the service to be mined according to the first proportion and the second proportion.
6. The big data processing method according to claim 4, wherein the step of determining whether each data segment needs to be encrypted comprises:
and judging whether the ratio of the high contribution value features in the feature information of the clustering cluster of each data segment exceeds a first threshold value, if so, judging that the data segment needs to be encrypted, and otherwise, not encrypting.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910728420.4A CN110442762B (en) 2019-08-08 2019-08-08 Big data processing method based on cloud platform big data

Publications (2)

Publication Number Publication Date
CN110442762A CN110442762A (en) 2019-11-12
CN110442762B true CN110442762B (en) 2022-02-08

Family

ID=68433720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910728420.4A Active CN110442762B (en) 2019-08-08 2019-08-08 Big data processing method based on cloud platform big data

Country Status (1)

Country Link
CN (1) CN110442762B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111159506B (en) * 2019-12-26 2023-11-14 广州信天翁信息科技有限公司 Data validity identification method, device, equipment and readable storage medium
CN111258968B (en) * 2019-12-30 2020-09-11 广州博士信息技术研究院有限公司 Enterprise redundant data cleaning method and device and big data platform
CN113536107B (en) * 2020-10-06 2022-07-29 西安创业天下网络科技有限公司 Big data decision method and system based on block chain and cloud service center
CN112163625B (en) * 2020-10-06 2021-06-25 西安石油大学 Big data mining method based on artificial intelligence and cloud computing and cloud service center

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5794209A (en) * 1995-03-31 1998-08-11 International Business Machines Corporation System and method for quickly mining association rules in databases
US6324533B1 (en) * 1998-05-29 2001-11-27 International Business Machines Corporation Integrated database and data-mining system
CN105005570A (en) * 2014-04-23 2015-10-28 国家电网公司 Method and apparatus for mining massive intelligent power consumption data based on cloud computing
CN107870990A (en) * 2017-10-17 2018-04-03 北京德塔精要信息技术有限公司 A kind of automobile recommends method and device
CN108073701A (en) * 2017-12-13 2018-05-25 北京工业大学 A kind of method of the rare pattern of Mining Multidimensional time series data

Also Published As

Publication number Publication date
CN110442762A (en) 2019-11-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant