CN110674373A - Big data processing method, device, equipment and storage medium based on sensitive data - Google Patents

Info

Publication number
CN110674373A
Authority
CN
China
Prior art keywords
state function
screening
samples
parameter set
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910876650.5A
Other languages
Chinese (zh)
Other versions
CN110674373B (en)
Inventor
张少典
马汉东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sen Sen Medical Technology Co Ltd
Original Assignee
Shanghai Sen Sen Medical Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sen Sen Medical Technology Co Ltd
Priority to CN201910876650.5A
Publication of CN110674373A
Application granted
Publication of CN110674373B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a big data processing method, apparatus, device and storage medium based on sensitive data. A number of samples is determined according to a preset condition, and a state function is determined according to the number of samples; seed numbers are screened according to the state function, and the seed numbers that meet the screening condition are added to a parameter set; whether the parameter set reaches the number of samples is then judged, and if so the parameter set is output, otherwise the method returns to the previous step. By establishing the number of samples and the state function, the required sample data set can be screened quickly from an original data set from which sensitive data has been removed, and the state function can be optimized using the unsatisfied samples, so that the data characteristics represented by the sample data set remain highly consistent with the true data characteristics of the original data set. The method is therefore efficient in screening and preserves the authenticity of the original data set.

Description

Big data processing method, device, equipment and storage medium based on sensitive data
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a big data processing method, apparatus, device, and storage medium based on sensitive data.
Background
Currently, the big data field generally involves data holders, data providers, and data users. The data holder owns the data and has the right to use it, but does not know how to use the data to generate value; the data provider has data analysis capability and can analyze the original data to draw conclusions; the data user has neither data ownership nor data analysis capability, but needs to build practical applications on the analysis results of the original data.
The data holder can seek cooperation with a data provider, and the data user can purchase the data. In fields involving sensitive data, such as medical data or government identity data, the data contains sensitive information and cannot be disclosed directly to the data user, so the data user instead needs to purchase the analysis conclusions produced by the data provider.
At present, data providers generally use random sampling to raise the value density of big data. Analysis results obtained in this way often deviate from the true characteristics expressed by the full data set. The error can be reduced by enlarging the sample, but only at the expense of computational analysis cost. As a result, the data user cannot effectively obtain comprehensive information about the big data and cannot apply it in a targeted manner, the data does not deliver its maximum value, and the data user does not receive effective analysis data, which creates asymmetry in the flow of information. This information asymmetry hinders information exchange, so the data provider's analysis process becomes long and difficult, the data user's requirements are not met, and the expected effect is not achieved.
Therefore, how to keep the authenticity characteristics of the sample data set consistent with those of the original data set while accelerating screening is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, it is an object of the present application to provide a method, an apparatus, a device and a storage medium for big data processing based on sensitive data, so as to solve at least one problem existing in the prior art.
To achieve the above and other related objects, the present application provides a big data processing method based on sensitive data, the method comprising: establishing a number of samples according to a preset condition, and establishing a state function according to the number of samples; screening seed numbers according to the state function, and adding the seed numbers that meet the screening condition to a parameter set; and judging whether the parameter set reaches the number of samples, and if so, outputting the parameter set, otherwise returning to the previous step.
In an embodiment of the present application, screening the seed numbers according to the state function and adding the seed numbers that meet the screening condition to a parameter set includes: calling an original data set; randomly extracting a sample as the seed number, and substituting the seed number into the state function for calculation; judging whether the evaluation indexes corresponding to the parameter requirements in the screening condition are met, and if so proceeding to the next step, otherwise returning to the previous step; calculating whether the state function meets the requirement, and if so proceeding to the next step, otherwise skipping to the last step; adding the seed number that meets the requirement to the parameter set; and decomposing the state function to analyze why the requirement is not met, and adding the optimal sample to the parameter set.
In an embodiment of the present application, the original data set is a big data set from which sensitive data has been removed; the parameter set is a sample data set.
In an embodiment of the present application, the screening condition is established according to a specific parameter type in the original data set.
In an embodiment of the present application, the state function is decomposed by a dynamic programming algorithm.
In an embodiment of the present application, decomposing the state function to analyze why the requirement is not met, and adding the optimal sample to the parameter set, includes: randomly calling a sample that does not meet the screening condition; splitting the large problem of not meeting the screening condition into several small problems; working backward step by step from the last of the small problems, finding the reason the condition is not met according to the state function, identifying from that reason the imperfect conditions in the screening condition corresponding to the state function, and repeating these steps to obtain a plurality of unsatisfied samples; selecting, according to the screening condition, the solution among the unsatisfied samples that best optimizes the state function as the optimal sample; and outputting the optimal sample and adding it to the parameter set.
In an embodiment of the present application, the state function is a screening process established according to the number of samples, and can be adjusted in real time according to the unsatisfied samples.
To achieve the above and other related objects, the present application provides a big data processing apparatus, comprising: an establishing module, used for establishing a number of samples according to a preset condition and establishing a state function according to the number of samples; and a processing module, used for screening seed numbers according to the state function, adding the seed numbers that meet the screening condition to a parameter set, and judging whether the parameter set reaches the number of samples, and if so, outputting the parameter set, otherwise returning to the previous step.
To achieve the above and other related objects, the present application provides a computer apparatus, comprising: a memory, and a processor; the memory is to store computer instructions; the processor executes computer instructions to implement the method as described above.
To achieve the above and other related objects, the present application provides a computer readable storage medium storing computer instructions which, when executed, perform the method as described above.
In summary, in the big data processing method, apparatus, device and storage medium based on sensitive data of the present application, a number of samples is determined according to a preset condition, and a state function is determined according to the number of samples; seed numbers are screened according to the state function, and the seed numbers that meet the screening condition are added to a parameter set; and whether the parameter set reaches the number of samples is judged, and if so, the parameter set is output, otherwise the method returns to the previous step.
The application has the following beneficial effects:
1. In the big data processing method based on sensitive data of the present application, by establishing the number of samples and the state function, the required sample data set can be screened quickly from the original data set from which sensitive data has been removed, and the state function can be optimized using the unsatisfied samples. The data characteristics represented by the sample data set therefore remain highly consistent with the true data characteristics of the original data set, which helps big data users understand the big data more comprehensively and avoids asymmetry of information flow in statistics. The method thus offers efficient screening while preserving the authenticity of the original data set.
2. In the big data processing method based on sensitive data of the present application, the reasons why unsatisfied samples fail can be analyzed by means of a dynamic programming algorithm, and sample data that does not directly meet the conditions but has reference value is added to the parameter set, which further improves the value of the processed parameter set.
3. In the big data processing method based on sensitive data of the present application, big data from which the sensitive data has been removed is processed as the original data set, so that low-value-density attributes in the big data can be removed efficiently, leaving a sample data set with high reference value, which further improves the authenticity of the sample data set.
Drawings
Fig. 1 is a flowchart illustrating a big data processing method based on sensitive data according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating step S2 of the sensitive data-based big data processing method according to an embodiment of the present application.
Fig. 3 is a flowchart illustrating step S26 of the sensitive data-based big data processing method according to an embodiment of the present application.
Fig. 4 is a block diagram of a big data processing apparatus according to an embodiment of the present application.
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application is provided by way of specific examples, and other advantages and effects of the present application will be readily apparent to those skilled in the art from the disclosure herein. The present application is capable of other and different embodiments and its several details are capable of modifications and/or changes in various respects, all without departing from the spirit of the present application. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings so that those skilled in the art to which the present application pertains can easily carry out the present application. The present application may be embodied in many different forms and is not limited to the embodiments described herein.
In order to clearly explain the present application, components that are not related to the description are omitted, and the same reference numerals are given to the same or similar components throughout the specification.
Throughout the specification, when a component is referred to as being "connected" to another component, this includes not only the case of being "directly connected" but also the case of being "indirectly connected" with another element interposed therebetween. In addition, when a component is referred to as "including" a certain constituent element, unless otherwise stated, it means that the component may include other constituent elements, without excluding other constituent elements.
When an element is referred to as being "on" another element, it can be directly on the other element, or intervening elements may also be present. When a component is referred to as being "directly on" another component, there are no intervening components present.
Although the terms first, second, etc. may be used herein to describe various elements in some instances, these elements should not be limited by these terms. These terms are only used to distinguish one element from another, for example, a first interface from a second interface. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, items, species, and/or groups thereof. The terms "or" and "and/or" as used herein are to be construed as inclusive, meaning any one or any combination. Thus, "A, B or C" or "A, B and/or C" means any of the following: A; B; C; A and B; A and C; B and C; A, B and C. An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the singular forms "a", "an" and "the" include plural forms as long as the words do not expressly indicate a contrary meaning. The term "comprises/comprising" when used in this specification is taken to specify the presence of stated features, regions, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of other features, regions, integers, steps, operations, elements, and/or components.
Terms indicating relative space, such as "lower" and "upper", may be used to more easily describe the relationship of one component to another component illustrated in the drawings. Such terms are intended to include not only the meanings indicated in the drawings, but also other meanings or operations of the device in use. For example, if the device in the figures is turned over, elements described as "below" other elements would then be oriented "above" the other elements. Thus, the exemplary terms "under" and "beneath" encompass both above and below. The device may be rotated 90 degrees or to other angles, and the terminology representing relative space is to be interpreted accordingly.
Fig. 1 is a schematic flow chart of a big data processing method based on sensitive data according to an embodiment of the present application. As shown in the figure, the method includes steps S1 to S3, as follows:
Step S1: establishing a number of samples according to a preset condition, and establishing a state function according to the number of samples.
In this embodiment, the preset condition is a condition used to screen the target data and determine the required samples, such as data type, attributes, or categories.
The state function is a function constructed from a plurality of state attributes and used to characterize changes in the data system. When the state of the system changes, a series of properties of the system change, and the amount of change depends only on the initial state and the final state, not on the path taken during the change.
In this embodiment, by establishing the number of samples and the state function, the required sample data set can be screened quickly from the original data set from which sensitive data has been removed, and the state function can be optimized using the unsatisfied samples. The data characteristics represented by the sample data set therefore remain highly consistent with the true data characteristics of the original data set, which helps big data users understand the big data more comprehensively and avoids asymmetry of information flow in statistics. The method thus offers efficient screening while preserving the authenticity of the original data set.
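The path independence that defines a state function can be illustrated with a minimal sketch. The `State` attributes, the `accept` helper and the form of `state_function` are assumptions made for illustration only; the application does not specify the concrete form of its state function.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class State:
    count: int  # number of samples accepted so far
    total: int  # sum of a hypothetical integer attribute of those samples


def accept(state: State, value: int) -> State:
    """Return the state reached after accepting one more sample."""
    return State(state.count + 1, state.total + value)


def state_function(state: State, target_count: int) -> float:
    """Hypothetical state function: samples still needed to reach the target."""
    return float(target_count - state.count)


start = State(0, 0)
values = [3, 5, 8]

# Accept the same samples in every possible order (every "path").
finals = set()
for order in permutations(values):
    s = start
    for v in order:
        s = accept(s, v)
    finals.add(s)

# All paths end in the same state, so the change in the state function
# depends only on the initial and final states.
final_state = next(iter(finals))
change = state_function(final_state, 10) - state_function(start, 10)
```

Integer attributes are used here so that the accumulated total does not depend on the order of addition.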
Step S2: screening seed numbers according to the state function, and adding the seed numbers that meet the screening condition to a parameter set.
In an embodiment of the present application, the state function is a screening process established according to the number of samples, and can be adjusted in real time according to the unsatisfied samples.
As shown in Fig. 2, in this embodiment, step S2 includes steps S21 to S26, as follows:
Step S21: calling an original data set.
In an embodiment of the present application, the original data set is a big data set from which sensitive data has been removed. In fields involving sensitive data, such as medical data or government identity data, the data contains sensitive information and generally cannot be disclosed directly to a data user.
In this embodiment, big data from which the sensitive data has been removed is processed as the original data set, so that low-value-density attributes in the big data can be removed efficiently, leaving a sample data set with high reference value, which further improves the authenticity of the sample data set.
Step S22: randomly drawing a sample as the seed number, and substituting the seed number into the state function for calculation.
In this embodiment, the random numbers generated by the computer are simulated by a long sequence of numbers and are therefore called pseudo-random numbers. In practical use, they generally have the probabilistic and statistical properties of true random numbers, so a large number of pseudo-random sequences can be generated. The first random number of each sequence corresponds to a number, and that number is called the seed number.
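The role of the seed number in a pseudo-random sequence can be shown with Python's standard `random` module. This is only an illustration of seeded generation in general; the function name `draw_indices` and the population size are hypothetical and are not taken from the application.

```python
import random


def draw_indices(seed, population_size, k):
    """Draw k pseudo-random record indices from a generator started at `seed`."""
    rng = random.Random(seed)  # independent generator initialised from the seed
    return [rng.randrange(population_size) for _ in range(k)]


# The same seed reproduces the same sequence of draws.
first = draw_indices(seed=42, population_size=1000, k=5)
second = draw_indices(seed=42, population_size=1000, k=5)
```

Because the sequence is fully determined by the seed, a screening run that records its seed can be repeated exactly.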
Step S23: judging whether the evaluation indexes corresponding to the parameter requirements in the screening condition are met; if so, proceeding to step S24, otherwise returning to step S22.
In an embodiment of the present application, the screening condition is established according to a specific parameter type in the original data set.
Step S24: calculating whether the state function meets the requirement; if so, proceeding to step S25, otherwise skipping to step S26.
Step S25: adding the seed number that meets the requirement to the parameter set.
In an embodiment of the present application, the parameter set is a sample data set.
Step S26: decomposing the state function to analyze why the requirement is not met, and adding the optimal sample to the parameter set.
In an embodiment of the present application, the state function is decomposed by a dynamic programming algorithm.
Dynamic programming is a method used in mathematics, computer science and economics to solve complex problems by decomposing the original problem into relatively simple sub-problems. A dynamic programming algorithm splits the problem and defines the problem states and the relations between them, so that the problem can be solved recursively (or by divide and conquer). Dynamic programming is particularly effective when sub-problems overlap, because it stores the solutions of the sub-problems in a table and looks a solution up directly when it is needed, avoiding repeated calculation.
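The table-based reuse of sub-problem solutions can be seen in a generic textbook example, the minimum number of coins needed to make an amount. It is unrelated to the application's own decomposition of the state function and is shown only to make the technique concrete; `min_coins` is an illustrative name.

```python
def min_coins(coins, amount):
    """Fewest coins that sum to `amount`, or None if it cannot be made."""
    INF = float("inf")
    # table[s] holds the solved sub-problem "fewest coins for amount s".
    table = [0] + [INF] * amount
    for sub in range(1, amount + 1):
        for coin in coins:
            # Reuse the stored solution for the smaller amount sub - coin.
            if coin <= sub and table[sub - coin] + 1 < table[sub]:
                table[sub] = table[sub - coin] + 1
    return table[amount] if table[amount] != INF else None
```

Each entry is computed once and then read directly from the table, which is the saving described above for overlapping sub-problems.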
In this embodiment, the reasons why unsatisfied samples fail can be analyzed by means of a dynamic programming algorithm, and sample data that does not directly meet the conditions but has reference value is added to the parameter set, which further improves the value of the processed parameter set.
In an embodiment of the present application, as shown in Fig. 3, step S26, that is, the dynamic programming algorithm, specifically includes steps S261 to S265, as follows:
Step S261: randomly calling a sample that does not meet the screening condition;
Step S262: splitting the large problem of not meeting the screening condition into several small problems;
Step S263: working backward step by step from the last of the small problems, finding the reason the condition is not met according to the state function, identifying from that reason the imperfect conditions in the screening condition corresponding to the state function, and repeating these steps to obtain a plurality of unsatisfied samples;
Step S264: selecting, according to the screening condition, the solution among the unsatisfied samples that best optimizes the state function as the optimal sample;
Step S265: outputting the optimal sample and adding it to the parameter set.
In this embodiment, the state function is decomposed by a dynamic programming algorithm, which splits the problem and defines the relations between the problem states so that the problem can be solved recursively. When any sub-problem is solved, the possible local solutions are listed, those that may lead to the optimum are retained by decision and the others are discarded; the sub-problems are solved in sequence, and the solution of the last sub-problem is the solution of the initial problem.
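Steps S261 to S265 might be sketched as follows. This is a simplified stand-in rather than the dynamic programming procedure itself: the record fields, the named conditions and the selection criterion (fewest failed conditions) are all assumptions, since the application does not define how the optimal sample is scored.

```python
import random


def failed_conditions(sample, conditions):
    """Return the names of the screening conditions the sample fails."""
    return [name for name, check in conditions.items() if not check(sample)]


def pick_best_unsatisfied(rejected, conditions, tries, rng):
    """Choose the rejected sample that fails the fewest conditions."""
    # S261: draw rejected samples at random (without replacement).
    drawn = rng.sample(rejected, k=min(tries, len(rejected)))
    # S262-S263: split "does not pass" into one small question per
    # condition and record which ones fail (the reasons).
    analysed = [(s, failed_conditions(s, conditions)) for s in drawn]
    # S264: assumed criterion, the fewest failed conditions wins.
    return min(analysed, key=lambda pair: len(pair[1]))


# Hypothetical screening conditions and rejected records.
conditions = {
    "adult": lambda s: s["age"] >= 18,
    "complete": lambda s: s["complete"],
}
rejected = [
    {"age": 10, "complete": False},
    {"age": 17, "complete": True},
]
best, reasons = pick_best_unsatisfied(
    rejected, conditions, tries=2, rng=random.Random(0)
)
# S265: `best` would then be added to the parameter set.
```

The returned reasons indicate which screening conditions were too strict or incomplete for that sample, which is the information the text describes feeding back into the state function. The sketch assumes at least one rejected sample is available.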
Step S3: judging whether the parameter set reaches the number of samples; if so, outputting the parameter set, otherwise returning to the previous step.
In this embodiment, the parameter set is a sample data set.
In summary, by establishing the number of samples and the state function, the method of the application can quickly screen the required sample data set from the original data set from which sensitive data has been removed, and the state function can be optimized using the unsatisfied samples, so that the data characteristics expressed by the sample data set remain highly consistent with the true data characteristics of the original data set. The method thus offers efficient screening while preserving the authenticity of the original data set.
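The overall flow of steps S1 to S3 can be summarized in a short sketch. All names here (`screen_samples`, `state_ok`, `max_draws`) and the example conditions are hypothetical; in particular, the state-function check is reduced to a caller-supplied predicate, and the draw limit is an added safeguard that the application does not mention.

```python
import random


def screen_samples(original, sample_count, conditions, state_ok,
                   seed=0, max_draws=10_000):
    """Draw candidates until the parameter set holds `sample_count` items."""
    rng = random.Random(seed)
    parameter_set = []  # the sample data set being built
    rejected = []       # candidates that fail the state-function check
    draws = 0
    # S3: repeat until the parameter set reaches the number of samples.
    while len(parameter_set) < sample_count and draws < max_draws:
        draws += 1
        candidate = rng.choice(original)  # S22: random draw
        # S23: every screening condition must hold, else draw again.
        if not all(check(candidate) for check in conditions):
            continue
        if state_ok(parameter_set, candidate):  # S24
            parameter_set.append(candidate)     # S25
        else:
            rejected.append(candidate)          # left for step S26
    return parameter_set, rejected


# Hypothetical use: keep ten distinct even record identifiers.
records = list(range(100))
selected, leftover = screen_samples(
    records,
    sample_count=10,
    conditions=[lambda r: r % 2 == 0],
    state_ok=lambda chosen, r: r not in chosen,
)
```

If the draw limit is reached first, the function returns a parameter set smaller than requested, so a caller should check its length.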
Fig. 4 is a block diagram of a big data processing apparatus according to an embodiment of the present application. As shown, the apparatus 400 includes:
an establishing module 401, configured to establish a number of samples according to a preset condition, and to establish a state function according to the number of samples;
a processing module 402, configured to screen seed numbers according to the state function, add the seed numbers that meet the screening condition to a parameter set, and judge whether the parameter set reaches the number of samples, and if so, output the parameter set, otherwise return to the previous step.
It should be noted that, because the contents of information interaction, execution process, and the like between the modules/units of the apparatus are based on the same concept as the method embodiment described in the present application, the technical effect brought by the contents is the same as the method embodiment of the present application, and specific contents may refer to the description in the foregoing method embodiment of the present application, and are not described herein again.
It should be further noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these units can be implemented entirely in software, invoked by a processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the processing module 402 may be a separate processing element, or may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and a processing element of the apparatus calls and executes the functions of the processing module 402. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. For another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Fig. 5 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown, the computer device 500 includes: a memory 501 and a processor 502; the memory 501 is used for storing computer instructions; the processor 502 executes the computer instructions to implement the method described in Fig. 1.
In some embodiments, the computer device 500 may include one or more memories 501, one or more processors 502, and one or more communicators 503; Fig. 5 is taken as an example.
In an embodiment of the present application, the processor 502 in the computer device 500 loads one or more instructions corresponding to processes of an application program into the memory 501 according to the steps described in fig. 1, and the processor 502 executes the application program stored in the memory 501, thereby implementing the method described in fig. 1.
The Memory 501 may include a Random Access Memory (RAM), and may also include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The memory 501 stores an operating system and operating instructions, executable modules or data structures, or a subset thereof, or an expanded set thereof, wherein the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and for handling hardware-based tasks.
The processor 502 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In some specific applications, the various components of the computer device 500 are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of explanation, however, the various buses are collectively shown in Fig. 5 as a bus system.
In an embodiment of the present application, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the method described in fig. 1.
The computer-readable storage medium, as will be appreciated by one of ordinary skill in the art: the embodiment for realizing the functions of the system and each unit can be realized by hardware related to computer programs. The aforementioned computer program may be stored in a computer readable storage medium. When the program is executed, the embodiment including the functions of the system and the units is executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
In summary, the big data processing method, device, equipment and storage medium based on sensitive data provided by the present application determine the number of samples according to preset conditions and determine a state function according to the number of samples; screen the seed number according to the state function and add each seed number that meets the screening condition to a parameter set; and judge whether the parameter set satisfies the number of samples, outputting the parameter set if so and otherwise returning to the previous step.
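The loop summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the names `build_parameter_set`, `state_function`, and `max_draws`, the representation of samples as dictionaries, the use of a simple true/false predicate as the state function, and the rule that no sample is accepted twice are all hypothetical choices made for the example.

```python
import random
from typing import Callable, Dict, List

Sample = Dict[str, float]


def build_parameter_set(
    raw_data: List[Sample],
    sample_count: int,
    state_function: Callable[[Sample], bool],
    rng: random.Random,
    max_draws: int = 10_000,
) -> List[Sample]:
    """Draw random seeds until `sample_count` of them pass the state function."""
    parameter_set: List[Sample] = []
    chosen = set()  # indices already accepted (assumption: no sample is added twice)
    draws = 0
    while len(parameter_set) < sample_count:  # "does the set meet the sample number?"
        if draws >= max_draws:
            raise RuntimeError("not enough samples satisfy the screening condition")
        draws += 1
        index = rng.randrange(len(raw_data))  # randomly extract a seed
        if index in chosen:
            continue
        if state_function(raw_data[index]):  # screening step
            chosen.add(index)
            parameter_set.append(raw_data[index])
    return parameter_set


# Hypothetical de-identified records and a hypothetical screening condition.
records = [{"age": float(a), "score": float(a % 7)} for a in range(20, 80)]
result = build_parameter_set(
    records,
    sample_count=5,
    state_function=lambda s: s["age"] >= 40 and s["score"] >= 3,
    rng=random.Random(0),
)
```

The `max_draws` guard is an addition for the sketch only: it keeps the loop from running forever when too few records can pass the screening condition.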
The application effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and utility of the present application and are not intended to limit it. Any person skilled in the art can modify or change the above-described embodiments without departing from the spirit and scope of the present application. Accordingly, all equivalent modifications or changes made by those skilled in the art without departing from the spirit and technical ideas disclosed in the present application shall be covered by the claims of the present application.

Claims (10)

1. A big data processing method based on sensitive data is characterized by comprising the following steps:
establishing the number of samples according to a preset condition, and establishing a state function according to the number of the samples;
screening the seed number according to the state function, and adding the seed number meeting the screening condition to a parameter set;
and judging whether the parameter set satisfies the number of samples; if so, outputting the parameter set; otherwise, returning to the previous step.
2. The method of claim 1, wherein the screening the seed number according to the state function, and adding the seed number meeting the screening condition to a parameter set comprises:
retrieving a raw data set;
randomly extracting a sample as the seed number, and substituting the seed number into the state function for calculation;
judging whether the evaluation indexes corresponding to various parameter requirements in the screening conditions are met or not; if yes, carrying out the next step, otherwise, skipping to the previous step;
calculating whether the state function meets the requirement, if so, carrying out the next step, otherwise, skipping to the last step;
adding the seed number meeting the requirement into the parameter set;
and decomposing the state function to analyze why the condition is not met, and adding the optimal sample to the parameter set.
3. The method of claim 2, wherein the raw data set is a big data set with sensitive data removed, and the parameter set is a sample data set.
4. The method of claim 2, wherein the screening conditions are established based on specific parameter classes in the raw data set.
5. The method according to claim 2, wherein the state function is decomposed by a dynamic programming algorithm.
6. The method of claim 2, wherein the decomposing the state function to analyze why the condition is not met, and adding the optimal sample to the parameter set comprises:
randomly calling a sample which does not meet the screening condition;
splitting the large problem of not meeting the screening condition into a plurality of small problems;
working backward step by step from the last of the small problems, finding the reasons why the conditions are not met according to the state function, identifying the imperfect conditions in the screening conditions corresponding to the state function according to those reasons, and repeating these steps to obtain a plurality of unsatisfied samples;
selecting, according to the screening conditions, the solution among the unsatisfied samples that best optimizes the state function as the optimal sample;
outputting the optimal sample and adding it to the parameter set.
7. The method of claim 6, wherein the state function is a screening process established according to the number of samples, and is adjustable in real time according to the unsatisfied samples.
8. A big data processing apparatus, the apparatus comprising:
the establishing module is used for establishing the number of samples according to a preset condition and establishing a state function according to the number of the samples;
the processing module is used for screening the seed number according to the state function and adding the seed number meeting the screening condition into a parameter set; and judging whether the parameter set meets the sample number, if so, outputting the parameter set, and otherwise, jumping to the previous step.
9. A computer device, the device comprising: a memory, and a processor; the memory is to store computer instructions; the processor executes computer instructions to implement the method of any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer instructions which, when executed, perform the method of any one of claims 1 to 7.
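
The fallback path of claims 2 and 6 (analyzing samples that fail screening and adding the best of them to the parameter set) can be illustrated with a small sketch. It is not the claimed algorithm: the claims do not define how the "optimal sample" is measured, so the sketch assumes that the optimal sample is the rejected sample that fails the fewest screening conditions. It also replaces the decomposition of the state function with a plain per-condition check rather than a full dynamic programming procedure. The names `failed_conditions`, `pick_optimal_sample`, and the example conditions are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Sample = Dict[str, float]
Condition = Callable[[Sample], bool]


def failed_conditions(sample: Sample, conditions: Dict[str, Condition]) -> List[str]:
    """Check each screening condition separately and return the names that fail."""
    return [name for name, check in conditions.items() if not check(sample)]


def pick_optimal_sample(
    rejected: List[Sample], conditions: Dict[str, Condition]
) -> Tuple[Sample, List[str]]:
    """Return the rejected sample that fails the fewest conditions, with its failures."""
    if not rejected:
        raise ValueError("no rejected samples to analyze")
    best = min(rejected, key=lambda s: len(failed_conditions(s, conditions)))
    return best, failed_conditions(best, conditions)


# Hypothetical screening conditions and rejected samples.
conditions = {
    "age_at_least_40": lambda s: s["age"] >= 40,
    "score_at_least_3": lambda s: s["score"] >= 3,
}
rejected = [
    {"age": 25.0, "score": 1.0},  # fails both conditions
    {"age": 52.0, "score": 2.0},  # fails only the score condition
]
best, reasons = pick_optimal_sample(rejected, conditions)
```

Returning the list of failed condition names alongside the chosen sample mirrors the "finding the reasons" step: the same list could be used to adjust the screening conditions, as claim 7 suggests.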
CN201910876650.5A 2019-09-17 2019-09-17 Big data processing method, device, equipment and storage medium based on sensitive data Active CN110674373B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910876650.5A CN110674373B (en) 2019-09-17 2019-09-17 Big data processing method, device, equipment and storage medium based on sensitive data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910876650.5A CN110674373B (en) 2019-09-17 2019-09-17 Big data processing method, device, equipment and storage medium based on sensitive data

Publications (2)

Publication Number Publication Date
CN110674373A true CN110674373A (en) 2020-01-10
CN110674373B CN110674373B (en) 2020-08-07

Family

ID=69078047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910876650.5A Active CN110674373B (en) 2019-09-17 2019-09-17 Big data processing method, device, equipment and storage medium based on sensitive data

Country Status (1)

Country Link
CN (1) CN110674373B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104346445A (en) * 2014-10-28 2015-02-11 浪潮电子信息产业股份有限公司 Method for rapidly screening outlier data from massive data
US20180329951A1 (en) * 2017-05-11 2018-11-15 Futurewei Technologies, Inc. Estimating the number of samples satisfying the query
CN107895168A (en) * 2017-10-13 2018-04-10 平安科技(深圳)有限公司 The method of data processing, the device of data processing and computer-readable recording medium
CN109671507A (en) * 2018-12-24 2019-04-23 万达信息股份有限公司 A kind of obstetrics' disease that calls for specialized treatment coupling index method for digging based on Electronic Health Record
CN109885877A (en) * 2019-01-15 2019-06-14 江苏大学 A kind of constrained domain optimization Latin hypercube design method based on clustering algorithm
CN110059764A (en) * 2019-04-26 2019-07-26 莆田学院 A kind of optimization down-sampling svm classifier method and storage medium based on potential function clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KEKE GAI ET AL.: "In-memory big data analytics under space constraints using dynamic programming", Future Generation Computer Systems *
李毅 等: "大数据挖掘的均匀抽样设计及数值分析", 《统计与信息论坛》 *

Also Published As

Publication number Publication date
CN110674373B (en) 2020-08-07

Similar Documents

Publication Publication Date Title
KR101983206B1 (en) Data records selection
CN113360554B (en) Method and equipment for extracting, converting and loading ETL (extract transform load) data
CN106909452B (en) Parallel program runtime parameter optimization method
US9952577B2 (en) Graph theory and network analytics and diagnostics for process optimization in manufacturing
Almeida et al. Dyno: Dynamic onloading of deep neural networks from cloud to device
CN116126947B (en) Big data analysis method and system applied to enterprise management system
CN112181522A (en) Data processing method and device and electronic equipment
CN113626241A (en) Application program exception handling method, device, equipment and storage medium
CN112799785A (en) Virtual machine cluster migration method, device, equipment and medium
CN116186267A (en) Policy data processing method, device, computer equipment and storage medium
CN111125199A (en) Database access method and device and electronic equipment
CN110674373B (en) Big data processing method, device, equipment and storage medium based on sensitive data
CN115906927B (en) Data access analysis method and system based on artificial intelligence and cloud platform
CN110489965B (en) Implementation method and system of deep threat recognition real-time engine
WO2023105348A1 (en) Accelerating decision tree inferences based on tensor operations
CN115858306A (en) Micro-service monitoring method based on event stream, terminal equipment and storage medium
CN106970837B (en) Information processing method and electronic equipment
US10007681B2 (en) Adaptive sampling via adaptive optimal experimental designs to extract maximum information from large data repositories
CN111079390B (en) Method and device for determining selection state of check box list
CN108134810B (en) Method and system for determining resource scheduling component
De Bonis Group Testing in Arbitrary Hypergraphs and Related Combinatorial Structures
WO2019156894A1 (en) Event table management using type-dependent portions
CN117608862B (en) Data distribution control method, device, equipment and medium
CN112784422B (en) Fine-grained performance modeling method applied to parallel scientific computing program
CN114596011B (en) Enterprise data processing method based on artificial intelligence and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant