CN112667608A - Data processing method and device and data processing device


Publication number
CN112667608A
Authority
CN
China
Prior art keywords
data
array
index
determining
quantile
Legal status
Granted
Application number
CN202010261498.2A
Other languages
Chinese (zh)
Other versions
CN112667608B
Inventor
范晓昱
王国赛
何旭
Current Assignee
Huakong Tsingjiao Information Technology Beijing Co Ltd
Original Assignee
Huakong Tsingjiao Information Technology Beijing Co Ltd
Application filed by Huakong Tsingjiao Information Technology Beijing Co Ltd
Priority to CN202010261498.2A
Publication of CN112667608A
Application granted
Publication of CN112667608B
Legal status: Active

Landscapes

  • Complex Calculations (AREA)

Abstract

The embodiments of the present application provide a data processing method and apparatus, and an apparatus for data processing. The method determines the cut points of data to be binned on the basis of the ciphertext of the data and comprises: constructing a data array from the ciphertext of the data to be binned; determining the number of bins and the binning mode for the data to be binned; determining quantiles according to the number of bins and the binning mode; and determining, based on the quantiles and the data array, the cut points of the data to be binned under that binning mode. Because the cut points are determined on the ciphertext, ciphertext data can be binned without exposing its plaintext, which protects the privacy and security of the data.

Description

Data processing method and device and data processing device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data processing method and apparatus, and an apparatus for data processing.
Background
With the advent of the big data era, big data analysis has emerged. In big data analysis and similar scenarios, operations such as statistics, comparison, and analysis often have to be performed on certain data. However, collected data may contain numerical noise such as random errors, outliers, and extreme values; using such data directly slows down the algorithms, and some algorithms do not support continuous variables, so the data needs to be preprocessed.
Data binning is a commonly used data preprocessing method. A "bin" is a sub-interval obtained by dividing the values of an attribute of the data, for example an age sub-interval or a height sub-interval. The main purposes of data binning are to reduce noise, discretize continuous data, and coarsen the granularity of the data.
Before data is binned, the cut points must be computed according to the chosen binning mode. For example, suppose a group of data has age values in the range 0-60 and must be divided into three bins by equidistant binning, namely 0-20, 21-40, and 41-60; the cut points to be determined are then [0, 20, 40].
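As a plaintext illustration of the equidistant example above (not the ciphertext procedure of this application), the cut points are simply the lower edges of N equal-width bins between the minimum and maximum values. The function name `equidistant_cut_points` is hypothetical.

```python
def equidistant_cut_points(min_value: float, max_value: float, num_bins: int) -> list[float]:
    """Plaintext illustration: lower edge of each of num_bins equal-width bins."""
    if num_bins < 1:
        raise ValueError("num_bins must be at least 1")
    if max_value < min_value:
        raise ValueError("max_value must not be smaller than min_value")
    width = (max_value - min_value) / num_bins  # every bin has the same width
    # The k-th bin starts k widths above the minimum value.
    return [min_value + k * width for k in range(num_bins)]


print(equidistant_cut_points(0, 60, 3))  # [0.0, 20.0, 40.0]
```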
Existing data binning methods generally operate on plaintext data. For ciphertext data, the cut points needed for binning cannot be computed from the concrete values of the data, so the plaintext has to be exposed during binning, which makes it difficult to guarantee the privacy and security of the data.
Disclosure of Invention
The embodiments of the present application provide a data processing method and apparatus, and an apparatus for data processing, which can determine the cut points for data binning on the basis of ciphertext, so that ciphertext data can be binned while the privacy and security of the data are protected.
To solve the above problem, an embodiment of the present application discloses a data processing method for determining the cut points of data to be binned on the basis of the ciphertext of the data. The method includes:
constructing a data array from the ciphertext of the data to be binned;
determining the number of bins and the binning mode for the data to be binned;
determining quantiles according to the number of bins and the binning mode;
and determining, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
In another aspect, an embodiment of the present application discloses a data processing apparatus for determining the cut points of data to be binned on the basis of the ciphertext of the data. The apparatus includes:
a data array construction module, configured to construct a data array from the ciphertext of the data to be binned;
a binning parameter determination module, configured to determine the number of bins and the binning mode for the data to be binned;
a quantile determination module, configured to determine quantiles according to the number of bins and the binning mode;
and a cut point determination module, configured to determine, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
In yet another aspect, an embodiment of the present application discloses an apparatus for data processing, for determining the cut points of data to be binned on the basis of the ciphertext of the data. The apparatus includes a memory and one or more programs, where the one or more programs are stored in the memory and are configured to be executed by one or more processors, and the one or more programs include instructions for:
constructing a data array from the ciphertext of the data to be binned;
determining the number of bins and the binning mode for the data to be binned;
determining quantiles according to the number of bins and the binning mode;
and determining, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
In yet another aspect, embodiments of the present application disclose a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform one or more of the data processing methods described above.
The embodiment of the application has the following advantages:
In the embodiments of the present application, a data array is constructed from the ciphertext of the data to be binned, quantiles are determined according to the preset number of bins and binning mode, and the cut points of the data to be binned are then determined based on the quantiles and the data array. Using these cut points, the data can then be binned. The data processing method of the embodiments therefore determines the cut points on the basis of ciphertext and bins the data on the basis of ciphertext according to the preset number of bins and binning mode, without exposing the plaintext of the data during processing, which protects the privacy and security of the data.
Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of the steps of an embodiment of a data processing method of the present application;
FIG. 2 is a block diagram of an embodiment of a data processing apparatus of the present application;
FIG. 3 is a block diagram of an apparatus 800 for data processing of the present application; and
FIG. 4 is a schematic diagram of a server in some embodiments of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. The described embodiments are merely some rather than all of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
Method embodiment
Referring to FIG. 1, which shows a flowchart of the steps of an embodiment of a data processing method according to the present application, the method may be used to determine the cut points of data to be binned on the basis of the ciphertext of the data, and may specifically include the following steps:
Step 101: construct a data array from the ciphertext of the data to be binned;
Step 102: determine the number of bins and the binning mode for the data to be binned;
Step 103: determine quantiles according to the number of bins and the binning mode;
Step 104: determine, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
The data processing method of the embodiments of the present application can determine the cut points of the data to be binned on the basis of ciphertext, so the data can be binned on the basis of ciphertext without exposing its plaintext during processing, which protects the privacy and security of the data.
The data processing method of the embodiments of the present application can be applied to an electronic device, including but not limited to: a server, a smartphone, a voice recorder, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, an in-vehicle computer, a desktop computer, a set-top box, a smart TV, a wearable device, and the like.
First, a data array is constructed from the ciphertext of the data to be binned. For example, suppose the data to be binned is 16, 7, 11, 6, 6, 8, 8, 13, 14, and 11. These values are encrypted to obtain their ciphertext, and the data array a = [16, 7, 11, 6, 6, 8, 8, 13, 14, 11] is constructed from that ciphertext. Note that every element of the data array is ciphertext; the embodiments of the present application show plaintext values only for ease of description.
In a specific application, the number of bins and the binning mode for the data to be binned can be preset. The number of bins is the user-specified number of bins into which the data is to be divided. The binning mode is the specific way in which the data is binned and may include at least one of equidistant binning, equal-frequency binning, and custom-frequency binning.
Equidistant binning divides the data into N bins of equal width between the minimum and maximum values of the data. Equal-frequency binning divides the data into M bins, each containing an equal proportion of the data. Custom-frequency binning divides the data according to a user-defined slicing rule, which may be a user-defined slicing proportion or user-defined quantiles.
Of course, in practice the binning mode may also include modes not listed above. For convenience of description, the embodiments of the present application mainly use equal-frequency binning and custom-frequency binning as examples; other binning modes can be handled by analogy.
In an optional embodiment of the present application, determining the quantiles according to the number of bins and the binning mode in step 103 may specifically include:
if the binning mode is equal-frequency binning, determining a first slicing proportion according to the number of bins, and determining the quantiles based on the first slicing proportion; or
if the binning mode is custom-frequency binning, determining the quantiles according to a user-defined second slicing proportion.
Before the data is binned, the quantiles can be determined according to the preset number of bins and binning mode. A quantile is a point of a continuous distribution function that corresponds to a probability: for $0 < \alpha < 1$, the $\alpha$-quantile $z_\alpha$ of a random variable $X$ (or of its probability distribution) is the real number satisfying $P(X \le z_\alpha) = \alpha$. In the embodiments of the present application, the quantiles may be determined from a slicing proportion, which represents the proportion of the data contained in the corresponding bin.
In the equal-frequency binning mode, a first slicing proportion can be determined from the preset number of bins, and the quantiles are determined from it. For example, if the preset number of bins is 4, the data must be divided into 4 bins, each containing an equal proportion of the data. The first slicing proportion can then be 0%, 25%, 50%, 75%, i.e., each bin contains 25% of the data, which gives the quantiles 0, 0.25, 0.5, 0.75. If the preset number of bins is 5, the data must be divided into 5 bins; the first slicing proportion can then be 0%, 20%, 40%, 60%, 80%, i.e., each bin contains 20% of the data, which gives the quantiles 0, 0.2, 0.4, 0.6, 0.8.
In the custom-frequency binning mode, the quantiles are determined from the user-defined second slicing proportion. For example, if the second slicing proportion is 10%, 20%, 50%, the quantiles are 0.1, 0.2, 0.5. Alternatively, the quantiles themselves can be user-defined in this mode, for example 0.1, 0.2, 1.0.
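The two branches of step 103 can be sketched in Python as follows. This is a minimal plaintext sketch under stated assumptions (the quantiles are plaintext in this application, so no ciphertext handling is needed here): the mode strings `"equal_frequency"` and `"custom"` and the function `determine_quantiles` are hypothetical names, and the second slicing proportion is assumed to be supplied in percent.

```python
from typing import List, Optional, Sequence


def determine_quantiles(mode: str,
                        num_bins: Optional[int] = None,
                        custom_percentages: Optional[Sequence[float]] = None) -> List[float]:
    """Hypothetical helper mirroring the two branches of step 103 (plaintext)."""
    if mode == "equal_frequency":
        if num_bins is None or num_bins < 1:
            raise ValueError("equal-frequency binning needs num_bins >= 1")
        # First slicing proportion: bin k starts at k/num_bins of the data.
        return [k / num_bins for k in range(num_bins)]
    if mode == "custom":
        if custom_percentages is None:
            raise ValueError("custom-frequency binning needs a slicing proportion")
        # Second slicing proportion in percent, e.g. 10 (%) -> quantile 0.1.
        return [p / 100 for p in custom_percentages]
    raise ValueError(f"unsupported binning mode: {mode!r}")


print(determine_quantiles("equal_frequency", num_bins=4))  # [0.0, 0.25, 0.5, 0.75]
print(determine_quantiles("custom", custom_percentages=[10, 20, 50]))  # [0.1, 0.2, 0.5]
```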
Optionally, after the quantiles are determined according to the number of bins and the binning mode in step 103, the method may further include: performing a validity check on the determined quantiles.
The validity check may include checking whether the values of the determined quantiles fall within a preset range. In the embodiments of the present application, the quantiles are derived from the first slicing proportion of equal-frequency binning or the second slicing proportion of custom-frequency binning, so every quantile should lie within [0, 1]; that is, the preset range of the quantiles can be set to [0, 1].
After the quantiles are determined, each one can be checked against the preset range to verify its validity. For example, if a quantile is less than 0 or greater than 1, it is invalid and can be corrected.
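A minimal sketch of this validity check, assuming the preset range [0, 1]. The application says an invalid quantile "can be corrected" without specifying how, so the hypothetical helpers below only detect and report the offending values and leave the correction policy to the caller.

```python
from typing import List, Sequence


def invalid_quantiles(quantiles: Sequence[float], low: float = 0.0, high: float = 1.0) -> List[float]:
    """Return every quantile that lies outside the preset range [low, high]."""
    # NaN fails both comparisons, so it is reported as invalid too.
    return [q for q in quantiles if not (low <= q <= high)]


def validate_quantiles(quantiles: Sequence[float]) -> None:
    """Raise ValueError if any quantile is outside [0, 1]; correction is left to the caller."""
    bad = invalid_quantiles(quantiles)
    if bad:
        raise ValueError(f"quantiles outside [0, 1]: {bad}")


validate_quantiles([0.1, 0.2, 1.0])          # passes silently
print(invalid_quantiles([-0.1, 0.5, 1.2]))   # [-0.1, 1.2]
```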
It should be noted that the quantiles are plaintext. The embodiments of the present application do not limit their specific form: each quantile may be represented by a single number, or all quantiles may be held in a data structure such as a list or an array.
In an optional embodiment of the present application, the data array may have one or more dimensions; when it has more than one dimension, the cut points of the data to be binned are determined along one of the dimensions.
In a specific application, a 1-dimensional data array can be constructed from the ciphertext of the data to be binned, and the cut points of the data in that array under the binning mode are determined.
Alternatively, a data array with more than one dimension, for example a 2-dimensional or 3-dimensional array, can be constructed from the ciphertext of the data to be binned.
Optionally, different dimensions of the multidimensional data array correspond to different attributes of the data to be binned.
In one example, a multidimensional data array is constructed from different attributes of the data to be binned, each dimension representing one attribute. For example, a 2-dimensional data array is constructed from the age and income attributes of the data, where the 1st dimension (e.g., rows) holds the age values and the 2nd dimension (e.g., columns) holds the income values.
When the data array has more than one dimension, the data processing method of the embodiments determines the cut points along one dimension at a time. For the 2-dimensional array above, the cut points for the age attribute can first be determined along the 1st dimension, and then the cut points for the income attribute along the 2nd dimension.
Optionally, when the data array has more than one dimension, the method may further include:
performing a validity check on the dimension on which the current binning operation is based.
For example, if a 2-dimensional data array is constructed from the ciphertext of the data to be binned and the parameter axis denotes the dimension on which the current binning is based, the legal values of axis are 0 and 1; for a 3-dimensional data array, the legal values are 0, 1, and 2. Checking the validity of this dimension prevents an out-of-range axis from producing an incorrect result.
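The axis check can be sketched as below, assuming the array's shape is available as plaintext metadata (a tuple of lengths). The hypothetical `check_axis` also returns the number of data items along the chosen axis, which is the n used later when the quantile index array is built. Negative axes are rejected because the text lists only 0 to ndim - 1 as legal values.

```python
from typing import Sequence


def check_axis(shape: Sequence[int], axis: int) -> int:
    """Validate the binning axis and return n, the number of items along it."""
    ndim = len(shape)
    # Legal values are 0 .. ndim - 1, e.g. {0, 1} for a 2-D array.
    if not 0 <= axis < ndim:
        raise ValueError(
            f"axis {axis} is out of range for a {ndim}-dimensional array "
            f"(legal values: 0..{ndim - 1})")
    return shape[axis]


print(check_axis((10, 2), 0))  # 10
```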
After the quantiles are determined according to the preset number of bins and binning mode, step 104 can be executed to determine, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
In an optional embodiment of the present application, determining the cut points based on the quantiles and the data array in step 104 may specifically include:
Step S11: construct a quantile index array from the quantiles;
Step S12: determine, based on the quantile index array and the data array, the cut points of the data to be binned under the binning mode.
A quantile index array is constructed from the determined quantiles. Its elements are index values that can be used to read the corresponding data elements from the data array.
In one example of the present application, suppose the data array formed from the ciphertext of the data to be binned is a = [16, 7, 11, 6, 6, 8, 8, 13, 14, 11], the binning mode is custom-frequency binning, and the user-defined quantile array is s = [0.1, 0.2, 1.0]. The cut points of a must then be determined according to s.
For s = [0.1, 0.2, 1.0], the embodiments of the present application determine, for each quantile, the index value of the corresponding data element in the data array. First, the index value corresponding to quantile s[0] = 0.1 is determined and added to the quantile index array indices; then the index value corresponding to s[1] = 0.2 is determined and added to indices; finally, the index value corresponding to s[2] = 1.0 is determined and added to indices. This yields the quantile index array indices constructed from s = [0.1, 0.2, 1.0].
In an optional embodiment of the present application, constructing the quantile index array from the quantiles in step S11 may specifically include:
Step S111: determine the number n of data items to be binned and the number m of quantiles;
Step S112: determine the index value corresponding to the i-th quantile q(i) as q(i) × (n - 1), where i is an integer in [1, m];
Step S113: construct the quantile index array from the index values corresponding to the m quantiles.
It should be noted that, when the data array has more than one dimension, n is the number of data items along the dimension on which the cut point calculation is currently performed.
The embodiments of the present application use a 1-dimensional data array as an example. In the above example, the data array is a = [16, 7, 11, 6, 6, 8, 8, 13, 14, 11], so n = 10; the quantile array is s = [0.1, 0.2, 1.0], so m = 3. The index value corresponding to the i-th quantile q(i) is determined by:
$$\mathrm{index}(i) = q(i) \times (n - 1) \tag{1}$$
where i is an integer in [1, m].
Specifically, for the 1st quantile q(1) = s[0] = 0.1, formula (1) gives the index value 0.1 × (10 - 1) = 0.9. For the 2nd quantile q(2) = s[1] = 0.2, formula (1) gives 0.2 × (10 - 1) = 1.8. For the 3rd quantile q(3) = s[2] = 1.0, formula (1) gives 1.0 × (10 - 1) = 9.
From the index values of the 3 quantiles, the quantile index array indices = [0.9, 1.8, 9] is constructed. The index values in this array identify the corresponding data elements in the data array, from which the cut points of the data to be binned under the binning mode are obtained.
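Steps S111 to S113 amount to scaling each quantile by n - 1. A plaintext sketch (the function name `quantile_index_array` is hypothetical) reproduces the index array of the worked example:

```python
from typing import List, Sequence


def quantile_index_array(quantiles: Sequence[float], n: int) -> List[float]:
    """Steps S111-S113: map each quantile q(i) to the index q(i) * (n - 1)."""
    if n < 1:
        raise ValueError("the data array must contain at least one element")
    return [q * (n - 1) for q in quantiles]


indices = quantile_index_array([0.1, 0.2, 1.0], n=10)
print(indices)  # [0.9, 1.8, 9.0]
```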
Optionally, determining the cut points of the data to be binned under the binning mode based on the quantile index array and the data array in step S12 may specifically include:
reading, from the data array, the data elements at the positions given by the index values in the quantile index array, and taking the read data elements as the determined cut points.
In practice, however, the values in the quantile index array may not be valid indices. In the above example, indices = [0.9, 1.8, 9], where indices[0] = 0.9 and indices[1] = 1.8. Array indices must be integers, so 0.9 and 1.8 are not real index values, and no data element can be read directly from the data array at those positions.
To solve this problem, the embodiments of the present application use linear weighting to determine the data elements corresponding to the index values in the quantile index array. In linear weighting, the data to be binned in the data array is first sorted on the basis of ciphertext to obtain an ordered array; a weight is then determined for each index value in the quantile index array, and a weighted calculation over the ordered array with these weights takes the place of reading data elements at the (possibly fractional) index values. In an optional embodiment of the present application, determining the cut points based on the quantile index array and the data array in step S12 may specifically include:
Step S121: construct an upper bound index array and a lower bound index array from the index values in the quantile index array;
Step S122: determine an upper bound weight array from the upper bound index array and the quantile index array, and determine a lower bound weight array from the upper bound weight array;
Step S123: sort the data array on the basis of the ciphertext of the data to be binned to obtain an ordered array;
Step S124: take values from the ordered array at the positions in the upper bound index array to obtain an upper bound data array, and at the positions in the lower bound index array to obtain a lower bound data array;
Step S125: perform a linear weighted calculation on the upper bound data array and the lower bound data array with the upper bound weight array and the lower bound weight array to obtain a result array containing the determined cut points.
In the above example, the data element for indices[1] = 1.8 cannot be read directly from the data array. However, 1.8 lies between the index values 1 and 2, so 1 can be regarded as the lower bound and 2 as the upper bound of 1.8. The embodiments of the present application can therefore use the upper and lower bounds of each index value in the quantile index array to approximate the value at that index.
Optionally, the weights in the linear weighting calculation may include an upper bound weight and a lower bound weight.
Specifically, after the quantile index array is constructed, an upper bound index array and a lower bound index array may be constructed: the upper bound index array contains the upper bounds, and the lower bound index array the lower bounds, of the index values in the quantile index array. Then the upper bound weight corresponding to each index value in the quantile index array is determined to construct an upper bound weight array, and the lower bound weight corresponding to each index value is determined to construct a lower bound weight array.
Next, the data array is sorted on the basis of the ciphertext of the data to be binned to obtain an ordered array. Values are taken from the ordered array according to the upper bound index array to obtain an upper bound data array, and according to the lower bound index array to obtain a lower bound data array. The sorting step is not tied to this position in the sequence; it can be executed at any point before values are taken with the upper and lower bound index arrays. For example, the data array may be sorted first and the upper and lower bound index arrays constructed afterwards. What matters in the embodiments of the present application is that the values are taken from the ordered array, i.e., after the data array has been sorted.
Finally, a linear weighted calculation is performed on the upper bound data array and the lower bound data array with the upper bound weight array and the lower bound weight array. This determines the data elements corresponding to the index values in the quantile index array and yields a result array whose elements are the determined cut points.
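The linear weighting of steps S122 to S125 can be sketched on plaintext as follows. Two parts are assumptions rather than the application's own method: the sketch sorts ordinary Python numbers with `sorted()` instead of performing the ciphertext-based sort of step S123, and, because this section does not spell out the weight formulas, it assumes the upper bound weight is 1 - (upper bound index - index value) and the lower bound weight is 1 minus the upper bound weight. The bound index arrays are passed in as inputs; the function name is hypothetical.

```python
from typing import List, Sequence


def linear_weighted_cut_points(data: Sequence[float],
                               indices: Sequence[float],
                               indices_above: Sequence[int],
                               indices_below: Sequence[int]) -> List[float]:
    """Plaintext sketch of steps S122-S125 (the weight formula is an assumption)."""
    # S122 (assumed): upper bound weight = 1 - (upper bound index - index value);
    # lower bound weight = 1 - upper bound weight.
    weights_above = [1.0 - (hi - idx) for idx, hi in zip(indices, indices_above)]
    weights_below = [1.0 - w for w in weights_above]
    # S123: sort the data array; plain sorted() stands in for a ciphertext sort.
    ordered = sorted(data)
    # S124: take the bounding values from the ordered array.
    data_above = [ordered[i] for i in indices_above]
    data_below = [ordered[i] for i in indices_below]
    # S125: the linear weighted combination gives the result array of cut points.
    return [wa * x_hi + wb * x_lo
            for wa, x_hi, wb, x_lo in zip(weights_above, data_above, weights_below, data_below)]


a = [16, 7, 11, 6, 6, 8, 8, 13, 14, 11]
cut_points = linear_weighted_cut_points(a, [0.9, 1.8, 9], [1, 2, 9], [0, 1, 9])
print(cut_points)  # approximately [6.0, 6.8, 16.0] (floating-point rounding may differ slightly)
```

For the worked example the ordered array is [6, 6, 7, 8, 8, 11, 11, 13, 14, 16], so index 1.8 falls 80% of the way from 6 to 7, giving 6.8. Under the assumed weights this is ordinary linear interpolation between neighbouring order statistics, the same scheme as NumPy's default `numpy.quantile` method, which can therefore serve as a plaintext cross-check.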
In an optional embodiment of the present application, constructing the upper bound index array and the lower bound index array from the index values in the quantile index array in step S121 may specifically include:
Step S21: determine the upper bound value and the lower bound value corresponding to each index value in the quantile index array;
Step S22: construct the upper bound index array from the upper bound values, and the lower bound index array from the lower bound values, of the index values in the quantile index array.
In the above example, the quantile index array is indices = [0.9, 1.8, 9], and the upper and lower bound values of each index value are determined. For index value 0.9, the upper bound is 1 and the lower bound is 0; for index value 1.8, the upper bound is 2 and the lower bound is 1; for index value 9, both the upper and the lower bound are 9.
The upper bound index array indices_above = [1, 2, 9] is thus constructed from the upper bound values 1, 2, 9, and the lower bound index array indices_below = [0, 1, 9] from the lower bound values 0, 1, 9.
In an optional embodiment of the present application, determining the upper bound value and the lower bound value corresponding to an index value in the quantile index array in step S21 may specifically include:
determining whether the current index value in the quantile index array corresponds to the last data element in the data array; if so, taking the current index value itself as both its upper bound value and its lower bound value; otherwise, rounding the current index value down to obtain its lower bound value, and adding 1 to that lower bound value to obtain its upper bound value.
In the above example, the data array a = [16,7,11,6,6,8,8,13,14,11] contains 10 data items to be binned, so its index values range from 0 to 9, and the index value 9 in indices = [0.9, 1.8, 9] corresponds to the last data element of a. The upper and lower bound values of index value 9 are therefore both set to 9 without further calculation, while those of the other index values must be computed.
Specifically, rounding the current index value down gives its lower bound value, and adding 1 to that lower bound value gives its upper bound value. For example, rounding index value 0.9 down gives the lower bound value 0, so its upper bound value is 0 + 1 = 1. Similarly, rounding index value 1.8 down gives the lower bound value 1, so its upper bound value is 1 + 1 = 2.
Note that if the current index value is an integer, for example 3, rounding it down leaves it unchanged, so the lower bound value of index value 3 is 3 and its upper bound value is 3 + 1 = 4.
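The bound rule of step S21 can be sketched in a few lines of Python. This is a plaintext illustration under two assumptions: the quantile index values are public (they depend only on the quantiles and the number of data items, not on the ciphertexts), and the helper name `bound_indices` is hypothetical rather than part of the described method.

```python
import math


def bound_indices(indices, n):
    """Return (lower, upper) bound index lists for quantile index values.

    indices: quantile index values, each in [0, n - 1]
    n: number of elements in the data array
    """
    lower, upper = [], []
    for idx in indices:
        if idx == n - 1:
            # Index of the last element: both bounds are the index itself.
            lo = hi = int(idx)
        else:
            # Round down for the lower bound; the upper bound is one more.
            lo = math.floor(idx)
            hi = lo + 1
        lower.append(lo)
        upper.append(hi)
    return lower, upper


indices_below, indices_above = bound_indices([0.9, 1.8, 9], n=10)
print(indices_below, indices_above)  # [0, 1, 9] [1, 2, 9]
```

An integer index value that is not the last one, such as 3, yields the bounds 3 and 4, as described above.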
After the upper and lower bound values of the index values in the quantile index array are determined, the upper-bound index array is constructed from the upper bound values and the lower-bound index array from the lower bound values. In the embodiments of the present application, the values taken from the ordered array with the upper-bound and lower-bound index arrays are then combined by linear weighting. It is therefore necessary to determine the upper-bound weight and the lower-bound weight of each index value in the quantile index array, and to construct the upper-bound weight array and the lower-bound weight array from them.
In an optional embodiment of the present application, determining the upper-bound weight array according to the upper-bound index array and the quantile index array, and determining the lower-bound weight array according to the upper-bound weight array in step S122, may specifically include:
step S31, determining the upper-bound weight of each index value according to the distance between its upper bound value in the upper-bound index array and the index value in the quantile index array;
step S32, constructing the upper-bound weight array from the upper-bound weights of the index values in the quantile index array;
step S33, determining the lower-bound weight array as the difference between an array of ones and the upper-bound weight array. The array of ones has the same numbers of rows and columns as the upper-bound weight array, and the subtraction is performed element-wise.
Preferably, the weights are determined from the distances between an index value in the quantile index array and its upper and lower bound values, such that the bound closer to the index value receives the larger weight. In the above example, the quantile index array is indices = [0.9, 1.8, 9]. For indices[0] = 0.9, the upper bound value is 1 and the lower bound value is 0. The distance between index value 0.9 and its upper bound value is 1 - 0.9 = 0.1, so the upper-bound weight of index value 0.9 is 1 - 0.1 = 0.9. Correspondingly, the lower-bound weight of index value 0.9 equals its distance to the upper bound value, 0.1 (equivalently, 1 - 0.9 = 0.1).
In specific implementation, since the upper-bound weight and the lower-bound weight sum to 1, the upper-bound weight of an index value may be determined first and the lower-bound weight then obtained by subtracting it from 1. Alternatively, the lower-bound weight may be determined first and the upper-bound weight obtained by subtracting it from 1.
In one example, if an index value in indices is an integer, for example 3, its upper bound value is 4 and its distance to the upper bound value is 4 - 3 = 1. The upper-bound weight of index value 3 is therefore 1 - 1 = 0 and its lower-bound weight is 1 - 0 = 1, so the interpolated value is exactly the data element with index 3.
In this way, the upper-bound weight and the lower-bound weight can be calculated for every index value in indices. The upper-bound weights form the upper-bound weight array, denoted weight_above; in the above example, weight_above = 1 - (indices_above - indices) = [1,1,1] - [0.1,0.2,0] = [0.9,0.8,1]. The lower-bound weights form the lower-bound weight array, denoted weight_below; in the above example, weight_below = 1 - weight_above = [1,1,1] - [0.9,0.8,1] = [0.1,0.2,0].
After the upper-bound and lower-bound weight arrays are determined, values are taken from the ordered array according to the upper-bound index array to obtain the upper-bound data array, and according to the lower-bound index array to obtain the lower-bound data array.
For example, the data array a = [16,7,11,6,6,8,8,13,14,11] of the above example is sorted based on the ciphertext, yielding the ordered array a_p = [6,6,7,8,8,11,11,13,14,16]. The data elements of both a and a_p are ciphertexts. The present application does not limit the method used to sort ciphertext arrays.
Once the ordered array is obtained, values are taken from a_p according to the upper-bound index array indices_above = [1,2,9] to obtain the upper-bound data. Specifically, indices_above[0] = 1, so the data element with index 1 is read from a_p, namely a_p[1] = 6. Similarly, indices_above[1] = 2 gives a_p[2] = 7, and indices_above[2] = 9 gives a_p[9] = 16.
The upper-bound data are therefore 6, 7 and 16, forming the upper-bound data array x1 = [6,7,16]. Similarly, taking values from a_p according to the lower-bound index array indices_below = [0,1,9] yields the lower-bound data 6, 6 and 16, forming the lower-bound data array x2 = [6,6,16]. The data elements of both arrays are ciphertexts.
Because an index value in indices = [0.9, 1.8, 9] may not be an integer, there may be no data element at that position in the array. In the embodiments of the present application, the upper-bound and lower-bound weight arrays are therefore used to perform a linear weighting calculation on the upper-bound and lower-bound data arrays, and the resulting array contains the determined cut points.
For example, array indices must be integers, so no data element with index 1.8 can be read from a_p. This embodiment therefore uses the two integers bounding 1.8, namely its lower bound value and its upper bound value, to approximate the element of a_p at index 1.8 by interpolation, as follows:
(k2-indices[i])×a_p[k1]+(indices[i]-k1)×a_p[k2] (2)
where k1 is the lower bound value of index value indices[i] and k2 = k1 + 1 is its upper bound value; (k2-indices[i]) is the lower-bound weight of indices[i] and (indices[i]-k1) is its upper-bound weight. If indices[i] corresponds to the last data element, k1 = k2 = indices[i], the upper-bound weight is 1 and the lower-bound weight is 0, so the result is a_p[k2] itself.
For example, for the quantile index array indices = [0.9, 1.8, 9], take indices[1] = 1.8, that is, i = 1, k1 = 1 and k2 = 2. Substituting into equation (2) gives:
(2-1.8)×a_p[1]+(1.8-1)×a_p[2], where (2-1.8) = 0.2 is the lower-bound weight of index value 1.8, a_p[1] is the data element of a_p with index 1, (1.8-1) = 0.8 is the upper-bound weight of index value 1.8, and a_p[2] is the data element of a_p with index 2. This formula linearly weights a_p[1] and a_p[2] with the lower-bound and upper-bound weights of index value 1.8, and the result, 0.2×6 + 0.8×7 = 6.8, represents the data element of a_p corresponding to index value 1.8.
Similarly, for indices[0] = 0.9, the upper-bound weight is 0.9 and the lower-bound weight is 0.1, so the data element of a_p corresponding to index value 0.9 is obtained as a_p[0]×0.1 + a_p[1]×0.9.
Applying the principle of equation (2) to all index values at once, the linear weighting calculation is performed on the upper-bound and lower-bound data arrays with the upper-bound and lower-bound weight arrays. This determines the data element of a_p corresponding to each index value in indices and yields a result array, each data element of which is a determined cut point.
Specifically, the result array res may be determined by:
res=x1×weight_above+x2×weight_below (3)
where x1 is the upper-bound data array, x2 is the lower-bound data array, weight_above is the upper-bound weight array, weight_below is the lower-bound weight array, and × denotes element-wise multiplication.
In the above example, x1 = [6,7,16] and x2 = [6,6,16], both ciphertexts, weight_above = [0.9,0.8,1] and weight_below = [0.1,0.2,0]. The result array calculated by equation (3) is:
res = x1×weight_above + x2×weight_below = [6,7,16]×[0.9,0.8,1] + [6,6,16]×[0.1,0.2,0] = [6,6.8,16]. That is, when the data array a = [16,7,11,6,6,8,8,13,14,11] is binned according to the custom quantiles [0.1,0.2,1.0], the cut points are [6,6.8,16].
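The complete cut-point computation can be checked on plaintext with the following sketch. It is illustrative only: ordinary Python numbers stand in for ciphertexts, `sorted` stands in for the ciphertext-based sort, and the function name `quantile_cut_points` is hypothetical. The weighting is standard linear interpolation between the two neighboring elements of the ordered array, which reproduces the cut points [6, 6.8, 16] of the example.

```python
import math


def quantile_cut_points(data, quantiles):
    """Plaintext sketch of quantile cut points via linear interpolation."""
    n = len(data)
    a_p = sorted(data)  # stands in for the ciphertext-based sort
    # Quantile index array: q(i) * (n - 1).
    indices = [q * (n - 1) for q in quantiles]
    res = []
    for idx in indices:
        # Lower/upper bound index values (both equal for the last element).
        k1 = n - 1 if idx >= n - 1 else math.floor(idx)
        k2 = k1 if k1 == n - 1 else k1 + 1
        # Upper-bound weight = 1 - distance to the upper bound;
        # lower-bound weight = 1 - upper-bound weight.
        w_above = 1 - (k2 - idx)
        w_below = 1 - w_above
        res.append(a_p[k2] * w_above + a_p[k1] * w_below)
    return res


a = [16, 7, 11, 6, 6, 8, 8, 13, 14, 11]
print([round(x, 6) for x in quantile_cut_points(a, [0.1, 0.2, 1.0])])
# [6.0, 6.8, 16.0]
```

In a ciphertext setting, the index and weight arithmetic could plausibly remain in plaintext, since it depends only on n and the quantiles; only the element reads and the final weighted sum would then operate on ciphertexts. That split is an assumption of this sketch.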
It should be understood that the cut points in the result array res are ciphertexts; they are shown in plaintext in the embodiments of the present application only for ease of description. The data to be binned can then be binned according to res: for example, each data item to be binned is compared, based on the ciphertext, with the cut points in res in turn to determine the bin into which it falls. With the data processing method provided by the embodiments of the present application, the cut points of the data to be binned can thus be determined on the basis of the ciphertext, so that the data can be binned on ciphertext without exposing its plaintext during processing, preserving data privacy.
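The comparison-based binning step can be sketched as follows. The bin-boundary convention (a value equal to a cut point falls into the lower bin) and the function name `assign_bins` are assumptions for illustration; each `v > c` test stands in for one ciphertext-based comparison whose 0/1 result is summed.

```python
def assign_bins(values, cut_points):
    """Plaintext sketch: bin index = number of cut points strictly below the value.

    Each `v > c` comparison stands in for one ciphertext-based comparison;
    summing the 0/1 outcomes gives the bin index without any branching on data.
    """
    return [sum(1 for c in cut_points if v > c) for v in values]


print(assign_bins([5, 6, 6.5, 7, 16], [6, 6.8, 16]))  # [0, 0, 1, 2, 2]
```

Summing comparison outcomes, rather than branching on them, keeps the access pattern independent of the data, which is usually what a ciphertext comparison protocol requires.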
It should be noted that the above examples take custom-quantile binning as the binning mode. In a specific implementation, the binning mode may also be equal-frequency binning, whose processing is similar to that of custom-quantile binning and is not repeated here.
When the binning mode is equidistant binning with N bins, the data to be binned is divided into N bins of equal width. In equidistant binning, the width of each bin can therefore serve as the quantile, and the cut points of the data to be binned in the equidistant binning mode can be determined from this width and the data array.
In an optional embodiment of the present application, when the binning mode is equidistant binning, the method may further include: determining the maximum value and the minimum value in the data array based on the ciphertext of the data to be binned;
the determining of the quantiles according to the number of bins and the binning mode then includes:
determining the quantile of the data to be binned in the equidistant binning mode according to the maximum value, the minimum value and the number of bins.
Equidistant binning divides the data to be binned into N bins of equal width between its minimum value and its maximum value. Therefore, the maximum value and the minimum value in the data array are first determined based on the ciphertext of the data to be binned.
Optionally, determining the maximum value and the minimum value in the data array based on the ciphertext of the data to be binned may specifically include:
step S41, if the number of elements in the data array is even, dividing the data array into two sub-arrays of equal length; if it is odd, first taking out the last element of the data array as a target element and then dividing the remaining elements into two sub-arrays of equal length;
step S42, performing an element-wise ciphertext-based comparison of the two sub-arrays and generating a new array from the comparison results, where the new array contains the larger value of each comparison, or alternatively the smaller value of each comparison;
step S43, if a target element was taken out, adding it to the new array; then dividing the updated new array into two sub-arrays of equal length in the same way and performing the element-wise ciphertext-based comparison on them, and repeating until the generated new array has length 1, thereby obtaining the maximum value (or, when the smaller values are kept, the minimum value) of the data array.
In an example of the present application, assuming that the data array is [7,3,1,5,8,6,10], the data elements in the data array are all ciphertext, and the process of determining the maximum value in the data array may be as follows: the data array is first divided into two equal-length sub-arrays a _ arr ═ 7,1,8, b _ arr ═ 3,5,6 (since the data array is odd in length, the last value is taken out as the target element). The ciphertext-based comparison operation is then performed on the two sub-arrays by element, and the larger value in the comparison result is retained, which may be achieved by calculating tmp _ arr ═ relu (b _ arr-a _ arr) + a _ arr, and the relu function is used to retain the larger value in the comparison result. Specifically, a _ arr [0] is compared to b _ arr [0] based on the ciphertext, leaving the larger of the comparison results to be 7. A _ arr [1] is compared with b _ arr [1] based on the ciphertext, keeping the larger value in the comparison result to be 5. A _ arr [2] is compared with b _ arr [2] based on the ciphertext, keeping the larger of the comparison results at 8. The new array was obtained as [7,5,8 ]. During calculation of the relu function, if an input value is smaller than 0, the value is replaced by 0, and if the input value is larger than 0, the original value is reserved.
Next, the target element extracted in the first step is added to the new array, and the updated new array is [7,5,8,10 ]. And continuously dividing the updated new array into two sub-arrays according to the method, performing comparison operation based on the ciphertext on the two sub-arrays obtained by dividing the new array equally according to elements, keeping a larger value in a comparison result until the length of the generated new array is 1, exiting the cycle, and obtaining the maximum value in the data array [7,3,1,5,8,6,10 ].
The calculation process of the minimum value is similar to that of the maximum value, except that when the minimum value is calculated, the relu function is used for keeping the smaller value in the comparison result, and the formula when the maximum value is calculated is replaced by: tmp _ arr-arr _ a-relu (arr _ a-arr _ b), and the minimum value of the corresponding position is sequentially taken out.
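Steps S41 to S43 amount to a pairwise, tournament-style reduction. The sketch below runs it on plaintext numbers; the helper names `relu` and `tree_extreme` are hypothetical, and on ciphertext the `relu` itself would require a secure comparison. The sub-arrays are split by alternating positions, matching a_arr = [7,1,8] and b_arr = [3,5,6] in the example.

```python
def relu(x):
    # relu(x) = max(x, 0); on ciphertext this would itself need a secure comparison.
    return x if x > 0 else 0


def tree_extreme(data, keep_larger=True):
    """Pairwise reduction of steps S41-S43 on plaintext numbers."""
    arr = list(data)
    while len(arr) > 1:
        # Odd length: set the last element aside as the target element.
        target = arr.pop() if len(arr) % 2 == 1 else None
        a_arr, b_arr = arr[0::2], arr[1::2]  # two equal-length sub-arrays
        if keep_larger:
            new = [relu(b - a) + a for a, b in zip(a_arr, b_arr)]  # max(a, b)
        else:
            new = [a - relu(a - b) for a, b in zip(a_arr, b_arr)]  # min(a, b)
        if target is not None:
            new.append(target)  # put the set-aside element back
        arr = new
    return arr[0]


data = [7, 3, 1, 5, 8, 6, 10]
print(tree_extreme(data), tree_extreme(data, keep_larger=False))  # 10 1
```

Each round halves the array, so about log2(n) rounds of element-wise comparisons are needed, which suits protocols where a batch of comparisons costs roughly one round of communication.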
After the maximum value and the minimum value in the data array are determined, the quantile of the data to be binned in the equidistant binning mode is determined according to the maximum value, the minimum value and the number of bins.
Optionally, in equidistant binning, the quantile may be represented by the width of each bin. Specifically, the width W may be calculated by the following formula:
W=(B-A)/N (4)
where B is the maximum value in the data array, A is the minimum value in the data array, and N is the number of bins.
In the equidistant binning mode, determining the cut points of the data to be binned based on the quantile and the data array may specifically include: determining the cut points of the data to be binned in the equidistant binning mode based on the equidistant bin width and the data array.
Specifically, the equidistant bin width may be determined as the ratio of the difference between the maximum value and the minimum value to the number of bins, and the cut points of the data to be binned in the equidistant binning mode are then determined from the maximum value, the minimum value and the width.
Specifically, the width W of N equidistant bins is calculated by equation (4) above. With minimum value A and maximum value B of the data array, the cut points of the data to be binned are, in order, A, A+W, A+2W, …, A+(N-1)W.
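The equidistant cut points follow directly from equation (4). In this plaintext sketch, Python's built-in `min` and `max` stand in for the ciphertext-based procedure of steps S41 to S43, and the function name `equidistant_cut_points` is hypothetical.

```python
def equidistant_cut_points(data, n_bins):
    """Cut points A, A+W, ..., A+(N-1)W with W = (B - A) / N (equation (4))."""
    a_min, b_max = min(data), max(data)  # stand-ins for the ciphertext max/min
    w = (b_max - a_min) / n_bins
    return [a_min + k * w for k in range(n_bins)]


print(equidistant_cut_points([16, 7, 11, 6, 6, 8, 8, 13, 14, 11], 5))
# [6.0, 8.0, 10.0, 12.0, 14.0]
```

For the example data array a, five equidistant bins give a width of (16 - 6) / 5 = 2.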
With the cut points determined in the equidistant binning mode, the data to be binned can be binned equidistantly: each data item to be binned is compared, based on the ciphertext, with the determined cut points in turn to determine the bin into which it falls. With the data processing method provided by the embodiments of the present application, the cut points of the data to be binned under different binning modes can thus be determined on the basis of the ciphertext, so that the data can be binned on ciphertext according to a preset number of bins and binning mode, without exposing the plaintext during processing and while preserving data privacy.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the embodiments are not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the embodiments. Further, those skilled in the art will also appreciate that the embodiments described in the specification are presently preferred and that no particular act is required of the embodiments of the application.
Device embodiment
Referring to Fig. 2, a block diagram of an embodiment of a data processing apparatus according to the present application is shown. The apparatus may be configured to determine the cut points of data to be binned based on the ciphertext of the data, and may specifically include:
a data array construction module 201, configured to construct a data array from the ciphertext of the data to be binned;
a binning parameter determining module 202, configured to determine the number of bins and the binning mode of the data to be binned;
a quantile determining module 203, configured to determine quantiles according to the number of bins and the binning mode; and
a cut point determining module 204, configured to determine, based on the quantiles and the data array, the cut points of the data to be binned in the binning mode.
Optionally, the quantile determining module 203 may include:
a first determining module, configured to, if the binning mode is equal-frequency binning, determine a first cut proportion according to the number of bins and determine the quantiles based on the first cut proportion; or
a second determining module, configured to, if the binning mode is custom-quantile binning, determine the quantiles according to a user-defined second cut proportion.
Optionally, the cut point determining module 204 may include:
an index construction submodule, configured to construct a quantile index array from the quantiles; and
a cut point determining submodule, configured to determine, based on the quantile index array and the data array, the cut points of the data to be binned in the binning mode.
Optionally, the index construction submodule may include:
a number determining unit, configured to determine the number n of data items to be binned and the number m of quantiles;
an index determining unit, configured to determine the index value corresponding to the i-th quantile q(i) by the formula q(i)×(n-1), where i is an integer in [1, m]; and
a first construction unit, configured to construct the quantile index array from the index values corresponding to the m quantiles.
Optionally, the cut point determining submodule may include:
a second construction unit, configured to construct an upper-bound index array and a lower-bound index array from the index values in the quantile index array;
a weight determining unit, configured to determine an upper-bound weight array according to the upper-bound index array and the quantile index array, and to determine a lower-bound weight array according to the upper-bound weight array;
a data sorting unit, configured to sort the data array based on the ciphertext of the data to be binned to obtain an ordered array;
a data reading unit, configured to take values from the ordered array according to the upper-bound index array to obtain an upper-bound data array, and according to the lower-bound index array to obtain a lower-bound data array; and
a cut point determining unit, configured to perform a linear weighting calculation on the upper-bound data array and the lower-bound data array with the upper-bound weight array and the lower-bound weight array to obtain a result array containing the determined cut points.
Optionally, the second construction unit may include:
an upper and lower bound determining subunit, configured to determine the upper bound value and the lower bound value corresponding to each index value in the quantile index array; and
an index construction subunit, configured to construct the upper-bound index array from the upper bound values, and the lower-bound index array from the lower bound values, corresponding to the index values in the quantile index array.
Optionally, the upper and lower bound determining subunit is specifically configured to:
determine whether the current index value in the quantile index array corresponds to the last data element in the data array; if so, take the current index value as both its upper bound value and its lower bound value; otherwise, round the current index value down to obtain its lower bound value, and add 1 to that lower bound value to obtain its upper bound value.
Optionally, the weight determining unit may specifically include:
an upper-bound weight determining subunit, configured to determine the upper-bound weight of each index value according to the distance between its upper bound value in the upper-bound index array and the index value in the quantile index array;
an upper-bound weight array determining subunit, configured to construct the upper-bound weight array from the upper-bound weights of the index values in the quantile index array; and
a lower-bound weight array determining subunit, configured to determine the lower-bound weight array as the difference between an array of ones and the upper-bound weight array.
Optionally, each quantile takes a value in the range [0, 1].
Optionally, the data array has one or more dimensions; when it has more than one dimension, the cut points of the data to be binned are determined along one of the dimensions.
The data processing apparatus provided by the embodiments of the present application can determine the cut points of data to be binned on the basis of the ciphertext, so that the data can be binned on ciphertext without exposing its plaintext during processing, preserving data privacy.
Since the apparatus embodiment is basically similar to the method embodiment, it is described briefly; for relevant details, refer to the corresponding parts of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
An embodiment of the present application provides an apparatus for data processing, configured to determine the cut points of data to be binned based on the ciphertext of the data. The apparatus includes a memory and one or more programs stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for: constructing a data array from the ciphertext of the data to be binned; determining the number of bins and the binning mode of the data to be binned; determining quantiles according to the number of bins and the binning mode; and determining, based on the quantiles and the data array, the cut points of the data to be binned in the binning mode.
Fig. 3 is a block diagram illustrating an apparatus 800 for data processing in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 3, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the device 800 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in position of the device 800 or of one of its components, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in its temperature. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. It may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the device 800 and other devices. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions is also provided, such as the memory 804 comprising instructions executable by the processor 820 of the device 800 to perform the above-described method. For example, the non-transitory computer-readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Fig. 4 is a schematic diagram of a server in some embodiments of the present application. The server 1900 may vary considerably in configuration or performance, and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors), a memory 1932, and one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. A program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
A non-transitory computer-readable storage medium is provided, in which instructions, when executed by a processor of an apparatus (a server or a terminal), enable the apparatus to perform the data processing method shown in Fig. 1.
A non-transitory computer-readable storage medium is provided, in which instructions, when executed by a processor of an apparatus (a server or a terminal), enable the apparatus to perform a data processing method comprising: constructing a data array from the ciphertext of the data to be binned; determining the number of bins and the binning mode for the data to be binned; determining quantiles according to the number of bins and the binning mode; and determining, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
The present application discloses A1, a data processing method for determining, based on the ciphertext of data to be binned, the cut points of that data. The method comprises the following steps:
constructing a data array from the ciphertext of the data to be binned;
determining the number of bins and the binning mode for the data to be binned;
determining quantiles according to the number of bins and the binning mode; and
determining, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
A2. The method according to A1, wherein determining quantiles according to the number of bins and the binning mode comprises:
if the binning mode is equal-frequency binning, determining a first split proportion according to the number of bins, and determining the quantiles based on the first split proportion; or
if the binning mode is custom-frequency binning, determining the quantiles according to a user-defined second split proportion.
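A2 can be illustrated with a minimal plaintext sketch. It assumes (this is not stated in the text) that equal-frequency binning into k bins uses the split proportion 1/k and yields the k - 1 interior quantiles 1/k, 2/k, ..., (k - 1)/k, and that the user-defined proportions of custom-frequency binning are per-bin shares whose running sums give the quantiles. The function name `determine_quantiles` and the mode strings are hypothetical.

```python
from typing import List, Optional, Sequence


def determine_quantiles(num_bins: int, mode: str,
                        proportions: Optional[Sequence[float]] = None) -> List[float]:
    """Return the interior quantiles (strictly between 0 and 1) that separate bins.

    Hypothetical conventions (assumptions, not taken from the patent):
      * "equal_frequency": split proportion 1/k gives quantiles 1/k, ..., (k-1)/k.
      * "custom": `proportions` are per-bin shares summing to 1; the quantiles
        are their running sums, excluding the final 1.
    """
    if num_bins < 2:
        raise ValueError("binning needs at least two bins")
    if mode == "equal_frequency":
        step = 1.0 / num_bins  # the "first split proportion"
        return [j * step for j in range(1, num_bins)]
    if mode == "custom":
        if proportions is None or len(proportions) != num_bins:
            raise ValueError("custom mode needs one proportion per bin")
        if abs(sum(proportions) - 1.0) > 1e-9:
            raise ValueError("proportions must sum to 1")
        quantiles: List[float] = []
        running = 0.0
        for share in proportions[:-1]:  # the last running sum would be 1.0
            running += share
            quantiles.append(running)
        return quantiles
    raise ValueError(f"unknown binning mode: {mode!r}")


print(determine_quantiles(4, "equal_frequency"))  # [0.25, 0.5, 0.75]
```

With custom proportions `[0.2, 0.3, 0.5]` for three bins, the same function returns quantiles of approximately 0.2 and 0.5.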
A3. The method according to A1, wherein determining the cut points of the data to be binned under the binning mode based on the quantiles and the data array comprises:
constructing a quantile index array according to the quantiles; and
determining, based on the quantile index array and the data array, the cut points of the data to be binned under the binning mode.
A4. The method according to A3, wherein constructing a quantile index array according to the quantiles comprises:
determining the number n of data items to be binned and the number m of quantiles;
determining the index value corresponding to the i-th quantile q(i) according to the formula $q(i) \times (n-1)$, where $i \in [1, m]$ is an integer; and
constructing the quantile index array from the index values corresponding to the m quantiles.
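The A4 formula maps each quantile to a possibly fractional position in the sorted data, where position 0 is the smallest element and n - 1 the largest. A minimal plaintext sketch (the function name `quantile_index_array` is hypothetical); this is the same position convention as the default `linear` method of NumPy's `numpy.quantile`, which may serve as a convenient plaintext cross-check.

```python
from typing import List, Sequence


def quantile_index_array(quantiles: Sequence[float], n: int) -> List[float]:
    """Map each quantile q(i) to its position q(i) * (n - 1) in sorted data.

    A fractional position falls between two neighbouring sorted elements.
    """
    if n < 1:
        raise ValueError("need at least one data element")
    for q in quantiles:
        if not 0.0 <= q <= 1.0:
            raise ValueError(f"quantile out of range: {q}")
    return [q * (n - 1) for q in quantiles]


# m = 3 quantiles over n = 10 data elements
print(quantile_index_array([0.25, 0.5, 0.75], 10))  # [2.25, 4.5, 6.75]
```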
A5. The method according to A3, wherein determining the cut points of the data to be binned under the binning mode based on the quantile index array and the data array comprises:
constructing an upper-bound index array and a lower-bound index array from the index values in the quantile index array;
determining an upper-bound weight array according to the upper-bound index array and the quantile index array, and determining a lower-bound weight array according to the upper-bound weight array;
sorting the data array based on the ciphertext of the data to be binned to obtain an ordered array;
reading values from the ordered array at the positions in the upper-bound index array to obtain an upper-bound data array, and at the positions in the lower-bound index array to obtain a lower-bound data array; and
computing a linear weighted combination of the upper-bound data array and the lower-bound data array using the upper-bound weight array and the lower-bound weight array, to obtain a result array containing the determined cut points.
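The five steps of A5 can be traced end to end on plaintext. In the sketch below, Python's `sorted` and ordinary float arithmetic stand in for the ciphertext sort and the secure weighted sum, which this sketch does not model. The upper-bound weight is assumed to be 1 minus the distance from the upper bound to the index (A8 only says it is derived from that distance), and `cut_points` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cut_points(data: Sequence[float], quantiles: Sequence[float]) -> List[float]:
    """Plaintext walk-through of the A5 steps (no encryption is modelled)."""
    n = len(data)
    if n == 0:
        raise ValueError("no data to bin")
    index = [q * (n - 1) for q in quantiles]  # quantile index array (A4)

    # Step 1: upper- and lower-bound index arrays (rule of A7).
    lower_idx: List[int] = []
    upper_idx: List[int] = []
    for h in index:
        if h >= n - 1:  # last element: both bounds are the index itself
            lower_idx.append(n - 1)
            upper_idx.append(n - 1)
        else:
            lo = math.floor(h)
            lower_idx.append(lo)
            upper_idx.append(lo + 1)

    # Step 2: weights (assumed: upper weight = 1 - distance to the upper bound).
    upper_w = [1.0 - (u - h) for u, h in zip(upper_idx, index)]
    lower_w = [1.0 - w for w in upper_w]

    # Step 3: sort; `sorted` stands in for the ciphertext sort.
    ordered = sorted(data)

    # Step 4: gather the bound values from the ordered array.
    upper_vals = [ordered[u] for u in upper_idx]
    lower_vals = [ordered[lo] for lo in lower_idx]

    # Step 5: the linear weighted combination gives the cut points.
    return [wl * lv + wu * uv
            for wl, lv, wu, uv in zip(lower_w, lower_vals, upper_w, upper_vals)]
```

For example, `cut_points([7, 1, 5, 3, 9], [0.25, 0.5, 0.75])` sorts the data to `[1, 3, 5, 7, 9]`, maps the quantiles to indices 1.0, 2.0, and 3.0, and returns `[3.0, 5.0, 7.0]`.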
A6. The method according to A5, wherein constructing the upper-bound index array and the lower-bound index array from the index values in the quantile index array comprises:
determining the upper-bound value and the lower-bound value corresponding to each index value in the quantile index array; and
constructing the upper-bound index array from the upper-bound values, and the lower-bound index array from the lower-bound values, corresponding to the index values in the quantile index array.
A7. The method according to A6, wherein determining the upper-bound value and the lower-bound value corresponding to each index value in the quantile index array comprises:
determining whether the current index value in the quantile index array corresponds to the last data element in the data array; if so, taking the current index value itself as both its upper-bound value and its lower-bound value; otherwise, rounding the current index value down to obtain its lower-bound value, and adding 1 to that lower-bound value to obtain its upper-bound value.
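The special case in A7 matters because, for the last element, rounding down and adding 1 would point one position past the end of the array. A minimal sketch of the rule, assuming the index lies in [0, n - 1]; the function names `bounds_for_index` and `bound_index_arrays` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def bounds_for_index(h: float, n: int) -> Tuple[int, int]:
    """Return (lower, upper) positions for a quantile index h over n sorted elements."""
    if not 0 <= h <= n - 1:
        raise ValueError(f"index {h} outside [0, {n - 1}]")
    if h == n - 1:  # h corresponds to the last data element
        return n - 1, n - 1
    lower = math.floor(h)  # round down
    return lower, lower + 1


def bound_index_arrays(index: Sequence[float], n: int) -> Tuple[List[int], List[int]]:
    """Build the (upper-bound, lower-bound) index arrays of A6 from the A7 rule."""
    pairs = [bounds_for_index(h, n) for h in index]
    return [up for _, up in pairs], [lo for lo, _ in pairs]


print(bound_index_arrays([2.25, 4.5, 9.0], 10))  # ([3, 5, 9], [2, 4, 9])
```

Note that an integral index below n - 1 (for example 3.0 with n = 5) still gets distinct bounds 3 and 4; the interpolation weights then place all the weight on the lower bound.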
A8. The method according to A5, wherein determining the upper-bound weight array according to the upper-bound index array and the quantile index array, and determining the lower-bound weight array according to the upper-bound weight array, comprises:
determining the upper-bound weight for each index value according to the distance between the corresponding upper-bound value in the upper-bound index array and that index value in the quantile index array;
constructing the upper-bound weight array from the upper-bound weights of all index values in the quantile index array; and
determining the lower-bound weight array as the difference between an array whose elements are all 1 and the upper-bound weight array.
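A8 states only that the upper-bound weight is derived from the distance between the upper bound and the index. The sketch below assumes the usual linear-interpolation choice, weight = 1 - (upper bound - index). Under that assumption a fractional index of 2.25 with bounds 2 and 3 gets weights 0.25 (upper) and 0.75 (lower), and in the last-element case of A7 both bounds equal the index, so the upper weight is 1. The function name `weight_arrays` is hypothetical.

```python
from typing import List, Sequence, Tuple


def weight_arrays(upper_index: Sequence[int],
                  quantile_index: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (upper-bound weights, lower-bound weights) for linear interpolation.

    Assumed rule: upper weight = 1 - (upper bound - index), so the closer the
    index is to its upper bound, the larger that bound's weight.
    """
    if len(upper_index) != len(quantile_index):
        raise ValueError("arrays must have the same length")
    upper_w = [1.0 - (u - h) for u, h in zip(upper_index, quantile_index)]
    ones = [1.0] * len(upper_w)
    # lower-bound weights: all-ones array minus the upper-bound weight array
    lower_w = [one - w for one, w in zip(ones, upper_w)]
    return upper_w, lower_w


print(weight_arrays([3, 5, 9], [2.25, 4.5, 9.0]))  # ([0.25, 0.5, 1.0], [0.75, 0.5, 0.0])
```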
A9. The method according to any one of A1 to A8, wherein the data array has one or more dimensions, and when the data array has more than one dimension, the cut points of the data to be binned are determined based on one of the dimensions.
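For a data array with more than one dimension, A9 bins along a single dimension. Assuming each row is a record and the chosen dimension is a column (an interpretation, not stated in the text), a plaintext cross-check for equal-frequency cut points can use the standard library's `statistics.quantiles` with `method="inclusive"`. That method treats the minimum and maximum as the 0th and 100th percentiles and interpolates linearly over positions scaled by the data length minus 1, the same convention as A4. Its `n` parameter is the number of bins, not the data count n of A4. The function name `column_cut_points` is hypothetical.

```python
import statistics
from typing import List, Sequence


def column_cut_points(rows: Sequence[Sequence[float]], dim: int,
                      num_bins: int) -> List[float]:
    """Equal-frequency cut points along one dimension of a 2-D plaintext array."""
    column = [row[dim] for row in rows]  # bin on the chosen dimension only
    # "inclusive" treats min/max as the 0th/100th percentiles, giving the
    # same (len(column) - 1)-scaled linear interpolation as A4.
    return statistics.quantiles(column, n=num_bins, method="inclusive")


rows = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
print(column_cut_points(rows, dim=1, num_bins=4))  # [20.0, 30.0, 40.0]
```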
The present application discloses B10, a data processing apparatus for determining, based on the ciphertext of data to be binned, the cut points of that data. The apparatus comprises:
a data array construction module, configured to construct a data array from the ciphertext of the data to be binned;
a binning parameter determining module, configured to determine the number of bins and the binning mode for the data to be binned;
a quantile determining module, configured to determine quantiles according to the number of bins and the binning mode; and
a cut point determining module, configured to determine, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
B11. The apparatus according to B10, wherein the quantile determining module comprises:
a first determining module, configured to, if the binning mode is equal-frequency binning, determine a first split proportion according to the number of bins and determine the quantiles based on the first split proportion; or
a second determining module, configured to, if the binning mode is custom-frequency binning, determine the quantiles according to a user-defined second split proportion.
B12. The apparatus according to B10, wherein the cut point determining module comprises:
an index construction submodule, configured to construct a quantile index array according to the quantiles; and
a cut point determining submodule, configured to determine, based on the quantile index array and the data array, the cut points of the data to be binned under the binning mode.
B13. The apparatus according to B12, wherein the index construction submodule comprises:
a number determining unit, configured to determine the number n of data items to be binned and the number m of quantiles;
an index determining unit, configured to determine the index value corresponding to the i-th quantile q(i) according to the formula $q(i) \times (n-1)$, where $i \in [1, m]$ is an integer; and
a first construction unit, configured to construct the quantile index array from the index values corresponding to the m quantiles.
B14. The apparatus according to B12, wherein the cut point determining submodule comprises:
a second construction unit, configured to construct an upper-bound index array and a lower-bound index array from the index values in the quantile index array;
a weight determining unit, configured to determine an upper-bound weight array according to the upper-bound index array and the quantile index array, and to determine a lower-bound weight array according to the upper-bound weight array;
a data sorting unit, configured to sort the data array based on the ciphertext of the data to be binned to obtain an ordered array;
a data reading unit, configured to read values from the ordered array at the positions in the upper-bound index array to obtain an upper-bound data array, and at the positions in the lower-bound index array to obtain a lower-bound data array; and
a cut point determining unit, configured to compute a linear weighted combination of the upper-bound data array and the lower-bound data array using the upper-bound weight array and the lower-bound weight array, to obtain a result array containing the determined cut points.
B15. The apparatus according to B14, wherein the second construction unit comprises:
an upper and lower bound determining subunit, configured to determine the upper-bound value and the lower-bound value corresponding to each index value in the quantile index array; and
an index construction subunit, configured to construct the upper-bound index array from the upper-bound values, and the lower-bound index array from the lower-bound values, corresponding to the index values in the quantile index array.
B16. The apparatus according to B15, wherein the upper and lower bound determining subunit is specifically configured to:
determine whether the current index value in the quantile index array corresponds to the last data element in the data array; if so, take the current index value itself as both its upper-bound value and its lower-bound value; otherwise, round the current index value down to obtain its lower-bound value, and add 1 to that lower-bound value to obtain its upper-bound value.
B17. The apparatus according to B14, wherein the weight determining unit comprises:
an upper-bound weight determining subunit, configured to determine the upper-bound weight for each index value according to the distance between the corresponding upper-bound value in the upper-bound index array and that index value in the quantile index array;
an upper-bound weight array determining subunit, configured to construct the upper-bound weight array from the upper-bound weights of all index values in the quantile index array; and
a lower-bound weight array determining subunit, configured to determine the lower-bound weight array as the difference between an array whose elements are all 1 and the upper-bound weight array.
B18. The apparatus according to any one of B10 to B17, wherein the data array has one or more dimensions, and when the data array has more than one dimension, the cut points of the data to be binned are determined based on one of the dimensions.
The present application discloses C19, an apparatus for data processing, for determining, based on the ciphertext of data to be binned, the cut points of that data. The apparatus comprises a memory and one or more programs, wherein the one or more programs are stored in the memory, are configured to be executed by one or more processors, and include instructions for:
constructing a data array from the ciphertext of the data to be binned;
determining the number of bins and the binning mode for the data to be binned;
determining quantiles according to the number of bins and the binning mode; and
determining, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
C20. The apparatus according to C19, wherein determining quantiles according to the number of bins and the binning mode comprises:
if the binning mode is equal-frequency binning, determining a first split proportion according to the number of bins, and determining the quantiles based on the first split proportion; or
if the binning mode is custom-frequency binning, determining the quantiles according to a user-defined second split proportion.
C21. The apparatus according to C19, wherein determining the cut points of the data to be binned under the binning mode based on the quantiles and the data array comprises:
constructing a quantile index array according to the quantiles; and
determining, based on the quantile index array and the data array, the cut points of the data to be binned under the binning mode.
C22. The apparatus according to C21, wherein constructing a quantile index array according to the quantiles comprises:
determining the number n of data items to be binned and the number m of quantiles;
determining the index value corresponding to the i-th quantile q(i) according to the formula $q(i) \times (n-1)$, where $i \in [1, m]$ is an integer; and
constructing the quantile index array from the index values corresponding to the m quantiles.
C23. The apparatus according to C21, wherein determining the cut points of the data to be binned under the binning mode based on the quantile index array and the data array comprises:
constructing an upper-bound index array and a lower-bound index array from the index values in the quantile index array;
determining an upper-bound weight array according to the upper-bound index array and the quantile index array, and determining a lower-bound weight array according to the upper-bound weight array;
sorting the data array based on the ciphertext of the data to be binned to obtain an ordered array;
reading values from the ordered array at the positions in the upper-bound index array to obtain an upper-bound data array, and at the positions in the lower-bound index array to obtain a lower-bound data array; and
computing a linear weighted combination of the upper-bound data array and the lower-bound data array using the upper-bound weight array and the lower-bound weight array, to obtain a result array containing the determined cut points.
C24. The apparatus according to C23, wherein constructing the upper-bound index array and the lower-bound index array from the index values in the quantile index array comprises:
determining the upper-bound value and the lower-bound value corresponding to each index value in the quantile index array; and
constructing the upper-bound index array from the upper-bound values, and the lower-bound index array from the lower-bound values, corresponding to the index values in the quantile index array.
C25. The apparatus according to C24, wherein determining the upper-bound value and the lower-bound value corresponding to each index value in the quantile index array comprises:
determining whether the current index value in the quantile index array corresponds to the last data element in the data array; if so, taking the current index value itself as both its upper-bound value and its lower-bound value; otherwise, rounding the current index value down to obtain its lower-bound value, and adding 1 to that lower-bound value to obtain its upper-bound value.
C26. The apparatus according to C23, wherein determining the upper-bound weight array according to the upper-bound index array and the quantile index array, and determining the lower-bound weight array according to the upper-bound weight array, comprises:
determining the upper-bound weight for each index value according to the distance between the corresponding upper-bound value in the upper-bound index array and that index value in the quantile index array;
constructing the upper-bound weight array from the upper-bound weights of all index values in the quantile index array; and
determining the lower-bound weight array as the difference between an array whose elements are all 1 and the upper-bound weight array.
C27. The apparatus according to any one of C19 to C26, wherein the data array has one or more dimensions, and when the data array has more than one dimension, the cut points of the data to be binned are determined based on one of the dimensions.
The present application discloses D28, a machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the data processing method according to any one of A1 to A9.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice in the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
The above description is merely exemplary of the present application and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.
The data processing method, data processing apparatus, and apparatus for data processing provided by the present application have been described in detail above. Specific examples have been used herein to explain the principles and embodiments of the application; the description of the embodiments is intended only to help in understanding the method and its core ideas. A person skilled in the art may, based on the ideas of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A data processing method for determining, based on the ciphertext of data to be binned, the cut points of that data, the method comprising:
constructing a data array from the ciphertext of the data to be binned;
determining the number of bins and the binning mode for the data to be binned;
determining quantiles according to the number of bins and the binning mode; and
determining, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
2. The method of claim 1, wherein determining quantiles according to the number of bins and the binning mode comprises:
if the binning mode is equal-frequency binning, determining a first split proportion according to the number of bins, and determining the quantiles based on the first split proportion; or
if the binning mode is custom-frequency binning, determining the quantiles according to a user-defined second split proportion.
3. The method according to claim 1, wherein determining the cut points of the data to be binned under the binning mode based on the quantiles and the data array comprises:
constructing a quantile index array according to the quantiles; and
determining, based on the quantile index array and the data array, the cut points of the data to be binned under the binning mode.
4. The method of claim 3, wherein constructing a quantile index array according to the quantiles comprises:
determining the number n of data items to be binned and the number m of quantiles;
determining the index value corresponding to the i-th quantile q(i) according to the formula $q(i) \times (n-1)$, where $i \in [1, m]$ is an integer; and
constructing the quantile index array from the index values corresponding to the m quantiles.
5. The method according to claim 3, wherein determining the cut points of the data to be binned under the binning mode based on the quantile index array and the data array comprises:
constructing an upper-bound index array and a lower-bound index array from the index values in the quantile index array;
determining an upper-bound weight array according to the upper-bound index array and the quantile index array, and determining a lower-bound weight array according to the upper-bound weight array;
sorting the data array based on the ciphertext of the data to be binned to obtain an ordered array;
reading values from the ordered array at the positions in the upper-bound index array to obtain an upper-bound data array, and at the positions in the lower-bound index array to obtain a lower-bound data array; and
computing a linear weighted combination of the upper-bound data array and the lower-bound data array using the upper-bound weight array and the lower-bound weight array, to obtain a result array containing the determined cut points.
6. The method of claim 5, wherein constructing the upper-bound index array and the lower-bound index array from the index values in the quantile index array comprises:
determining the upper-bound value and the lower-bound value corresponding to each index value in the quantile index array; and
constructing the upper-bound index array from the upper-bound values, and the lower-bound index array from the lower-bound values, corresponding to the index values in the quantile index array.
7. The method of claim 6, wherein determining the upper-bound value and the lower-bound value corresponding to each index value in the quantile index array comprises:
determining whether the current index value in the quantile index array corresponds to the last data element in the data array; if so, taking the current index value itself as both its upper-bound value and its lower-bound value; otherwise, rounding the current index value down to obtain its lower-bound value, and adding 1 to that lower-bound value to obtain its upper-bound value.
8. A data processing apparatus for determining, based on the ciphertext of data to be binned, the cut points of that data, the apparatus comprising:
a data array construction module, configured to construct a data array from the ciphertext of the data to be binned;
a binning parameter determining module, configured to determine the number of bins and the binning mode for the data to be binned;
a quantile determining module, configured to determine quantiles according to the number of bins and the binning mode; and
a cut point determining module, configured to determine, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
9. An apparatus for data processing, for determining, based on the ciphertext of data to be binned, the cut points of that data, the apparatus comprising a memory and one or more programs, wherein the one or more programs are stored in the memory, are configured to be executed by one or more processors, and include instructions for:
constructing a data array from the ciphertext of the data to be binned;
determining the number of bins and the binning mode for the data to be binned;
determining quantiles according to the number of bins and the binning mode; and
determining, based on the quantiles and the data array, the cut points of the data to be binned under the binning mode.
10. A machine-readable medium having stored thereon instructions, which when executed by one or more processors, cause an apparatus to perform the data processing method of any of claims 1 to 7.
CN202010261498.2A 2020-04-03 2020-04-03 Data processing method and device and data processing device Active CN112667608B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010261498.2A CN112667608B (en) 2020-04-03 2020-04-03 Data processing method and device and data processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010261498.2A CN112667608B (en) 2020-04-03 2020-04-03 Data processing method and device and data processing device

Publications (2)

Publication Number Publication Date
CN112667608A true CN112667608A (en) 2021-04-16
CN112667608B CN112667608B (en) 2022-01-25

Family

ID=75402777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010261498.2A Active CN112667608B (en) 2020-04-03 2020-04-03 Data processing method and device and data processing device

Country Status (1)

Country Link
CN (1) CN112667608B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030009470A1 (en) * 2001-04-25 2003-01-09 Leary James F. Subtractive clustering for use in analysis of data
US20040195500A1 (en) * 2003-04-02 2004-10-07 Sachs Jeffrey R. Mass spectrometry data analysis techniques
US20130030761A1 (en) * 2011-07-29 2013-01-31 Choudur Lakshminarayan Statistically-based anomaly detection in utility clouds
CN107924339A (en) * 2015-08-12 2018-04-17 Microsoft Technology Licensing, LLC Data center privacy
CN110032878A (en) * 2019-03-04 2019-07-19 Alibaba Group Holding Limited Secure feature engineering method and apparatus
CN110162551A (en) * 2019-04-19 2019-08-23 Alibaba Group Holding Limited Data processing method, device and electronic equipment
CN110245140A (en) * 2019-06-12 2019-09-17 Tongdun Holdings Co., Ltd. Data binning processing method and device, electronic equipment and computer-readable medium
CN110659325A (en) * 2018-05-31 2020-01-07 Robert Bosch GmbH System and method for large scale multidimensional spatiotemporal data analysis

Also Published As

Publication number Publication date
CN112667608B (en) 2022-01-25

Similar Documents

Publication Publication Date Title
KR102632647B1 (en) Methods and devices, electronic devices, and memory media for detecting face and hand relationships
CN111539443A (en) Image recognition model training method and device and storage medium
KR101994561B1 (en) Website hijack detection method and device
CN106599191B (en) User attribute analysis method and device
CN114840568B (en) Ciphertext sorting method and device and ciphertext sorting device
CN112667741B (en) Data processing method and device and data processing device
CN112487415B (en) Method and device for detecting security of computing task
CN117349671A (en) Model training method and device, storage medium and electronic equipment
CN111125388B (en) Method, device and equipment for detecting multimedia resources and storage medium
CN112667608B (en) Data processing method and device and data processing device
CN114298227A (en) Text duplicate removal method, device, equipment and medium
CN110019657B (en) Processing method, apparatus and machine-readable medium
CN106919395B (en) Application notification display method and device
CN112651221A (en) Data processing method and device and data processing device
CN112307353B (en) Data processing method and device, electronic equipment and storage medium
CN113689520B (en) Graph data processing method and device, electronic equipment and storage medium
CN113254709B (en) Content data processing method and device and storage medium
CN113157703B (en) Data query method and device, electronic equipment and storage medium
CN113469215B (en) Data processing method and device, electronic equipment and storage medium
CN110457560B (en) Method for obtaining click rate and related device
CN112016637B (en) Hierarchical sampling method and device for hierarchical sampling
CN113886657A (en) Data processing method and device, electronic equipment and storage medium
CN114022248A (en) Product recommendation calculation method, device and equipment
CN107544969B (en) Method for optimizing size of static lexicon and electronic equipment
CN113869312A (en) Formula identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant