CN109726363B - Data statistical method and device - Google Patents

Info

Publication number
CN109726363B
Authority
CN
China
Prior art keywords
data
parameter
partner
statistical
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711046886.3A
Other languages
Chinese (zh)
Other versions
CN109726363A (en)
Inventor
王华忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201711046886.3A priority Critical patent/CN109726363B/en
Priority to TW107130573A priority patent/TWI689828B/en
Priority to PCT/CN2018/105482 priority patent/WO2019085656A1/en
Publication of CN109726363A publication Critical patent/CN109726363A/en
Application granted granted Critical
Publication of CN109726363B publication Critical patent/CN109726363B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/08Insurance

Abstract

The embodiment of the specification provides a data statistical method and a device, wherein the method comprises the following steps: generating a first parameter and a second parameter corresponding to each data identity; if the first data corresponding to the data identification do not participate in data statistics, the second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data; sending each data identifier and the corresponding first parameter and second parameter to a cooperative data party; receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or the selected second parameter; and removing the calculated value of each first parameter from the calculated value of the partner to obtain the required statistical value.

Description

Data statistical method and device
Technical Field
The present disclosure relates to the field of network technologies, and in particular, to a data statistics method and apparatus.
Background
In the big data era, data islands are widespread. For example, the data of one natural person may be scattered across different enterprises, and these enterprises do not fully trust one another because of competition and the need to protect user privacy. This creates a barrier to statistical work that requires data cooperation between enterprises. How to use the data owned by both parties to complete data statistics and calculation while fully protecting each enterprise's core data privacy, without revealing either party's private data, has become an urgent problem. There is currently no good solution.
Disclosure of Invention
In view of this, the present disclosure provides a data statistics method and apparatus, so as to implement secure computation of two parties on the basis of protecting data privacy of two data owners.
Specifically, one or more embodiments of the present disclosure are implemented by the following technical solutions:
in a first aspect, a data statistics method is provided, where the method is applied to perform data statistics on data of a local data side and a partner data side in a combined manner, where the local data side has a plurality of first data to be subjected to statistical value calculation, the plurality of first data respectively correspond to different data identifiers, and the partner data side has a plurality of second data corresponding to the data identifiers, and the method includes:
generating a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identification do not participate in data statistics, a second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data;
sending each data identifier, and a first parameter and a second parameter corresponding to the data identifier to a cooperative data party;
receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or second parameter, if the second data corresponding to the data identification participates in data statistics, the partner data party selects the second parameter, otherwise, the partner data party selects the first parameter;
and removing the calculated value of each first parameter from the calculated value of the partner to obtain the statistical value.
In a second aspect, a data statistics method is provided, where the method is used for performing data statistics between a local data side and a statistical data side, where the statistical data side has multiple first data of a statistical value to be calculated, the multiple first data respectively correspond to different data identifiers, and the local data side has second data corresponding to the same data identifier; the method comprises the following steps:
receiving a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
if the second data corresponding to the data identification is data participating in data statistics locally, selecting a second parameter corresponding to the data identification; otherwise, selecting a first parameter corresponding to the data identifier;
performing statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
and sending the partner calculation value to the statistical data party so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain the statistical value.
In a third aspect, a data statistics apparatus is provided, where the apparatus is used to perform data statistics on data of a local data side and a partner data side in a combined manner, where the local data side has a plurality of first data of a statistical value to be calculated, the plurality of first data respectively correspond to different data identifiers, and the partner data side has a plurality of second data corresponding to the data identifiers; the device comprises:
a parameter generation module for generating a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identification do not participate in data statistics, a second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data;
the data sending module is used for sending each data identifier and the first parameter and the second parameter corresponding to the data identifier to the cooperative data party;
the data receiving module is used for receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or the selected second parameter, if the second data corresponding to the data identification participates in data statistics, the partner data party selects the second parameter, and if not, the partner data party selects the first parameter;
and the statistical processing module is used for removing the calculated value of each first parameter from the calculated value of the partner to obtain the statistical value.
In a fourth aspect, a data statistics apparatus is provided, where the apparatus is configured to perform data statistics between a local data side and a statistical data side, where the statistical data side has a plurality of first data of a statistical value to be calculated, the plurality of first data respectively correspond to different data identifiers, and the local data side has second data corresponding to the same data identifier; the device comprises:
the parameter receiving module is used for receiving the data identifier sent by the statistical data party and the first parameter and the second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
the parameter selection module is used for selecting a second parameter corresponding to the data identifier if the second data corresponding to the data identifier is data participating in data statistics locally; otherwise, selecting a first parameter corresponding to the data identifier;
the statistical calculation module is used for carrying out statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
and the numerical value sending module is used for sending the partner calculation value to the statistical data party so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain the statistical value.
In a fifth aspect, there is provided a data statistics apparatus, the apparatus comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the following steps when executing the instructions:
generating a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identification do not participate in data statistics, a second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data;
sending each data identifier, and a first parameter and a second parameter corresponding to the data identifier to a cooperative data party;
receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or second parameter, if the second data corresponding to the data identification participates in data statistics, the partner data party selects the second parameter, otherwise, the partner data party selects the first parameter;
and removing the calculated value of each first parameter from the calculated value of the partner to obtain the statistical value.
In a sixth aspect, a data statistics apparatus is provided, the apparatus comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor when executing the instructions implementing the steps of:
receiving a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
if the second data corresponding to the data identification is data participating in data statistics locally, selecting a second parameter corresponding to the data identification; otherwise, selecting a first parameter corresponding to the data identifier;
performing statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
and sending the partner calculation value to the statistical data party so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain the statistical value.
According to the data statistics method and device in one or more embodiments of this specification, a first parameter and a second parameter are generated to obfuscate the real data. When these parameters are sent to the partner data party, the partner data party cannot learn the real data of the local party. The partner calculation value returned by the partner data party is determined by the partner data party's own data filtering conditions, and the local party cannot learn which selections the partner data party made. Two-party secure computation over the combined data of both parties is thus achieved while protecting the data privacy of both data owners.
Drawings
In order to more clearly illustrate one or more embodiments or technical solutions in the prior art in the present specification, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in one or more embodiments of the present specification, and other drawings can be obtained by those skilled in the art without inventive exercise.
FIG. 1 is a flow diagram of a data statistics method provided in one or more embodiments of the present disclosure;
FIG. 2 is a flow diagram of a data summation statistic provided in one or more embodiments of the present description;
FIG. 3 is a schematic structural diagram of a data statistics apparatus according to one or more embodiments of the present disclosure;
FIG. 4 is a schematic structural diagram of a data statistics apparatus according to one or more embodiments of the present disclosure.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in one or more embodiments of the present disclosure, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the drawings in one or more embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments, and not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from one or more embodiments of the disclosure without making any creative effort shall fall within the scope of protection of the disclosure.
In the big data era, data may be stored in a vertical mode: several data owners hold different attribute information about the same entity. For example, the vehicle insurance information of a natural person is held by one organization, while the claim amounts of the same person are held by another. With vertical storage, some statistical calculations involve several data owners and can only be completed through their cooperation. However, because of competition between enterprises or privacy protection considerations, the enterprises cannot reveal their respective private data.
In the example of the present disclosure, data statistics is performed based on data of different data owners without revealing respective data privacy of the data owners, and the method will be described in detail below by taking an example application scenario as an example.
Application scenarios:
In one example, there may be two data sources: data source A and data source B. Assume that data source A is a data institution and data source B is an insurance institution; the two data sources may each store different information about the same vehicle owner.
Data source A: Assume that data source A stores a vehicle insurance score for each vehicle owner. The score may be obtained by profiling the vehicle owner and analyzing the owner's risk, and a higher score may indicate lower risk. Table 1 shows an example of the data structure in which data source A stores the vehicle insurance score:
TABLE 1 Data structure of data source A
Column name  Type    Description              Example
idcard_no    string  Identity card number     ******197309119564
score        int     Vehicle insurance score  510
Data source B: Assume that data source B stores claim information for each vehicle owner; for example, the claim information may include the number of claims, the amount of claims, and the like. Table 2 shows an example of the data structure stored for each vehicle owner at data source B:
TABLE 2 Data structure of data source B
Column name  Type    Description                        Example
idcard_no    string  Identity card number               ******197309119564
gender       string  Gender                             female
times        int     Number of claims in the last year  3
amount       int     Amount of claims                   3500
Based on the application scenario, the data statistics can be processed based on the data of the data source a and the data source B. For example, the requirement of the statistical work may be "the sum of the number of claims of female users with the vehicle insurance score larger than 500 points", then the "vehicle insurance score larger than 500 points" needs to be determined according to the data of the data source A, and the data of the "female users, the number of claims" are stored in the data source B, so the statistical work needs the data cooperation of the data source A and the data source B.
In the description of the data statistics method in one or more embodiments of the present specification, the data source that holds the data to be aggregated may be referred to as the statistics data party, and the other data source may be referred to as the partner data party. For example, in the statistical work "sum of the number of claims of female users with a car insurance score greater than 500 points", the "number of claims" is the data to be aggregated, so data source B is the statistics data party and data source A is the partner data party.
The statistics data party and the partner data party may each store different information about the same vehicle owner. The vehicle owner information to be aggregated that is stored at the statistics data party (for example, the number of claims) may be referred to as first data, and the vehicle owner information involved in the statistics that is stored at the partner data party (for example, the vehicle insurance score) may be referred to as second data. In addition, the identity card number idcard_no, which appears in both data source A and data source B, may be referred to as a data identifier. The statistics data party (e.g., data source B) may store first data corresponding to a data identifier, and the partner data party (e.g., data source A) may store second data corresponding to the same data identifier.
Fig. 1 illustrates a flow of a data statistics method, which may include:
In step 100, the statistics data party generates a first parameter and a second parameter corresponding to each data identifier.
For example, the first parameter may be a random number, or the first parameter may be a value calculated from a random number, such as one-half of the random number.
For example, the value of the second parameter may be determined according to the local data filtering condition. If the first data corresponding to the data identifier satisfies the local data filtering condition, and is therefore data participating in data statistics, the second parameter may be calculated from the first parameter and the first data; for example, the first parameter and the first data may be summed to obtain the second parameter. If the first data corresponding to the data identifier does not satisfy the local data filtering condition, the second parameter may be set equal to the first parameter. In actual implementation, the way the second parameter is generated is not limited to summing the first data and the first parameter, and other calculation methods may be adopted.
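The parameter generation of step 100 can be sketched as follows for the summation case. This is a minimal illustration, not the patented implementation: the function name `make_parameters`, the mask range `MASK_BOUND`, and the use of Python's `secrets` module are assumptions introduced here.

```python
import secrets
from typing import Tuple

MASK_BOUND = 2**64  # assumed range for the random mask; not specified in the text


def make_parameters(first_data: int, participates: bool) -> Tuple[int, int]:
    """Return (first_parameter, second_parameter) for one data identifier."""
    first_parameter = secrets.randbelow(MASK_BOUND)  # the random number
    if participates:
        # First data passes the local filter: second = first + first data.
        second_parameter = first_parameter + first_data
    else:
        # First data is filtered out: second parameter equals the first.
        second_parameter = first_parameter
    return first_parameter, second_parameter


# Local check by the statistics data party, which knows both values.
m0, m1 = make_parameters(first_data=7, participates=True)
n0, n1 = make_parameters(first_data=3, participates=False)
print(m1 - m0)  # 7
print(n1 - n0)  # 0
```

The difference between the two parameters is only printed here as a local sanity check on the generating side.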
In step 102, the statistics data party sends each local data identifier, together with the first parameter and the second parameter corresponding to that data identifier, to the partner data party.
In step 104, the partner data party selects a parameter, and if the second data corresponding to the data identifier is data participating in data statistics locally, the second parameter corresponding to the data identifier is selected; otherwise, selecting the first parameter corresponding to the data identifier.
For example, after receiving the data identifier sent by the statistics data party and the first parameter and the second parameter corresponding to the data identifier, the partner data party may select the parameter at this step, and the selected parameter may participate in the processing at the subsequent step 106.
The partner data party applies its local data filtering condition: if the second data corresponding to a data identifier satisfies the filtering condition, and is therefore data participating in data statistics, the second parameter may be selected; otherwise, if the second data corresponding to a data identifier does not satisfy the filtering condition and does not participate in data statistics, the first parameter may be selected.
In step 106, the partner data side performs statistical calculation on the selected first parameter and the selected second parameter to obtain a partner calculation value. For example, when the statistical value to be obtained is a sum statistical value, the selected first parameter and the second parameter may be added; of course, in other statistical methods, the first parameter and the second parameter may be calculated in other forms corresponding to each other.
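Steps 104 and 106 can be sketched as a single function on the partner side for the summation case. The function name, the dictionary layout, and the identifiers `id-1` and `id-2` are hypothetical. The sketch covers only the selection and accumulation arithmetic; it does not implement the transfer protocol through which the parameters are delivered.

```python
from typing import Dict, Tuple


def partner_calculation_value(
    received: Dict[str, Tuple[int, int]],
    participates: Dict[str, bool],
) -> int:
    """Sum the selected parameter for every received data identifier."""
    total = 0
    for data_id, (first_parameter, second_parameter) in received.items():
        if participates.get(data_id, False):
            # Second data passes the partner's filter: take the second parameter.
            total += second_parameter
        else:
            # Otherwise take the first parameter.
            total += first_parameter
    return total


# Hypothetical parameters: data identifier -> (first parameter, second parameter)
received = {"id-1": (100, 100), "id-2": (200, 207)}
participates = {"id-1": True, "id-2": True}
print(partner_calculation_value(received, participates))  # 307
```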
In step 108, the partner data side sends the partner calculation value to the statistics data side.
In step 110, the statistics data party uses the partner calculation value to remove the calculation value of the first parameter, so as to obtain the statistics value. For example, the sum of the respective first parameters may be subtracted from the partner calculation value.
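Why the subtraction in step 110 recovers the required sum can be checked with a short derivation for the summation case. The indicator symbols a_i and c_i are notation introduced here for illustration: a_i = 1 if the first data b_i passes the statistics data party's filter, and c_i = 1 if the corresponding second data passes the partner data party's filter (0 otherwise). t_i is the first parameter, P the partner calculation value, and S the statistical value.

```latex
\begin{align*}
M_{0,i} &= t_i, \qquad M_{1,i} = t_i + a_i b_i, \\
P &= \sum_i \bigl( (1 - c_i)\, M_{0,i} + c_i\, M_{1,i} \bigr)
   = \sum_i t_i + \sum_i a_i c_i b_i, \\
S &= P - \sum_i t_i = \sum_i a_i c_i b_i .
\end{align*}
```

Only records that pass both parties' filters (a_i = c_i = 1) contribute b_i to S.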
The flow of FIG. 1 employs an oblivious transfer (OT) protocol, a privacy-preserving two-party communication protocol in which the two parties transfer messages in a selectively obfuscated manner. The receiving party obtains some of the messages input by the sending party obliviously, so that the receiving party's privacy is protected from the sending party.
For example, in FIG. 1 the statistics data party may send all the data identifiers and the corresponding first and second parameters to the partner data party. The statistics data party has set the second parameters to different values according to its local data filtering condition, but from the perspective of the partner data party all data identifiers are received, so which data the statistics data party filtered is not disclosed. Moreover, the statistics data party obfuscates its real data by means of the two parameters: neither the first parameter nor the second parameter transmitted to the partner data party is the real first data, so the privacy of the data is not leaked. Furthermore, from the perspective of the statistics data party, the partner calculation value it receives was produced by the partner data party after data filtering, but the statistics data party cannot tell which data the partner data party selected, so the partner data party's privacy is protected as well.
Based on the data structure shown in Table 1, assume that data source A holds the car insurance score data shown in Table 3 below, where idcard_no is the identity card number of the vehicle owner and score is the car insurance score of the vehicle owner.
TABLE 3 data of data Source A
idcard_no score
1234567 490
2345678 501
3456789 530
Based on the data structure shown in Table 2, assume that data source B holds the data shown in Table 4 below:
TABLE 4 data of data Source B
idcard_no gender times amount
1234567 Male 3 5000
2345678 Female 7 23000
3456789 Female 6 16000
Based on Tables 3 and 4, the sum of the number of claims of female users whose car insurance score is greater than 500 points is computed as follows. The statistical data "number of claims" is stored in data source B, and the times column in Table 4 may be referred to as the "statistics column", i.e. the data in this column is to be summed. The filter condition "car insurance score greater than 500" is held at data source A (the second data serves only as a filter condition for obtaining the statistical value), and the filter condition "female" is held at data source B; that is, filter conditions may exist at both data sources. Data source A and data source B cooperate to compute the sum of the claim counts (the statistical value).
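As a reference point, the target statistic can be computed in the clear from the data of Tables 3 and 4. This sketch deliberately ignores privacy: it joins both tables in one place, which is exactly what the two parties cannot do in practice. The variable names are illustrative.

```python
# Table 3 (data source A): idcard_no -> score
source_a = {"1234567": 490, "2345678": 501, "3456789": 530}

# Table 4 (data source B): idcard_no -> (gender, times, amount)
source_b = {
    "1234567": ("male", 3, 5000),
    "2345678": ("female", 7, 23000),
    "3456789": ("female", 6, 16000),
}

# Sum of claim counts for female users whose score is greater than 500.
total = sum(
    times
    for idcard_no, (gender, times, _amount) in source_b.items()
    if gender == "female" and source_a.get(idcard_no, 0) > 500
)
print(total)  # 13
```

Any privacy-preserving procedure for this statistic should return the same value, 7 + 6 = 13.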
FIG. 2 illustrates a process of performing summation statistics in conjunction with data source A and data source B, which may include:
In step 200, data source B generates a random number for each row of data, and generates M0 and M1 according to its data filtering condition.
In this step, for the data illustrated in Table 4, the column times, holding the number of claims, is the statistics column. The values 3, 7, and 6 are the first data in the statistics column.
One random number is generated for each row of data. Assume that the random number corresponding to 1234567 is t1, the random number corresponding to 2345678 is t2, and the random number corresponding to 3456789 is t3.
According to the local data filtering condition "female user", the vehicle owners with idcard_no 2345678 and 3456789 meet the condition, and their data is first data participating in the data statistics; the vehicle owner with idcard_no 1234567 does not meet the filtering condition and does not participate. Accordingly, denoting each first data item in the statistics column by b, a first parameter and a second parameter can be generated for each idcard_no. The first parameter may be the random number corresponding to the idcard_no, and the second parameter may be the sum of that random number and the first data b corresponding to the idcard_no, when b participates in the statistics.
As in the example of Table 5 below, a random number t is generated for each row of data, and the true value of the corresponding statistics column is denoted b. Each row is traversed: if the row satisfies the local filtering condition, M0 = t and M1 = t + b are generated; if it does not, M0 = M1 = t is generated.
TABLE 5 M0 and M1 for each row of data
idcard_no M0 M1
1234567 t1 t1
2345678 t2 t2+7
3456789 t3 t3+6
The M0 and M1 generated in this step are derived from random values in order to obfuscate the real statistics column data, so that even if the partner data party receives the M0 and M1 corresponding to an idcard_no, it cannot know the real statistics column data b corresponding to that idcard_no. For example, even if t2 and t2+7 corresponding to the data identifier 2345678 are received, the true value of b, 7, is not known.
Further, the random numbers t1, t2, and t3 corresponding to the respective data identifiers may be different.
In step 202, data source B sends the data identifier of each row of data, together with the M0 and M1 corresponding to that data identifier, to data source A.
In step 204, data source A applies its local data filtering condition: if the second data corresponding to the data identifier participates in data statistics, it selects M1; otherwise, it selects M0.
For example, data source A may determine, according to the filtering condition "car insurance score greater than 500 points", whether the second data (the score in Table 3) corresponding to each data identifier idcard_no is greater than 500 points. If the score corresponding to an idcard_no is greater than 500, "t + b" (M1) in Table 5 is selected; otherwise, if the score is not greater than 500, "t" (M0) in Table 5 is selected.
For example, for idcard_no 1234567, the corresponding car insurance score is 490, which does not satisfy the filtering condition "car insurance score greater than 500 points", so M0 corresponding to 1234567 in Table 5, i.e., t1, is selected. For idcard_no 2345678, the corresponding car insurance score in Table 3 is 501, which satisfies the filtering condition, so M1 corresponding to 2345678 in Table 5, i.e., t2+7, is selected. Similarly, for idcard_no 3456789, t3+6 is selected.
In step 206, data source A accumulates the selected parameters to obtain an accumulated value.
For example, data source A may add up the selected parameters to obtain the accumulated value M = t1 + (t2 + 7) + (t3 + 6). The accumulated value is the partner calculation value.
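Steps 204 and 206 on the side of data source A can be sketched as below. The concrete numbers standing in for t1, t2 and t3 (1001, 2002, 3003), the score 520 for idcard_no 3456789, and the function name `select_parameters` are assumptions made for illustration; the scores 490 and 501 and the threshold of 500 are taken from the example.

```python
# Parameter pairs (M0, M1) as data source A would receive them in step 202.
# 1001, 2002 and 3003 are placeholder values for the random numbers t1, t2, t3.
received = {
    "1234567": (1001, 1001),       # M1 = M0: row filtered out by data source B
    "2345678": (2002, 2002 + 7),
    "3456789": (3003, 3003 + 6),
}

# Car insurance scores held by data source A (520 is a placeholder above 500).
scores_a = {"1234567": 490, "2345678": 501, "3456789": 520}


def select_parameters(pairs, scores, threshold=500):
    """Pick M1 when the score passes A's filter, otherwise M0."""
    return {
        idcard_no: (m1 if scores[idcard_no] > threshold else m0)
        for idcard_no, (m0, m1) in pairs.items()
    }


selected = select_parameters(received, scores_a)
accumulated = sum(selected.values())   # the partner calculation value M
print(selected)
print(accumulated)  # 6019 for the placeholder values above
```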
In step 208, data source A sends the accumulated value to data source B.
In step 210, data source B subtracts the sum of M0 from the accumulated value to obtain a statistical value.
In this step, after data source B receives the accumulated value, it subtracts the sum of all the random numbers M0 from the accumulated value to obtain the total number of claims to be counted. For example, M - (t1 + t2 + t3), where M is the accumulated value, evaluates to 13, which is the final statistical value.
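The subtraction in step 210 can be illustrated with concrete numbers. The values 1001, 2002 and 3003 standing in for t1, t2 and t3, and the resulting accumulated value 6019, are placeholders chosen for this sketch; `unmask` is a hypothetical helper name.

```python
# What data source B holds at step 210 (placeholder numbers).
m0_values = {"1234567": 1001, "2345678": 2002, "3456789": 3003}  # B's own t1, t2, t3
accumulated = 6019  # M = t1 + (t2 + 7) + (t3 + 6), received from data source A


def unmask(accumulated_value, m0_by_id):
    """Subtract the sum of all first parameters (M0) from the accumulated value."""
    return accumulated_value - sum(m0_by_id.values())


statistic = unmask(accumulated, m0_values)
print(statistic)  # 13
```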
In this example, data source B receives only a single accumulated value and cannot know whether data source A selected M0 or M1 for any particular data identifier; likewise, data source A receives only two parameters per data identifier and cannot know which data were filtered by data source B to participate in the statistics. Therefore, the method does not reveal the detailed data of either party during the calculation, and effectively completes the summation statistics over the data of the two parties.
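Why the subtraction returns exactly the sum over rows that pass both filters can be written out in general. The indicator symbols $e_i$ (row $i$ passes the filter of data source B) and $a_i$ (row $i$ passes the filter of data source A) are notation introduced here for illustration and do not appear in the original text; $t_i$ and $b_i$ are the random number and the first data of row $i$.

```latex
\begin{aligned}
M0_i &= t_i, \qquad M1_i = t_i + e_i\, b_i, \qquad e_i, a_i \in \{0, 1\},\\
\text{selection of data source A for row } i &= t_i + a_i\, e_i\, b_i,\\
M &= \sum_i \left( t_i + a_i\, e_i\, b_i \right),\\
M - \sum_i M0_i &= \sum_i a_i\, e_i\, b_i .
\end{aligned}
```

In the example, $e = (0, 1, 1)$, $a = (0, 1, 1)$ and the two participating values are 7 and 6, so the right-hand side is $7 + 6 = 13$.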
The flow shown in fig. 2 is exemplified by the case where the statistical value is the sum of a plurality of first data, for example, the sum of claim settlement times. In other examples, the data statistics method of one or more embodiments of the present disclosure may also be applied to other scenarios of statistical calculation.
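The text does not spell out these other scenarios. One that follows directly from the summation flow, offered here purely as an assumption, is a count of the rows that pass both filters: data source B uses 1 in place of b. The function `joint_count` and its parameters are hypothetical, and both parties are simulated inside one function for brevity.

```python
import secrets


def joint_count(ids, passes_b, passes_a, bound=2**64):
    """Count identifiers passing both filters by reusing the summation flow with b = 1."""
    masks = {key: secrets.randbelow(bound) for key in ids}
    # Data source B: M0 = t, and M1 = t + 1 for rows passing B's filter, else M1 = t.
    params = {key: (t, t + 1 if passes_b(key) else t) for key, t in masks.items()}
    # Data source A: select M1 or M0 per identifier, then accumulate.
    accumulated = sum(m1 if passes_a(key) else m0 for key, (m0, m1) in params.items())
    # Data source B: remove the sum of all M0.
    return accumulated - sum(masks.values())


female_b = {"2345678", "3456789"}
scores_a = {"1234567": 490, "2345678": 501, "3456789": 520}  # 520 is a placeholder

count = joint_count(scores_a.keys(), lambda k: k in female_b, lambda k: scores_a[k] > 500)
print(count)  # 2
```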
In order to implement the method, one or more embodiments of the present specification further provide a data statistics apparatus, as shown in fig. 3, the apparatus may include: a parameter generating module 31, a data transmitting module 32, a data receiving module 33 and a statistical processing module 34.
The parameter generating module 31 is configured to generate a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identifier does not participate in the data statistics, the second parameter is equal to the first parameter; otherwise, the second parameter is calculated from the first parameter and the first data;
the data sending module 32 is configured to send each data identifier, together with the first parameter and the second parameter corresponding to that data identifier, to the partner data party;
the data receiving module 33 is configured to receive a partner calculation value returned by the partner data party, where the partner calculation value is obtained by the partner data party from the first or second parameter it selected for each data identifier: if the second data corresponding to the data identifier participates in the data statistics, the partner data party selects the second parameter; otherwise, it selects the first parameter;
and the statistical processing module 34 is configured to remove the calculated value of each first parameter from the partner calculation value to obtain the statistical value.
In one example, the plurality of first data are located in the same statistical column of the local data source.
In an example, when calculating the second parameter from the first parameter and the first data, the parameter generating module 31 is specifically configured to sum the first parameter and the first data to obtain the second parameter. When removing the calculated value of each first parameter from the partner calculation value, the statistical processing module 34 is specifically configured to subtract the sum of the first parameters from an accumulated value, where the accumulated value is obtained by the partner data party by accumulating the first or second parameter selected for each data identifier.
In an example, when summing the first parameter and the first data to obtain the second parameter, the parameter generating module 31 is specifically configured as follows: when the statistical value is the sum of a plurality of first data, if the first data corresponding to the data identifier meets the data filtering condition that determines which data participate in the statistics, the first parameter is a random number, and the second parameter is the sum of that random number and the first data.
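One way to picture how modules 31 to 34 fit together is a single class whose methods mirror the modules. The class name `StatisticalParty`, its method names and the message layout are hypothetical, and the partner data party is replaced by a one-line stand-in; this is a sketch of the division of responsibilities, not the apparatus itself.

```python
import secrets


class StatisticalParty:
    """Hypothetical stand-in for the apparatus of fig. 3 (modules 31 to 34)."""

    def __init__(self, first_data, participating):
        self.first_data = first_data          # {data identifier: first data b}
        self.participating = participating    # identifiers passing the local filter
        self.params = {}                      # {data identifier: (first, second)}

    def generate_parameters(self, bound=2**64):       # parameter generating module 31
        for key, b in self.first_data.items():
            t = secrets.randbelow(bound)
            self.params[key] = (t, t + b if key in self.participating else t)

    def outgoing_message(self):                        # data sending module 32
        return [(key, m0, m1) for key, (m0, m1) in self.params.items()]

    def statistic_from(self, partner_value):           # modules 33 and 34
        # Remove the sum of all first parameters from the partner calculation value.
        return partner_value - sum(m0 for m0, _ in self.params.values())


# 3 is a placeholder for the filtered-out row; 7 and 6 come from Table 5.
party = StatisticalParty({"1234567": 3, "2345678": 7, "3456789": 6},
                         {"2345678", "3456789"})
party.generate_parameters()
message = party.outgoing_message()

# Stand-in for the partner data party: it selects the second parameter everywhere.
partner_value = sum(m1 for _, _, m1 in message)
print(party.statistic_from(partner_value))  # 13
```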
In order to implement the method, one or more embodiments of the present specification further provide a data statistics apparatus, as shown in fig. 4, the apparatus may include: a parameter receiving module 41, a parameter selecting module 42, a statistic calculating module 43 and a value transmitting module 44.
A parameter receiving module 41, configured to receive a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
the parameter selection module 42 is configured to select a second parameter corresponding to the data identifier if the second data corresponding to the data identifier is data that locally participates in data statistics; otherwise, selecting a first parameter corresponding to the data identifier;
a statistical calculation module 43, configured to perform statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
a value sending module 44, configured to send the partner calculation value to the statistical data party, so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain the statistical value.
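The partner-side apparatus can be pictured in the same way, with one method per module 41 to 44. The class name `PartnerParty`, the method names, the message layout and the numbers 1001, 2002 and 3003 standing in for the random numbers are all hypothetical.

```python
class PartnerParty:
    """Hypothetical stand-in for the apparatus of fig. 4 (modules 41 to 44)."""

    def __init__(self, participating):
        self.participating = participating   # identifiers whose second data pass the local filter
        self.received = {}

    def receive(self, message):                       # parameter receiving module 41
        self.received = {key: (m0, m1) for key, m0, m1 in message}

    def select(self):                                 # parameter selecting module 42
        return [m1 if key in self.participating else m0
                for key, (m0, m1) in self.received.items()]

    def partner_value(self):                          # statistic calculating module 43
        return sum(self.select())

    def reply(self):                                  # value sending module 44
        # Only the single accumulated value is returned to the statistical data party.
        return self.partner_value()


# Message as sent by the statistical data party (placeholder random numbers).
message = [("1234567", 1001, 1001), ("2345678", 2002, 2009), ("3456789", 3003, 3009)]

partner = PartnerParty({"2345678", "3456789"})   # scores above 500 in the example
partner.receive(message)
print(partner.reply())  # 6019
```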
For convenience of description, the above devices are described as being divided into modules by function, with each module described separately. Of course, when implementing one or more embodiments of the present description, the functions of the modules may be implemented in the same piece, or in several pieces, of software and/or hardware.
The execution sequence of the steps in the flow shown in the above method embodiment is not limited to the sequence in the flow chart. Furthermore, each step may be implemented in software, hardware or a combination thereof; for example, a person skilled in the art may implement a step in the form of software code, that is, as computer-executable instructions capable of implementing the logical function corresponding to the step. When implemented in software, the executable instructions may be stored in a memory and executed by a processor in the device.
For example, corresponding to the above method, one or more embodiments of the present specification also provide a data statistics apparatus for performing data statistics by combining data of a local data party and a partner data party, where the local data party has a plurality of first data of a statistical value to be calculated, the plurality of first data respectively correspond to different data identifiers, and the partner data party has a plurality of second data corresponding to the data identifiers. The apparatus may include a processor, a memory, and computer instructions stored on the memory and executable on the processor, the processor being operable to perform the following steps by executing the instructions:
generating a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identification do not participate in data statistics, a second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data;
sending each data identifier, and a first parameter and a second parameter corresponding to the data identifier to a cooperative data party;
receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or second parameter, if the second data corresponding to the data identification participates in data statistics, the partner data party selects the second parameter, otherwise, the partner data party selects the first parameter;
and removing the calculated value of each first parameter from the calculated value of the partner to obtain the statistical value.
For example, corresponding to the above method, one or more embodiments of the present specification further provide a data statistics apparatus, where the apparatus is configured to perform data statistics between a local data side and a statistical data side, where the statistical data side has a plurality of first data of a statistical value to be calculated, the plurality of first data respectively correspond to different data identifiers, and the local data side has second data corresponding to the same data identifier. The apparatus may include a processor, a memory, and computer instructions stored on the memory and executable on the processor, the processor being operable to perform the following steps by executing the instructions:
receiving a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
if the second data corresponding to the data identification is data participating in data statistics locally, selecting a second parameter corresponding to the data identification; otherwise, selecting a first parameter corresponding to the data identifier;
performing statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
and sending the partner calculation value to the statistical data party so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain the statistical value.
The apparatuses or modules illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
One skilled in the art will recognize that one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, because the server device embodiment is basically similar to the method embodiment, its description is relatively brief, and the relevant points can be found in the corresponding part of the description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The above description is only exemplary of the preferred embodiment of one or more embodiments of the present disclosure, and is not intended to limit the present disclosure, so that any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (10)

1. A data statistics method is applied to data statistics of a local data side and a cooperative data side in a combined mode, the local data side is provided with a plurality of first data of statistical values to be calculated, the first data correspond to different data identifications respectively, the cooperative data side is provided with a plurality of second data corresponding to the data identifications, and the method comprises the following steps:
generating a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identification do not participate in data statistics, a second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data;
sending each data identifier, and a first parameter and a second parameter corresponding to the data identifier to a cooperative data party;
receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or second parameter, if the second data corresponding to the data identification participates in data statistics, the partner data party selects the second parameter, otherwise, the partner data party selects the first parameter;
and removing the calculated value of each first parameter from the calculated value of the partner to obtain the statistical value.
2. The method of claim 1, wherein
the second parameter is calculated according to the first parameter and the first data, and comprises:
the second parameter is obtained by summing and counting the first parameter and the first data;
the removing, from the partner calculation values, the calculation values of the respective first parameters includes:
the partner calculation value is an accumulated value obtained by the partner data party through accumulation according to the selected first parameter or the selected second parameter, and the sum of the first parameters is subtracted from the accumulated value.
3. The method of claim 2, wherein
the second parameter is obtained by summing and counting the first parameter and the first data, and comprises the following steps:
if the first data corresponding to the data identifier meets a data filtering condition for determining participation statistical data, when the statistical value is the sum of a plurality of first data, the first parameter is a random number, and the second parameter is the sum of the random number and the first data.
4. A data statistics method is used for performing data statistics between a local data side and a statistical data side, wherein the statistical data side is provided with a plurality of first data of statistical values to be calculated, the plurality of first data correspond to different data identifications respectively, and the local data side is provided with second data corresponding to the same data identification; the method comprises the following steps:
receiving a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
if the second data corresponding to the data identification is data participating in data statistics locally, selecting a second parameter corresponding to the data identification; otherwise, selecting a first parameter corresponding to the data identifier;
performing statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
and sending the partner calculation value to the statistical data party so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain the statistical value.
5. A data statistics device is used for carrying out data statistics by combining data of a local data side and a cooperative data side, wherein the local data side is provided with a plurality of first data of statistical values to be calculated, the plurality of first data respectively correspond to different data identifications, and the cooperative data side is provided with a plurality of second data corresponding to the data identifications; the device comprises:
a parameter generation module for generating a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identification do not participate in data statistics, a second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data;
the data sending module is used for sending each data identifier and the first parameter and the second parameter corresponding to the data identifier to the cooperative data party;
the data receiving module is used for receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or the selected second parameter, if the second data corresponding to the data identification participates in data statistics, the partner data party selects the second parameter, and if not, the partner data party selects the first parameter;
and the statistical processing module is used for removing the calculated value of each first parameter from the calculated value of the partner to obtain the statistical value.
6. The apparatus of claim 5, wherein
the parameter generation module is used for calculating a second parameter according to a first parameter and the first data, and specifically is used for performing summation statistics on the first parameter and the first data to obtain the second parameter;
the statistical processing module is used for subtracting the sum of each first parameter from an accumulated value when the calculated value of each first parameter is removed from the calculated value of the partner, wherein the accumulated value is obtained by the partner through accumulation according to the selected first parameter or second parameter.
7. The apparatus of claim 6, wherein
the parameter generating module, when configured to sum and count the first parameter and the first data to obtain a second parameter, is specifically configured to: if the first data corresponding to the data identifier meets a data filtering condition for determining participation statistical data, when the statistical value is the sum of a plurality of first data, the first parameter is a random number, and the second parameter is the sum of the random number and the first data.
8. A data statistics device is used for performing data statistics between a local data side and a statistical data side, wherein the statistical data side is provided with a plurality of first data of statistical values to be calculated, the plurality of first data correspond to different data identifications respectively, and the local data side is provided with second data corresponding to the same data identification; the device comprises:
the parameter receiving module is used for receiving the data identifier sent by the statistical data party and the first parameter and the second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
the parameter selection module is used for selecting a second parameter corresponding to the data identifier if the second data corresponding to the data identifier is data participating in data statistics locally; otherwise, selecting a first parameter corresponding to the data identifier;
the statistical calculation module is used for carrying out statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
and the numerical value sending module is used for sending the partner calculation value to the statistical data party so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain the statistical value.
9. A data statistics apparatus, the apparatus comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor when executing the instructions implementing the steps of:
generating a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identification do not participate in data statistics, a second parameter is equal to the first parameter, otherwise, the second parameter is obtained by calculation according to the first parameter and the first data;
sending each data identifier, and a first parameter and a second parameter corresponding to the data identifier to a cooperative data party;
receiving a partner calculation value returned by a partner data party, wherein the partner calculation value is obtained by the partner data party according to the selected first parameter or second parameter, if the second data corresponding to the data identification participates in data statistics, the partner data party selects the second parameter, otherwise, the partner data party selects the first parameter;
and removing the calculated value of each first parameter from the calculated value of the partner to obtain a statistical value.
10. A data statistics apparatus, the apparatus comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor when executing the instructions implementing the steps of:
receiving a data identifier sent by a data statistic party, and a first parameter and a second parameter corresponding to the data identifier; when first data corresponding to the data identification participate in data statistics, the second parameter is obtained by calculation according to the first parameter and the first data, otherwise, the second parameter is equal to the first parameter;
if the second data corresponding to the data identification is data participating in data statistics locally, selecting a second parameter corresponding to the data identification; otherwise, selecting a first parameter corresponding to the data identifier;
performing statistical calculation according to the selected first parameter and the selected second parameter to obtain a partner calculation value;
and sending the partner calculation value to the statistical data party so that the statistical data party removes the calculation value of each first parameter according to the partner calculation value to obtain a statistical value.
CN201711046886.3A 2017-10-31 2017-10-31 Data statistical method and device Active CN109726363B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201711046886.3A CN109726363B (en) 2017-10-31 2017-10-31 Data statistical method and device
TW107130573A TWI689828B (en) 2017-10-31 2018-08-31 Data statistics method and device
PCT/CN2018/105482 WO2019085656A1 (en) 2017-10-31 2018-09-13 Data statistics method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711046886.3A CN109726363B (en) 2017-10-31 2017-10-31 Data statistical method and device

Publications (2)

Publication Number Publication Date
CN109726363A CN109726363A (en) 2019-05-07
CN109726363B true CN109726363B (en) 2020-05-29

Family

ID=66294427

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711046886.3A Active CN109726363B (en) 2017-10-31 2017-10-31 Data statistical method and device

Country Status (3)

Country Link
CN (1) CN109726363B (en)
TW (1) TWI689828B (en)
WO (1) WO2019085656A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108494B (en) * 2023-04-12 2023-06-20 蓝象智联(杭州)科技有限公司 Multiparty joint data statistics method for protecting privacy

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103562851A (en) * 2011-05-27 2014-02-05 国际商业机器公司 Data perturbation and anonymization using one-way hash
CN107078899A (en) * 2015-03-26 2017-08-18 华为国际有限公司 The method of obfuscated data
CN107291764A (en) * 2016-04-05 2017-10-24 中兴通讯股份有限公司 A kind of big data exchange method and device, system

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI370660B (en) * 2009-02-24 2012-08-11 Ind Tech Res Inst Method and system for coding/decoding, and encryption/decryption method used therein
WO2011068996A1 (en) * 2009-12-04 2011-06-09 Cryptography Research, Inc. Verifiable, leak-resistant encryption and decryption
CN102594889B (en) * 2012-02-17 2014-07-16 广东电网公司电力科学研究院 Data-call-based data synchronization and analysis system
US10949473B2 (en) * 2014-05-21 2021-03-16 Knowledge Syntheses Systems and method for searching and analyzing big data
CN105023086A (en) * 2015-01-07 2015-11-04 泰华智慧产业集团股份有限公司 Digital city management data sharing system based on cloud calculation
CN105430055A (en) * 2015-11-02 2016-03-23 武大吉奥信息技术有限公司 Large data exchange system based on distributed and multi-level junction

Also Published As

Publication number Publication date
WO2019085656A1 (en) 2019-05-09
TWI689828B (en) 2020-04-01
TW201918910A (en) 2019-05-16
CN109726363A (en) 2019-05-07

Similar Documents

Publication Publication Date Title
CN109726580B (en) Data statistical method and device
CN110457912B (en) Data processing method and device and electronic equipment
CN110427969B (en) Data processing method and device and electronic equipment
Jaidka et al. The 2014 Indian general election on Twitter: an analysis of changing political traditions
CN110460435B (en) Data interaction method and device, server and electronic equipment
JP2012113712A (en) Method, system, and computer program product for user-defined system-enforced session termination in unified telephony environment
CN108334494B (en) Method and device for constructing user relationship network
CN110457936A (en) Data interactive method, device and electronic equipment
CN109934709A (en) Data processing method, device and server based on block chain
CN109726363B (en) Data statistical method and device
CN110781153A (en) Cross-application information sharing method and system based on block chain
CN110874481B (en) GBDT model-based prediction method and GBDT model-based prediction device
CN109726581B (en) Data statistical method and device
CN113518317B (en) Method and device for sending prompt information, storage medium and electronic device
CN112232639B (en) Statistical method, statistical device and electronic equipment
CN110851487A (en) Data statistical method and device
CN109669956B (en) Memory, user relationship determination method, device and equipment
CN112035241A (en) Task processing method and device
Rana et al. The strength of social strength: an evaluation study of algorithmic versus user-defined ranking
Trippi "Technology has given politics back its soul": a longtime political operative cheers the innovations of Obama 2012, saying they restored the primacy of the individual voter
CN110825922B (en) Data statistical method and device
CN111339120B (en) Short message approval number generation method, processing method, electronic equipment and storage medium
CN117494150A (en) Data processing method and device, electronic equipment and storage medium
JPWO2017022207A1 (en) User information estimation system, user information estimation method, and user information estimation program
CN115103076A (en) Outbound method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20200923

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.