WO2019085656A1

WO2019085656A1 - Data statistics method and apparatus

Info

Publication number: WO2019085656A1
Application number: PCT/CN2018/105482
Authority: WO
Inventors: 王华忠
Original assignee: 阿里巴巴集团控股有限公司
Priority date: 2017-10-31
Filing date: 2018-09-13
Publication date: 2019-05-09
Also published as: TWI689828B; CN109726363A; CN109726363B; TW201918910A

Abstract

Provided are a data statistics method and apparatus, the method comprising: generating a first parameter and a second parameter to correspond to each data identifier; if a piece of first data corresponding to a data identifier does not participate in data statistics, the second parameter is equal to the first parameter, and otherwise, the second parameter is calculated according to the first parameter and the piece of first data; sending each data identifier and the corresponding first parameter and second parameter to a cooperative data party; receiving cooperative party calculation values returned by the cooperative data party, the cooperative party calculation values being obtained by the cooperative data party according to selected first parameters or second parameters; removing calculation values of various first parameters from the cooperative party calculation values, and obtaining required statistical values.

Description

Data statistics method and device

Technical field

The present disclosure relates to the field of network technologies, and in particular, to a data statistics method and apparatus.

Background

In the era of big data, there are many data islands. For example, a natural person's data may be distributed across and stored in different enterprises, and these enterprises do not fully trust one another because of competition and the need to protect user privacy. This creates obstacles for statistical work that involves data cooperation between enterprises. How to use the data owned by both parties to complete statistical calculations, while fully protecting each enterprise's core data privacy and without revealing either party's data, has become an urgent problem. However, there is currently no good solution.

Summary of the invention

In view of this, the present disclosure provides a data statistics method and apparatus for implementing two-party secure computing on the basis of protecting the data privacy of two data owners.

Specifically, one or more embodiments of the present specification are implemented by the following technical solutions:

In a first aspect, a data statistics method is provided. The method is applied to perform data statistics by combining data of a local data party and a cooperative data party, where the local data party has a plurality of first data to be counted, the plurality of first data respectively correspond to different data identifiers, and the cooperative data party has a plurality of second data corresponding to the data identifiers. The method includes:

Generating, for each data identifier, a first parameter and a second parameter; if the first data corresponding to the data identifier does not participate in the data statistics, the second parameter is equal to the first parameter; otherwise, the second parameter is calculated according to the first parameter and the first data;

Sending each data identifier, and the first parameter and the second parameter corresponding to the data identifier, to the cooperative data party;

Receiving a partner calculation value returned by the cooperative data party, where the partner calculation value is obtained by the cooperative data party according to the selected first parameter or second parameter: if the second data corresponding to the data identifier participates in the data statistics, the cooperative data party selects the second parameter; otherwise, the cooperative data party selects the first parameter;

Removing the calculated value of each first parameter from the partner calculation value to obtain the statistical value.

In a second aspect, a data statistics method is provided. The method is used for performing data statistics between a local data party and a statistical data party, where the statistical data party has a plurality of first data to be counted, the plurality of first data respectively correspond to different data identifiers, and the local data party has second data corresponding to the same data identifiers. The method includes:

Receiving the data identifier sent by the statistical data party, and the first parameter and the second parameter corresponding to the data identifier, where, when the first data corresponding to the data identifier participates in the data statistics, the second parameter is calculated according to the first parameter and the first data; otherwise, the second parameter is equal to the first parameter;

If the second data corresponding to the data identifier is data that participates locally in the data statistics, selecting the second parameter corresponding to the data identifier; otherwise, selecting the first parameter corresponding to the data identifier;

Performing a statistical calculation on the selected first parameters and second parameters to obtain a partner calculation value;

Sending the partner calculation value to the statistical data party, so that the statistical data party removes the calculated value of each first parameter from the partner calculation value and obtains the statistical value.

In a third aspect, a data statistics apparatus is provided. The apparatus is configured to perform data statistics by combining data of a local data party and a cooperative data party, where the local data party has a plurality of first data to be counted, the plurality of first data respectively correspond to different data identifiers, and the cooperative data party has a plurality of second data corresponding to the data identifiers. The apparatus includes:

a parameter generating module, configured to generate a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identifier does not participate in the data statistics, the second parameter is equal to the first parameter; otherwise, the second parameter is calculated according to the first parameter and the first data;

a data sending module, configured to send each data identifier, and the first parameter and the second parameter corresponding to the data identifier, to the cooperative data party;

a data receiving module, configured to receive a partner calculation value returned by the cooperative data party, where the partner calculation value is obtained by the cooperative data party according to the selected first parameter or second parameter: if the second data corresponding to the data identifier participates in the data statistics, the cooperative data party selects the second parameter; otherwise, the cooperative data party selects the first parameter;

and a statistical processing module, configured to remove the calculated value of each first parameter from the partner calculation value to obtain the statistical value.

In a fourth aspect, a data statistics apparatus is provided. The apparatus is configured to perform data statistics between a local data party and a statistical data party, where the statistical data party has a plurality of first data to be counted, the plurality of first data respectively correspond to different data identifiers, and the local data party has second data corresponding to the same data identifiers. The apparatus includes:

a parameter receiving module, configured to receive a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier, where, when the first data corresponding to the data identifier participates in the data statistics, the second parameter is calculated according to the first parameter and the first data; otherwise, the second parameter is equal to the first parameter;

a parameter selection module, configured to: if the second data corresponding to the data identifier is data that participates locally in the data statistics, select the second parameter corresponding to the data identifier; otherwise, select the first parameter corresponding to the data identifier;

a statistical calculation module, configured to perform a statistical calculation on the selected first parameters and second parameters to obtain a partner calculation value;

and a value sending module, configured to send the partner calculation value to the statistical data party, so that the statistical data party removes the calculated value of each first parameter from the partner calculation value to obtain the statistical value.

In a fifth aspect, a data statistics device is provided, the device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the instructions to:

Sending each data identifier and the first parameter and the second parameter corresponding to the data identifier to the cooperation data party;

In a sixth aspect, a data statistics device is provided, the device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the instructions to:

With the data statistics method and apparatus of one or more embodiments of the present specification, the first parameter and the second parameter are generated to obfuscate the real data, so that the cooperative data party does not learn the real data of the local end when the parameters are transmitted to it. In addition, the partner calculation value returned by the cooperative data party is determined according to the data filtering condition of the cooperative data party, and the local end does not know which selection the cooperative data party made. The joint calculation on the two parties' data is thereby performed while protecting the data privacy of both data owners.

DRAWINGS

In order to illustrate one or more embodiments of the present specification or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some of the embodiments described in the present specification, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a flowchart of a data statistics method provided by one or more embodiments of the present specification;

FIG. 2 is a flowchart of data summation statistics provided by one or more embodiments of the present specification;

FIG. 3 is a schematic structural diagram of a data statistics apparatus provided by one or more embodiments of the present specification;

FIG. 4 is a schematic structural diagram of a data statistics apparatus provided by one or more embodiments of the present specification.

Detailed description

In order to enable those skilled in the art to better understand the technical solutions in one or more embodiments of the present specification, the technical solutions are described clearly and completely below. The described embodiments are only some, rather than all, of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on one or more embodiments of the present specification without inventive effort are intended to fall within the scope of the disclosure.

In the era of big data, data can be stored in a vertical mode, that is, multiple data owners can hold different attribute information of the same entity. For example, a natural person's car insurance score is held by one institution, while the same person's claim amount is held by another institution. With this vertical mode of storage, some statistical calculations involve multiple data owners, who need to cooperate to complete the data statistics. However, because of competition or privacy protection between different companies, each company's own data cannot be disclosed.

The examples of the present disclosure aim to perform data statistics based on the data of different data owners without revealing the data privacy of those owners. The method is described in detail below using an example application scenario.

Application scenario:

In one example, there can be two data sources: data source A and data source B. Assume that data source A can be a data organization, and data source B can be an insurance institution. These two data sources can store different information of the same owner.

Data source A: Assume that data source A stores the car insurance score of each car owner. The car insurance score can be obtained by performing accurate profiling and risk analysis on the car owner; the higher the car insurance score, the lower the risk. Table 1 shows the data structure used on the data source A side to store the car insurance score:

Table 1 Data structure of data source A

Column name	Type	Description	Example
idcard_no	string	ID number	****197309119564
score	int	Car insurance score	510

Data Source B: Assume that the data source B can store the claim information of each owner. For example, the claim information of the owner may include the number of claims, the amount of the claim, and the like. As shown in Table 2, an example of the data structure of each owner stored on the data source B side is as follows:

Table 2 Data structure of data source B

Based on the application scenario described above, data statistics can be completed jointly using the data of data source A and data source B. For example, the statistical task may be "count the total number of claims of female users whose car insurance score is greater than 500 points". Here, "car insurance score greater than 500 points" must be determined from the data of data source A, while "female user" and "number of claims" are stored in data source B. Therefore, this statistical task requires data cooperation between data source A and data source B.

In the description of the data statistics method in one or more embodiments of the present specification, the data source that holds the data to be counted is referred to as the statistical data party, and the other data source is referred to as the cooperative data party. For example, in the statistical task "count the total number of claims of female users whose car insurance score is greater than 500 points", the "number of claims" is the data to be counted, so data source B is the statistical data party and data source A is the cooperative data party.

The statistical data party and the cooperative data party separately store different information of the same car owner. The car owner information stored at the statistical data party that is to be counted (for example, the number of claims) is referred to as first data, and the car owner information stored at the cooperative data party that participates in the statistics (for example, the car insurance score) is referred to as second data. In addition, the ID number idcard_no included in both data source A and data source B is referred to as a data identifier: the statistical data party (for example, data source B) stores the first data corresponding to the data identifier, and the cooperative data party (for example, data source A) stores the second data corresponding to the same data identifier.

FIG. 1 illustrates the flow of a data statistics method, which may include:

In step 100, the statistical data party generates a first parameter and a second parameter corresponding to each data identifier.

For example, the first parameter may be a random number, or the first parameter may be a value calculated from a random number, such as one-half of a random number.

For example, the value of the second parameter may be determined according to the data filtering condition. If the first data corresponding to the data identifier satisfies the local data filtering condition and therefore participates in the data statistics, the second parameter may be calculated according to the first parameter and the first data; for example, the first parameter and the first data may be summed to obtain the second parameter. If the first data corresponding to the data identifier does not satisfy the local data filtering condition, the second parameter may be set equal to the first parameter. In actual implementation, the manner of generating the second parameter is not limited to summing the first data and the first parameter, and other calculation methods may be used.
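Step 100 can be sketched for a single data identifier as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `generate_parameters`, the `bound` on the random number, and the use of Python's `secrets` module are all hypothetical choices, and the second parameter uses the summation variant mentioned above.

```python
import secrets


def generate_parameters(first_data, participates, bound=2**64):
    """Return (first_parameter, second_parameter) for one data identifier.

    `bound` is an assumed upper limit for the random first parameter.
    """
    first_parameter = secrets.randbelow(bound)  # random number used as the first parameter
    if participates:
        # First data passes the local filter: second parameter = first parameter + first data.
        second_parameter = first_parameter + first_data
    else:
        # First data does not participate: second parameter equals the first parameter.
        second_parameter = first_parameter
    return first_parameter, second_parameter


m0, m1 = generate_parameters(first_data=7, participates=True)
n0, n1 = generate_parameters(first_data=3, participates=False)
```

For the participating row the two parameters differ by the first data, while for the non-participating row they are identical.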

In step 102, the statistical data party sends each local data identifier, together with the first parameter and the second parameter corresponding to the data identifier, to the cooperative data party.

In step 104, the cooperative data party selects a parameter: if the second data corresponding to the data identifier is data that participates locally in the data statistics, the second parameter corresponding to the data identifier is selected; otherwise, the first parameter corresponding to the data identifier is selected.

For example, after receiving the data identifiers sent by the statistical data party and the first parameter and the second parameter corresponding to each data identifier, the cooperative data party performs parameter selection in this step, and the selected parameters participate in the processing of the subsequent step 106.

The cooperative data party may select the parameter according to its local data filtering condition. If the second data corresponding to the data identifier satisfies the filtering condition and therefore participates in the data statistics, the second parameter may be selected; otherwise, if the second data corresponding to the data identifier does not satisfy the filtering condition and does not participate in the statistics, the first parameter may be selected.
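The selection rule of step 104 can be sketched as below. The names `select_all`, `received`, and `local_second_data` are hypothetical, the numeric values are made up for illustration, and treating an identifier that is missing from the local data as non-participating is an assumption of this sketch rather than something the text specifies.

```python
def select_all(received, local_second_data, filter_condition):
    """Pick one parameter per identifier on the cooperative data party's side.

    received: {identifier: (first_parameter, second_parameter)}
    local_second_data: {identifier: second data held locally}
    filter_condition: callable returning True when the second data participates
    """
    selected = {}
    for ident, (first_parameter, second_parameter) in received.items():
        # Assumption: identifiers unknown locally are treated as not participating.
        participates = ident in local_second_data and filter_condition(local_second_data[ident])
        selected[ident] = second_parameter if participates else first_parameter
    return selected


received = {"1234567": (55, 55), "2345678": (100, 107)}
scores = {"1234567": 490, "2345678": 501}
selected = select_all(received, scores, lambda score: score > 500)
```

Here the identifier with score 501 yields its second parameter and the identifier with score 490 yields its first parameter.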

In step 106, the cooperative data party performs a statistical calculation on the selected first parameters and second parameters to obtain a partner calculation value. For example, when the statistical value to be obtained is a sum, the selected first parameters and second parameters may be added; for other statistics, other corresponding forms of calculation may be applied to the selected parameters.

In step 108, the cooperative data party sends the partner calculation value to the statistical data party.

In step 110, the statistical data party removes the calculated value of the first parameters from the partner calculation value to obtain the statistical value. For example, the sum of all the first parameters may be subtracted from the partner calculation value.
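For the summation case, the reason step 110 recovers the desired value can be written out as a short derivation. The notation is introduced here only for illustration: $t_i$ is the first parameter and $b_i$ the first data for identifier $i$, $S_B$ is the set of identifiers passing the statistical data party's filter, $S_A$ is the set passing the cooperative data party's filter, and $V$ is the partner calculation value.

```latex
\begin{align*}
M_{0,i} &= t_i, \qquad
M_{1,i} = \begin{cases} t_i + b_i, & i \in S_B \\ t_i, & i \notin S_B \end{cases} \\
V &= \sum_{i \in S_A} M_{1,i} + \sum_{i \notin S_A} M_{0,i}
   = \sum_{i} t_i + \sum_{i \in S_A \cap S_B} b_i \\
V - \sum_{i} t_i &= \sum_{i \in S_A \cap S_B} b_i
\end{align*}
```

Only identifiers that satisfy both parties' filters contribute their first data to the result.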

The process of FIG. 1 above adopts oblivious transfer (OT), a privacy-preserving two-party communication protocol that enables the communicating parties to transfer messages in a way that obscures the selection. The receiver obtains the messages entered by the sender in an oblivious manner, so that the receiver's privacy is protected from being known by the sender.

For example, in the example of FIG. 1, the statistical data party sends all the data identifiers and the corresponding first and second parameters to the cooperative data party. The statistical data party has in fact set different values for the second parameter according to its local data filtering condition, but from the perspective of the cooperative data party, all the data identifiers are received, so the filtering result of the statistical data party is not leaked. Furthermore, the statistical data party obfuscates its real data by means of the two parameters: the first parameter and the second parameter transmitted to the cooperative data party are not the real first data, so its data privacy is not leaked either. Moreover, from the perspective of the statistical data party, the partner calculation value it receives reflects the selection made by the cooperative data party after data filtering, but the statistical data party cannot tell which data the cooperative data party selected. Therefore, the data of the cooperative data party is also kept private.

Based on the data structure shown in Table 1, assume that the car insurance data held by data source A is as shown in Table 3 below, where idcard_no is the ID number of the car owner and score is the car owner's car insurance score.

Table 3 Data source A data

idcard_no	score
1234567	490
2345678	501
3456789	530

Based on the data structure shown in Table 2, assume that the data owned by data source B is as follows:

Table 4 Data source B data

idcard_no	gender	times	amount
1234567	male	3	5000
2345678	female	7	23000
3456789	female	6	16000

Based on Tables 3 and 4 above, the task is to count the total number of claims of female users whose car insurance score is greater than 500. The data to be counted, the number of claims, is stored in data source B: the times column in Table 4 can be called the "statistical column", that is, the data of this column is summed. The filter condition "car insurance score greater than 500 points" is located in data source A (the second data serves as a filter condition for obtaining the statistical value), and the filter condition "female" is located in data source B; that is, filter conditions can exist at both data sources. Through data cooperation, data source A and data source B can obtain the statistical value, namely the sum of the number of claims.
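As a reference point, the target value can be computed directly if both tables are placed side by side, which is exactly what the two parties cannot do in practice. The sketch below only shows what the cooperative protocol is expected to reproduce; the variable names `table3`, `table4`, and `expected` are hypothetical, and the data is copied from Tables 3 and 4.

```python
# Table 3 (data source A): idcard_no -> car insurance score
table3 = {"1234567": 490, "2345678": 501, "3456789": 530}

# Table 4 (data source B): idcard_no -> (gender, times, amount)
table4 = {
    "1234567": ("male", 3, 5000),
    "2345678": ("female", 7, 23000),
    "3456789": ("female", 6, 16000),
}

# Plaintext reference: sum the statistical column "times" over rows that
# satisfy both filter conditions (female at B, score > 500 at A).
expected = sum(
    times
    for ident, (gender, times, _amount) in table4.items()
    if gender == "female" and table3[ident] > 500
)
print(expected)  # 13
```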

FIG. 2 illustrates the flow of summation statistics combining data source A and data source B, which may include:

In step 200, data source B generates a random number for each row of data and generates M0 and M1 based on the data filtering conditions.

In this step, taking the data shown in Table 4 as an example, the times column (number of claims) is the statistical column, and 3, 7, and 6 are the first data in the statistical column.

A random number is generated for each row of data: assume that the random number corresponding to 1234567 is t1, the random number corresponding to 2345678 is t2, and the random number corresponding to 3456789 is t3.

According to the local data filtering condition "female user", the car owners whose idcard_no is 2345678 and 3456789 satisfy the condition, and their first data participates in this data statistics, while the car owner 1234567 does not satisfy the filtering condition and does not participate. Accordingly, denoting each first data in the statistical column by b, the first parameter and the second parameter corresponding to each idcard_no can be generated: the first parameter may be the random number corresponding to the idcard_no, and, for first data b that participates in the statistics, the second parameter may be the sum of that random number and the first data b corresponding to the idcard_no.

As shown in the example of Table 5 below, a random number t is generated for each row of data, and the true value of the row in the statistical column is denoted b. Each row of data is traversed: if the row satisfies the local filtering condition, M0=t and M1=t+b are generated; if it does not, M0=M1=t is generated.

Table 5 M0 and M1 of each row of data

idcard_no	M0	M1
1234567	t1	t1
2345678	t2	t2+7
3456789	t3	t3+6

The M0 and M1 generated in this step obfuscate the real statistical column data with random values. Even if the cooperative data party receives the M0 and M1 corresponding to an idcard_no, it cannot learn the real statistical column value corresponding to that data identifier. For example, even if t2 and t2+7 corresponding to the data identifier 2345678 are received, the true value 7 of b cannot be known.

Furthermore, the random numbers t1, t2, and t3 corresponding to the respective data identifiers may differ from one another.

In step 202, data source B sends the data identifier of each row of data, together with the M0 and M1 corresponding to the data identifier, to data source A.

In step 204, data source A makes a selection according to its local data filtering condition: if the second data corresponding to the data identifier participates in the data statistics, M1 is selected; otherwise, M0 is selected.

For example, data source A can determine, according to the filter condition "car insurance score greater than 500 points", whether the second data (score in Table 3) corresponding to each data identifier idcard_no is greater than 500. If the score corresponding to the idcard_no is greater than 500, the M1 in Table 5 (for example, "t+b") is selected; otherwise, if the score is not greater than 500, the M0 in Table 5 ("t") is selected.

For example, for idcard_no 1234567, the corresponding car insurance score in Table 3 is 490, which does not satisfy the filter condition "car insurance score greater than 500 points"; the M0 corresponding to 1234567 in Table 5 is therefore selected, that is, t1. As another example, for idcard_no 2345678, the corresponding car insurance score in Table 3 is 501, which satisfies the filter condition; the M1 corresponding to 2345678 in Table 5 is therefore selected, that is, t2+7. Similarly, for idcard_no 3456789, t3+6 is selected.

In step 206, data source A accumulates the selected numbers to obtain an accumulated value.

For example, the accumulated value can be M=t1+(t2+7)+(t3+6). This accumulated value is the partner calculation value.

In step 208, data source A sends the accumulated value to data source B.

In step 210, data source B subtracts the sum of M0 from the accumulated value to obtain a statistical value.

In this step, after receiving the accumulated value, data source B subtracts the sum of all the random numbers M0 from it, and the result is the total number of claims to be counted. For example, M-(t1+t2+t3)=13 can be calculated, where M is the accumulated value; 13 is the final statistical value.

In this example, after data source B receives the accumulated value, it cannot know whether data source A selected M0 or M1 for a given row, since it receives only a single accumulated value; likewise, data source A cannot know which rows passed the filtering on the data source B side and participate in the statistics, since it receives only the two parameters. Therefore, this method completes the two-party summation efficiently without disclosing the detailed data of either party during the calculation.
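Steps 200 to 210 can be traced end to end with the data of Tables 3 and 4. The following is a simplified sketch under stated assumptions: the function names are hypothetical, both parties run in one process, and M0 and M1 are handed over as plain values, so the oblivious transfer layer mentioned earlier is not modeled here.

```python
import secrets

SCORES_A = {"1234567": 490, "2345678": 501, "3456789": 530}  # Table 3
ROWS_B = {  # Table 4, reduced to gender and the statistical column "times"
    "1234567": ("male", 3),
    "2345678": ("female", 7),
    "3456789": ("female", 6),
}


def source_b_generate(rows, bound=2**32):
    """Step 200: one random t per row; M0 = t, M1 = t + b if the row is female, else t."""
    params = {}
    for ident, (gender, times) in rows.items():
        t = secrets.randbelow(bound)
        m1 = t + times if gender == "female" else t
        params[ident] = (t, m1)
    return params


def source_a_accumulate(params, scores):
    """Steps 204 and 206: select M1 when score > 500, else M0, then accumulate."""
    return sum(m1 if scores[ident] > 500 else m0 for ident, (m0, m1) in params.items())


def source_b_finish(accumulated, params):
    """Step 210: subtract the sum of all M0 from the accumulated value."""
    return accumulated - sum(m0 for m0, _ in params.values())


params = source_b_generate(ROWS_B)                    # steps 200 and 202
accumulated = source_a_accumulate(params, SCORES_A)   # steps 204 to 208
statistic = source_b_finish(accumulated, params)      # step 210
print(statistic)  # 13
```

The result does not depend on the random numbers drawn, because every t is added once by data source A and subtracted once by data source B.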

The flow shown in FIG. 2 above is an example in which the statistical value is the sum of the plurality of first data, for example, the sum of the number of claims is obtained. In other examples, the data statistics method of one or more embodiments of the present disclosure may also be applied to other statistical calculation scenarios. For example, the statistical value may also be an average value of multiple first data.

For example, to compute the average number of claims of female users whose car insurance score is greater than 500 points, the processing flow shown in FIG. 2 can still be used, except that different first and second parameters are adopted. For example, when a row of data does not satisfy the local filtering condition, the first parameter and the second parameter generated for the corresponding data identifier may be M0=M1=t; when a row of data satisfies the local filtering condition, the second parameter generated for the corresponding data identifier may be the first parameter plus one-half of the first data.

For example, taking the data identifier 2345678 in Table 5, the generated M0 may be t2 and the generated M1 may be "t2+7/2". Alternatively, the first parameter may be generated as one-half of a random number, such as "t2/2", with the corresponding second parameter "(t2+7)/2". The first case is shown in Table 6 below:

Table 6 M0 and M1 when computing the average

idcard_no	M0	M1
1234567	t1	t1
2345678	t2	t2+7/2
3456789	t3	t3+6/2

After data source B receives the accumulated value M sent by data source A, and assuming that data source A selected M1 for the last two rows of data (corresponding to the data identifiers 2345678 and 3456789), the statistical value can still be calculated as M-(t1+t2+t3)=6.5.
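The average example of Table 6 can be checked numerically. This sketch assumes exact rational arithmetic via Python's `fractions.Fraction` so that the halves are not rounded; the variable names are hypothetical. In this example two rows pass both filters, so adding one-half of each first data yields their mean.

```python
from fractions import Fraction
import secrets

SCORES_A = {"1234567": 490, "2345678": 501, "3456789": 530}  # Table 3
ROWS_B = {"1234567": ("male", 3), "2345678": ("female", 7), "3456789": ("female", 6)}

# Data source B builds Table 6: M0 = t, M1 = t + b/2 for rows passing its own filter.
table6 = {}
for ident, (gender, times) in ROWS_B.items():
    t = Fraction(secrets.randbelow(2**32))
    if gender == "female":
        table6[ident] = (t, t + Fraction(times, 2))
    else:
        table6[ident] = (t, t)

# Data source A selects M1 when score > 500, else M0, and accumulates.
accumulated = sum(m1 if SCORES_A[ident] > 500 else m0 for ident, (m0, m1) in table6.items())

# Data source B subtracts the sum of all M0.
average = accumulated - sum(m0 for m0, _ in table6.values())
print(float(average))  # 6.5
```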

In order to implement the above method, one or more embodiments of the present specification further provide a data statistics device. As shown in FIG. 3, the device may include a parameter generating module 31, a data sending module 32, a data receiving module 33, and a statistical processing module 34.

The parameter generating module 31 is configured to generate a first parameter and a second parameter corresponding to each data identifier; if the first data corresponding to the data identifier does not participate in the data statistics, the second parameter is equal to the first parameter; otherwise, the second parameter is calculated according to the first parameter and the first data;

The data sending module 32 is configured to send each data identifier, and the first parameter and the second parameter corresponding to the data identifier, to the cooperative data party;

The data receiving module 33 is configured to receive a partner calculation value returned by the cooperative data party, where the partner calculation value is obtained by the cooperative data party according to the selected first parameter or second parameter: if the second data corresponding to the data identifier participates in the data statistics, the cooperative data party selects the second parameter; otherwise, the cooperative data party selects the first parameter;

The statistical processing module 34 is configured to remove the calculated value of each first parameter from the partner calculation value to obtain the statistical value.

In one example, the plurality of first data are located in the same statistical column of the local data source.

In an example, when the second parameter is calculated according to the first parameter and the first data, the parameter generating module 31 is specifically configured to perform summation statistics by using the first parameter and the first data. Get the second parameter. The statistical processing module 34, when used to remove the calculated value of each first parameter from the calculated value of the partner, specifically for subtracting the sum of each of the first parameters by the accumulated value, the accumulated value is cooperation The data side is accumulated according to the selected first parameter or the second parameter.

In an example, when obtaining the second parameter by summing the first parameter and the first data, the parameter generating module 31 is configured such that, if the first data corresponding to the data identifier satisfies the data filtering condition used to determine the data participating in the statistics, and the statistical value is the sum of the plurality of first data, the first parameter is a random number and the second parameter is the sum of the random number and the first data.

In an example, when obtaining the second parameter by summing the first parameter and the first data, the parameter generating module 31 is configured such that, if the first data corresponding to the data identifier satisfies the data filtering condition used to determine the data participating in the statistics, and the statistical value is an average of the plurality of first data, the second parameter is the first parameter plus one-half of the first data.
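As a minimal sketch of the sum case, the parameter generation could look like the following Python. The function name `generate_parameters`, the `participates` callback standing in for the data filtering condition, the `bound` on the random number, and the sample identifiers and values are all hypothetical; the specification does not prescribe any of them.

```python
import secrets


def generate_parameters(first_data, participates, bound=2**32):
    """Return {identifier: (first_parameter, second_parameter)}.

    first_data:   dict mapping data identifier -> numeric first data
    participates: callable(identifier, value) -> bool; hypothetical stand-in
                  for the data filtering condition
    bound:        assumed upper bound for the random first parameter
    """
    params = {}
    for identifier, value in first_data.items():
        first = secrets.randbelow(bound)  # first parameter: a random number
        if participates(identifier, value):
            second = first + value        # second parameter: random number + first data
        else:
            second = first                # non-participating: second equals first
        params[identifier] = (first, second)
    return params


# Illustrative data only: values of at least 50 pass the assumed filter.
data = {"u1": 100, "u2": 250, "u3": 40}
params = generate_parameters(data, lambda _id, value: value >= 50)
```

Each identifier gets an independent random first parameter, so the pair differs by the first data only when the filtering condition holds.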

In order to implement the above method, one or more embodiments of the present specification further provide a data statistics device. As shown in FIG. 4, the device may include a parameter receiving module 41, a parameter selection module 42, a statistical calculation module 43, and a value sending module 44.

The parameter receiving module 41 is configured to receive a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier, where, when the first data corresponding to the data identifier participates in data statistics, the second parameter is calculated according to the first parameter and the first data; otherwise, the second parameter is equal to the first parameter;

The parameter selection module 42 is configured to: if the second data corresponding to the data identifier is data that participates in data statistics locally, select the second parameter corresponding to the data identifier; otherwise, select the first parameter corresponding to the data identifier;

The statistical calculation module 43 is configured to perform statistical calculation according to the selected first parameters and second parameters to obtain a partner calculation value;

The value sending module 44 is configured to send the partner calculation value to the statistical data party, so that the statistical data party removes the calculated value of each first parameter from the partner calculation value and obtains the statistical value.
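The selection and accumulation performed by modules 42 and 43 can be sketched as follows for the sum case. This is an illustration under stated assumptions, not the specification's implementation: the function names, the tuple layout of the received parameters, the `local_participants` set, and the sample numbers are hypothetical. The last line shows the statistical data party's subtraction so the arithmetic can be followed end to end.

```python
def select_parameter(identifier, first, second, local_participants):
    """Pick the second parameter if the local second data participates, else the first."""
    return second if identifier in local_participants else first


def partner_calculation(received, local_participants):
    """Accumulate the selected parameter for every received identifier.

    received: iterable of (identifier, first_parameter, second_parameter)
    """
    return sum(
        select_parameter(identifier, first, second, local_participants)
        for identifier, first, second in received
    )


# Hypothetical parameters as sent by the statistical data party:
# u1 and u2 participate there (second = first + data); u3 does not (second = first).
received = [("u1", 7, 107), ("u2", 3, 253), ("u3", 9, 9)]

# Hypothetical local filter result: the second data of u2 and u3 participates.
local_participants = {"u2", "u3"}

partner_value = partner_calculation(received, local_participants)  # 7 + 253 + 9

# Statistical data party side: remove the sum of all first parameters.
statistic = partner_value - sum(first for _, first, _ in received)
print(statistic)  # 250, the first data of u2, the only identifier participating on both sides
```

Only the accumulated value is returned to the statistical data party, which recovers the statistic by subtracting the first parameters it generated.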

For convenience of description, the above devices are described as being divided into various modules by function. Of course, when implementing one or more embodiments of the present specification, the functions of the various modules may be implemented in one or more pieces of software and/or hardware.

The steps in the flows shown in the above method embodiments are not limited to the order in the flowcharts. In addition, each step may be implemented in software, hardware, or a combination thereof; for example, a person skilled in the art may implement it as software code, that is, as computer-executable instructions capable of implementing the logic function corresponding to the step. When implemented in software, the executable instructions can be stored in a memory and executed by a processor in the device.

For example, corresponding to the above method, one or more embodiments of the present specification also provide a data statistics device for performing data statistics by combining data of a local data party and a cooperative data party, where the local data party has a plurality of first data for which a statistical value is to be calculated, the plurality of first data respectively correspond to different data identifiers, and the cooperative data party has a plurality of second data corresponding to the data identifiers. The device can include a processor, a memory, and computer instructions stored in the memory and executable on the processor, the processor executing the instructions to implement the steps of the method performed by the local data party, as described above.

For example, corresponding to the above method, one or more embodiments of the present specification further provide a data statistics device for performing data statistics between a local data party and a statistical data party, where the statistical data party has a plurality of first data for which a statistical value is to be calculated, the plurality of first data respectively correspond to different data identifiers, and the local data party has second data corresponding to the same data identifiers. The device can include a processor, a memory, and computer instructions stored in the memory and executable on the processor, the processor executing the instructions to implement the steps of the method performed by the local data party, as described above.

The apparatus or module illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

Those skilled in the art will appreciate that one or more embodiments of the present specification can be provided as a method, system, or computer program product. Thus, one or more embodiments of the present specification can take the form of an entirely hardware embodiment, an entirely software embodiment, or a combination of software and hardware. Moreover, one or more embodiments of the present specification can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.

These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

It is also to be understood that the terms "comprise", "include", or any other variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements that are inherent to such a process, method, article, or device. An element defined by the phrase "comprising a ..." does not exclude the presence of additional equivalent elements in the process, method, article, or device including the element.

One or more embodiments of the present specification can be described in the general context of computer-executable instructions executed by a computer, such as a program module. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are connected through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media including storage devices.

The various embodiments in the specification are described in a progressive manner, and the same or similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the server device embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.

The foregoing describes specific embodiments of the specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve the desired results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

The above description is only a preferred embodiment of one or more embodiments of the present specification and is not intended to limit the disclosure; any modifications, equivalent replacements, improvements, etc., made within the spirit and principles of the present disclosure shall be included in the scope of protection of the present disclosure.

Claims

A data statistics method, applied to performing data statistics by combining data of a local data party and a cooperative data party, wherein the local data party has a plurality of first data for which a statistical value is to be calculated, the plurality of first data respectively correspond to different data identifiers, and the cooperative data party has a plurality of second data corresponding to the data identifiers, the method comprising:

generating, for each data identifier, a first parameter and a second parameter, wherein if the first data corresponding to the data identifier does not participate in data statistics, the second parameter is equal to the first parameter, and otherwise the second parameter is calculated according to the first parameter and the first data;

sending each data identifier, together with the first parameter and the second parameter corresponding to the data identifier, to the cooperative data party;

receiving a partner calculation value returned by the cooperative data party, wherein the partner calculation value is obtained by the cooperative data party according to the selected first parameter or second parameter, and wherein, if the second data corresponding to the data identifier participates in data statistics, the cooperative data party selects the second parameter, and otherwise the cooperative data party selects the first parameter; and

removing the calculated value of each first parameter from the partner calculation value to obtain the statistical value.
The method of claim 1, wherein

calculating the second parameter according to the first parameter and the first data comprises:

obtaining the second parameter by summing the first parameter and the first data;

and removing the calculated value of each first parameter from the partner calculation value comprises:

subtracting the sum of the first parameters from the partner calculation value, wherein the partner calculation value is an accumulated value obtained by the cooperative data party by accumulating the selected first parameters or second parameters.
The method of claim 2, wherein

obtaining the second parameter by summing the first parameter and the first data comprises:

if the first data corresponding to the data identifier satisfies the data filtering condition used to determine the data participating in the statistics, and the statistical value is the sum of the plurality of first data, the first parameter is a random number and the second parameter is the sum of the random number and the first data.
The method of claim 2, wherein

obtaining the second parameter by summing the first parameter and the first data comprises:

if the first data corresponding to the data identifier satisfies the data filtering condition used to determine the data participating in the statistics, and the statistical value is an average of the plurality of first data, the second parameter is the first parameter plus one-half of the first data.
A data statistics method, used for performing data statistics between a local data party and a statistical data party, wherein the statistical data party has a plurality of first data for which a statistical value is to be calculated, the plurality of first data respectively correspond to different data identifiers, and the local data party has second data corresponding to the same data identifiers, the method comprising:

receiving a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier, wherein, when the first data corresponding to the data identifier participates in data statistics, the second parameter is calculated according to the first parameter and the first data, and otherwise the second parameter is equal to the first parameter;

if the second data corresponding to the data identifier is data that participates in data statistics locally, selecting the second parameter corresponding to the data identifier, and otherwise selecting the first parameter corresponding to the data identifier;

performing statistical calculation according to the selected first parameters and second parameters to obtain a partner calculation value; and

sending the partner calculation value to the statistical data party, so that the statistical data party removes the calculated value of each first parameter from the partner calculation value and obtains the statistical value.
A data statistics device, configured to perform data statistics by combining data of a local data party and a cooperative data party, wherein the local data party has a plurality of first data for which a statistical value is to be calculated, the plurality of first data respectively correspond to different data identifiers, and the cooperative data party has a plurality of second data corresponding to the data identifiers, the device comprising:

a parameter generating module, configured to generate a first parameter and a second parameter corresponding to each data identifier, wherein if the first data corresponding to the data identifier does not participate in data statistics, the second parameter is equal to the first parameter, and otherwise the second parameter is calculated according to the first parameter and the first data;

a data sending module, configured to send each data identifier, together with the first parameter and the second parameter corresponding to the data identifier, to the cooperative data party;

a data receiving module, configured to receive a partner calculation value returned by the cooperative data party, wherein the partner calculation value is obtained by the cooperative data party according to the selected first parameter or second parameter, and wherein, if the second data corresponding to the data identifier participates in data statistics, the cooperative data party selects the second parameter, and otherwise the cooperative data party selects the first parameter; and

a statistical processing module, configured to remove the calculated value of each first parameter from the partner calculation value to obtain the statistical value.
The device of claim 6, wherein

the parameter generating module, when calculating the second parameter according to the first parameter and the first data, is specifically configured to obtain the second parameter by summing the first parameter and the first data;

the statistical processing module, when removing the calculated value of each first parameter from the partner calculation value, is specifically configured to subtract the sum of the first parameters from an accumulated value, wherein the accumulated value is obtained by the cooperative data party by accumulating the selected first parameters or second parameters.
The device according to claim 7, wherein

the parameter generating module, when obtaining the second parameter by summing the first parameter and the first data, is configured such that, if the first data corresponding to the data identifier satisfies the data filtering condition used to determine the data participating in the statistics, and the statistical value is the sum of the plurality of first data, the first parameter is a random number and the second parameter is the sum of the random number and the first data.
The device according to claim 7, wherein

the parameter generating module, when obtaining the second parameter by summing the first parameter and the first data, is configured such that, if the first data corresponding to the data identifier satisfies the data filtering condition used to determine the data participating in the statistics, and the statistical value is an average of the plurality of first data, the second parameter is the first parameter plus one-half of the first data.
A data statistics device, configured to perform data statistics between a local data party and a statistical data party, wherein the statistical data party has a plurality of first data for which a statistical value is to be calculated, the plurality of first data respectively correspond to different data identifiers, and the local data party has second data corresponding to the same data identifiers, the device comprising:

a parameter receiving module, configured to receive a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier, wherein, when the first data corresponding to the data identifier participates in data statistics, the second parameter is calculated according to the first parameter and the first data, and otherwise the second parameter is equal to the first parameter;

a parameter selection module, configured to: if the second data corresponding to the data identifier is data that participates in data statistics locally, select the second parameter corresponding to the data identifier, and otherwise select the first parameter corresponding to the data identifier;

a statistical calculation module, configured to perform statistical calculation according to the selected first parameters and second parameters to obtain a partner calculation value; and

a value sending module, configured to send the partner calculation value to the statistical data party, so that the statistical data party removes the calculated value of each first parameter from the partner calculation value to obtain the statistical value.
A data statistics device, the device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, the processor executing the instructions to implement the following steps:

generating, for each data identifier, a first parameter and a second parameter, wherein if the first data corresponding to the data identifier does not participate in data statistics, the second parameter is equal to the first parameter, and otherwise the second parameter is calculated according to the first parameter and the first data;

sending each data identifier, together with the first parameter and the second parameter corresponding to the data identifier, to the cooperative data party;

receiving a partner calculation value returned by the cooperative data party, wherein the partner calculation value is obtained by the cooperative data party according to the selected first parameter or second parameter, and wherein, if the second data corresponding to the data identifier participates in data statistics, the cooperative data party selects the second parameter, and otherwise the cooperative data party selects the first parameter; and

removing the calculated value of each first parameter from the partner calculation value to obtain the statistical value.
A data statistics device, the device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, the processor executing the instructions to implement the following steps:

receiving a data identifier sent by the statistical data party, and a first parameter and a second parameter corresponding to the data identifier, wherein, when the first data corresponding to the data identifier participates in data statistics, the second parameter is calculated according to the first parameter and the first data, and otherwise the second parameter is equal to the first parameter;

if the second data corresponding to the data identifier is data that participates in data statistics locally, selecting the second parameter corresponding to the data identifier, and otherwise selecting the first parameter corresponding to the data identifier;

performing statistical calculation according to the selected first parameters and second parameters to obtain a partner calculation value; and

sending the partner calculation value to the statistical data party, so that the statistical data party removes the calculated value of each first parameter from the partner calculation value and obtains the statistical value.