CN114254311A

CN114254311A - System and method for anonymously collecting data related to malware from a client device

Info

Publication number: CN114254311A
Application number: CN202011016106.2A
Authority: CN
Inventors: Anton S. Lapushkin; Dmitry V. Shmoylov; Andrey V. Ladikov; Andrey A. Efremov
Original assignee: Kaspersky Lab AO
Current assignee: Kaspersky Lab AO
Priority date: 2020-09-24
Filing date: 2020-09-24
Publication date: 2022-03-29

Abstract

The present invention relates to a system and method for anonymously collecting data related to malware from a client device. The system includes a network node configured to (i) receive a first data structure from a client device, wherein the first data structure contains an identifier of the client device and encrypted data, the encrypted data including an identifier of a user of the client device and/or personal data of the user, and the encrypted data is encrypted by the client device using a public key of the client device, wherein the public key is provided to the client device by a separate certificate authority, (ii) transform the received first data structure by replacing the identifier of the client device with an anonymous identifier, and (iii) send the transformed first data structure containing the anonymous identifier and the encrypted data to a server.

Description

System and method for anonymously collecting data related to malware from a client device

Technical Field

The present invention relates generally to the field of information security, and more particularly to a system and method for anonymously collecting data related to malware from a client device.

Background

Changes in legislation around the world are forcing information security experts to seek new ways of managing data that originates from personal electronic devices. For example, a law has been signed in the Russian Federation under which the personally identifiable information of Russian citizens used by internet services must be kept within the territory of Russia; in Switzerland, banks are likewise required not to allow user data to leave the jurisdiction of the federal government; and a number of countries prohibit keeping personally identifiable information in open form. The solutions being developed should not make the work of the computer system's user more difficult, and their operation should be as transparent as possible to the user.

With the advent of the General Data Protection Regulation (GDPR), the amount of personal data that is received from users and kept in the network infrastructure of various services tends to be minimized. There is a need to provide distributed storage and processing of data obtained from users without losing its uniqueness.

These principles pose difficulties in adopting cloud infrastructure in the enterprise and private sectors. A solution that would address these difficulties is needed.

Disclosure of Invention

The technical result of the present invention is to enable secure and anonymous collection of malware-related data from a client device at a server.

In one aspect, a method for anonymously collecting data related to malware from a client device comprises: receiving, by a network node, a first data structure from a client device, wherein the first data structure contains an identifier of the client device and encrypted data, the encrypted data comprising an identifier of a user of the client device and/or personal data of the user, and the encrypted data is encrypted by the client device using a public key of the client device, wherein the public key is provided to the client device by an independent certificate authority; transforming, by the network node, the received first data structure by replacing the identifier of the client device with an anonymous identifier and sending the transformed first data structure containing the anonymous identifier and the encrypted data to the server; receiving, by the server, the transformed first data structure from the network node; receiving, by the server, a second data structure from the client device, wherein the second data structure contains malware-related data obtained on the client device; and combining, by the server, the transformed first data structure with the second data structure and storing the combined data structure on the server without the server having access to and/or viewing (i) the identifier of the client device and (ii) the identifier of the user of the client device and/or the personal data of the user stored in the combined data structure.

In an aspect, the anonymous identifier comprises an encrypted identifier of the client device.

In an aspect, the client device is located in a first regional network, the network node is located in a second regional network different from the first regional network, and the server is located in a third regional network different from the first regional network and the second regional network.

In one aspect, the first regional network and the third regional network are located within different legal jurisdictions.

In one aspect, the data related to malware includes a hash of a malicious file.

In an aspect, the network node is not located in the same intranet as the server and the client device.

The foregoing brief summary of the exemplary aspects is provided to provide a basic understanding of the invention. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects of the invention. Its sole purpose is to present one or more aspects in a simplified form as a prelude to the more detailed description of the invention that is presented later. To the accomplishment of the foregoing, one or more aspects of the invention comprise the features hereinafter fully described and particularly pointed out in the claims.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate one or more exemplary aspects of the present invention and, together with the detailed description, serve to explain the principles and implementations of these exemplary aspects.

FIG. 1 illustrates a system for data routing in a client-server architecture.

FIG. 1A illustrates a system for data routing in a client-server architecture, including an authentication module.

Fig. 2 shows a variant of the method of data routing in a client-server architecture, in which the data structure is divided into substructures by the client.

Fig. 3 illustrates a variation of the method of data routing in a client-server architecture in performing a request, wherein sub-structures in a data structure are identified by an anonymization module.

Fig. 4 shows a variant of the method of data routing in a client-server architecture, in which substructures in the data structure are identified by the client.

Fig. 5 shows a variant of the method of data routing in a client-server architecture, in which the data structure is divided into substructures by the client, when executing requests.

Fig. 6A illustrates an exemplary aspect of a method of data routing in a client-server architecture when sending data (for the construction of statistics), wherein the data structure is divided into sub-structures by the client.

Fig. 6B illustrates exemplary aspects of a method of data routing in a client-server architecture in detecting a directed attack on a client based on information gathered by the method of fig. 6A.

FIG. 7 illustrates aspects of a method of data routing in a client-server architecture in performing a request, wherein sub-structures in a data structure are identified by an anonymization module.

FIG. 8 illustrates an aspect of a method of data routing in a client-server architecture, wherein substructures in a data structure are identified by a client.

Fig. 9 illustrates aspects of a method of data routing in a client-server architecture, where a data structure is divided into sub-structures by a client, in performing a request.

FIG. 10 illustrates an anonymous data exchange system in a client-server architecture.

Fig. 11 shows a variant of the data exchange method in a client-server architecture for obtaining data from a client to build statistics on the server side.

Figure 12 illustrates a variant method of data exchange used in performing a client request to a server.

Fig. 12A shows a variant of the data exchange method used when performing a client request to a server and comprising combining sub-structures.

FIG. 13 illustrates exemplary aspects of a data exchange method in performing a client request to a server.

FIG. 13A illustrates exemplary aspects of a data exchange method when a client's request to a server is performed in asynchronous mode.

FIG. 14 illustrates a variant method of sending critical data in a client-server architecture.

FIG. 14A illustrates an exemplary aspect of a method of sending critical data in a client-server architecture.

FIG. 15 shows a table of exemplary rules for throttling modules, in accordance with aspects of the present invention.

FIG. 16 illustrates a variant method of sending critical data in a client-server architecture using an authentication module.

FIG. 16A illustrates an exemplary aspect of a method of sending critical data in a client-server architecture using an authentication module.

FIG. 17 illustrates an example of a computer system upon which the disclosed systems and methods may be implemented, according to an exemplary aspect.

Detailed Description

Exemplary aspects are described herein in the context of systems, methods, and computer program products for anonymously collecting malware-related data from a client device. Those of ordinary skill in the art will realize that the following description is illustrative only and is not intended to be in any way limiting. Other aspects will readily suggest themselves to such skilled persons having the benefit of this disclosure. Reference will now be made in detail to implementations of the exemplary aspects as illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings and the following description to refer to the same or like items.

Fig. 1 illustrates a system 100 for data routing in a client-server architecture. The system 100 includes a client 102, a server 104, and a network node 106 having an anonymization module 108. The server 104 may be part of a cloud infrastructure (not shown in the figures), and the client may be a user device. The node 106 with the anonymization module 108 may be located in a regional network 107 (i.e., regional network 2) that is different from the regional network in which the server is located (i.e., regional network 3), and is not located in the same intranet as the server 104 or the client 102. As used herein, a regional network 107 refers to a geographically dispersed network that unites computers at different points by means of communication; the set of regional networks forms a global network 109. In the context of the present invention, the different regional networks 107 are not only geographically separated but are also located in different jurisdictions (i.e., may be subject to different regulations), so that different regional networks may also include networks that join together the nodes of a country (national networks). For example, in Fig. 1, regional network "1" is a network in the United States, regional network "2" is a network in Germany and/or the European Union, and regional network "3" is a network in the Russian Federation.

The global network 109 in Fig. 1 is the totality of the regional networks 107, i.e., the worldwide network or the Internet. In GDPR terms, for example, the regional network of the Russian Federation in which the server is located will be considered the regional network of a third country.

In a particular example, the regional network 107 of the node 106 with the anonymization module 108 is also different from the regional network of the client 102. The arrows in Fig. 1 are drawn as originating from the network and not from the client because, in general, only the external IP address is visible, owing to the use of techniques that hide internal addresses, in particular proxies and Network Address Translation (NAT).

The client 102 may include a modification module 110 configured to divide one or more data structures (e.g., data structures created for dispatching data from the client to the server) into substructures and to select paths for the obtained substructures. A data structure is a collection of data values generated and maintained by the components of the system 100, including the client 102 and the server 104. Note that some data values in the data structure may be "personal data" and thus subject to data privacy policies and regulations. A substructure is a data structure that contains a subset of the data values from the original data structure. For example, data values in the data structure may include data submissions, user requests, data queries and/or query results, log data, state data of an application, records of user transactions, user-generated content, and other forms of data suitable for exchange in a client-server architecture. In some examples, the data structure may be an in-memory data structure (e.g., a linked list, a hash table, a tree, an array, a database record) or an on-disk data structure (e.g., a file, a blob). In other examples, the data structure may be one or more network data packets configured to transmit the data values contained therein from the client to the server. The data structures may be serialized in a text format, a structured format (e.g., Extensible Markup Language (XML) or JavaScript Object Notation (JSON)), or another format for information exchange.

There may be various criteria for dividing the data structure into substructures. One such criterion may be the presence of personal data (personally identifiable information) or of a special category of personal data (in GDPR terminology), whereby the data structure is partitioned such that one substructure contains personal data (hereinafter referred to as PD or PII) or a special category of personal data, and another substructure contains data that is not personal data (i.e., the other substructure does not contain PD). Whether data is classified as personal data may be determined, for example, by the law of the country in whose jurisdiction the user of the device acting as the client in the described system is located (in other words, according to the location of the data source).
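The partitioning by the personal-data criterion can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the specification: the field names, the `PD_FIELDS` set, and the `partition` helper are hypothetical, and the shared `structure_id` is one possible way to keep the two substructures associated.

```python
import uuid

# Hypothetical set of field names treated as personal data (PD).
# Which fields actually count as PD depends on the applicable jurisdiction.
PD_FIELDS = {"client_id", "user_name", "email"}


def partition(structure, pd_fields=PD_FIELDS):
    """Split one data structure into a PD substructure and a non-PD substructure."""
    structure_id = str(uuid.uuid4())  # shared by both substructures
    with_pd = {"structure_id": structure_id}
    without_pd = {"structure_id": structure_id}
    for key, value in structure.items():
        # Each field goes to exactly one of the two substructures.
        target = with_pd if key in pd_fields else without_pd
        target[key] = value
    return with_pd, without_pd


original = {"client_id": "client-42", "md5": "9e107d9d372bb6826bd81d3542a419d6"}
sub_pd, sub_no_pd = partition(original)
```

Here `sub_pd` carries only the client identifier and `sub_no_pd` only the file hash; both carry the same `structure_id`.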

Another criterion for dividing the data structure into substructures is the presence of critical data. Critical data is data whose collection, storage, access, dissemination, and processing are restricted by law or by an authorized entity. Critical data is typically sensitive to disclosure, dissemination, and leakage, because such events infringe the rights and legally protected interests of users, and those who collect, store, access, or process such data in violation of the regulations are held liable. Special cases of critical data are confidential data (sensitive data) and personal data. Confidential data refers to data protected by the law of the country in whose jurisdiction the user of the device acting as the client in the described system is located. Confidential data includes personal data (PD) and, in specific cases, data containing the following: business secrets, tax secrets, bank secrets, medical secrets, notary secrets, attorney secrets, audit secrets, communication secrets, insurance secrets, advice secrets, receival secrets, confession secrets, clinical trial secrets, court-approval secrets, information about the person being protected, and national secrets. In one aspect, the critical data may include sensitive personal data as specified under the GDPR, namely data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership; data concerning health, sex life, or sexual orientation; and genetic or biometric data (e.g., for the purpose of uniquely identifying a natural person).

Anonymization module 108 is configured to perform the transformation and inverse transformation of substructures whose paths pass through node 106 with anonymization module 108. In one aspect, the transformation of the substructure may be a transformation of data contained in the substructure. In a specific example, the method of transformation of the data of the sub-structure may include one or more of: quantization, ordering, merging (gluing), grouping, data set configuration, table replacement of values, calculating values, data encoding, encryption, and normalization (scaling).

A particular kind of transformation, e.g., tokenization and/or encryption, may be applicable not only to personal data in the substructure but also to the substructure as a whole. In a specific example, the transformation is performed in such a way that the inverse transformation cannot be performed by any means other than the anonymization module 108 of the node. An inverse transformation is a transformation that restores the original form of the transformed object (data, substructure) as it was prior to the transformation. In general, a transformation may be any mapping (function) of a set to itself or, in other words, a mapping that transforms one set into another set.

Substructures from the same client may be transformed by the anonymization module 108 using the same method or using different methods. If the same method is used, the transformed substructures (or the transformed data of the substructures) from the same client will look identical; otherwise, they will differ, and it will not be possible to build statistics (perform data collection) for that client.

The server 104 may include a combining module 112, the combining module 112 configured to combine data structures partitioned on the client side. The combining module 112 may, for example, combine data based on unique identifiers that are assigned to the various substructures during partitioning and that are identical for the substructures of the same structure. The combining module 112 receives the sub-structures arriving at the server 104 over various network paths and combines them into one structure. The structure will obviously differ from the original structure divided at the client side, since the substructure passing through the nodes with anonymization module 108 will be transformed by the anonymization module 108. The resulting structure may be stored in a database (not shown in the figures).
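A combining module of this kind can be sketched as a small buffer keyed by the structure identifier. This is only an illustrative sketch: the class name `CombiningModule`, the `structure_id` field, and the assumption that every structure is split into exactly `expected_parts` substructures are hypothetical and not taken from the specification.

```python
class CombiningModule:
    """Collects substructures and merges those that share a structure ID."""

    def __init__(self, expected_parts=2):
        self.expected_parts = expected_parts
        self._pending = {}  # structure_id -> substructures received so far

    def receive(self, substructure):
        """Store a substructure; return the combined structure once all parts arrived."""
        structure_id = substructure["structure_id"]
        parts = self._pending.setdefault(structure_id, [])
        parts.append(substructure)
        if len(parts) < self.expected_parts:
            return None  # still waiting for the other path
        del self._pending[structure_id]
        combined = {}
        for part in parts:
            combined.update(part)
        return combined


combiner = CombiningModule()
# Substructure that arrived via the anonymizing node (already transformed).
first = combiner.receive({"structure_id": "s-1", "anonymous_id": "tok-7f3a"})
# Substructure that arrived directly from the client.
second = combiner.receive({"structure_id": "s-1", "md5": "9e107d9d372bb6826bd81d3542a419d6"})
```

The combined structure holds the anonymous identifier rather than the client identifier, which is why it differs from the structure originally built on the client.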

In a specific example, the anonymization module 108 obtains from the client a structure (e.g., a structure of a request to the server) that is not divided into substructures by the modification module 110 of the client, in which case, for transmission to the server, the anonymization module 108 identifies the substructure containing the PD in the obtained structure and performs transformation of the data of the substructure; examples are given below.

The described system 100 is used for anonymization of requests dispatched to the server 104 and responses to those requests dispatched to the client 102, and is also used to obtain data from the client 102 for building statistics.

Fig. 2 is a block diagram illustrating exemplary operations of a method of routing data in a client-server architecture, in a specific example for obtaining data from a client for building statistics. In step 200, the modification module 110 (e.g., executing on the client 102) partitions the structure 201 intended for dispatch to the server according to criteria, one of which may be the presence of PD in the structure. As a result of the partitioning, it obtains a substructure containing PD (in Fig. 2, substructure 1) and a substructure not containing PD (in Fig. 2, substructure 2). Here and below, the presence of PD is used as the example criterion rather than the presence of critical or confidential data, although what holds for PD in the exemplary aspects of the invention generally also holds for critical or confidential data. In a specific example, there may be more than one substructure of the first type and of the second type, and more than one criterion by which the partitioning is performed.

In step 210, the modification module 110 dispatches (i.e., transmits) the obtained substructures to the server 104, the dispatch occurring over different paths (path A and path B), wherein one of the paths (e.g., path A) includes the network node 106 with the anonymization module 108. In one aspect, the modification module 110 may determine at least two paths for dispatching at least two data substructures based on the personal data contained in one of the data substructures. The network node 106 is located in a regional network different from the network in which the server 104 is located and is not in the same intranet as the server 104 or the client 102. When one of the substructures for dispatch to the server contains PD, that substructure is directed to the server by way of the node with the anonymization module 108 (path A).
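The path selection in step 210 can be illustrated with a short sketch. The `select_path` helper, the `PD_FIELDS` set, and the field names are hypothetical; only the rule itself (PD goes through the anonymizing node on path A, everything else goes directly on path B) comes from the description above.

```python
# Hypothetical set of field names treated as personal data (PD).
PD_FIELDS = {"client_id", "user_name", "email"}


def select_path(substructure, pd_fields=PD_FIELDS):
    """Return "A" (via the anonymizing node) if the substructure holds PD, else "B"."""
    contains_pd = any(key in pd_fields for key in substructure)
    return "A" if contains_pd else "B"


path_for_pd = select_path({"structure_id": "s-1", "client_id": "client-42"})
path_for_rest = select_path({"structure_id": "s-1", "md5": "9e107d9d372bb6826bd81d3542a419d6"})
```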

Then, in step 220, the substructure passing through the node 106 with the anonymization module 108 is transformed by the anonymization module 108 and then sent to the server 104 in transformed form (step 221). In general, substructures from the same client are transformed differently at different times. For example, a substructure with a client identifier sent at a first time is transformed to include an anonymous identifier (anonymous ID1) that differs from the anonymous identifier (anonymous ID2) of a substructure sent at a second time, even though both substructures come from the same client and carry the same client identifier (i.e., client ID → anonymous ID1 ≠ anonymous ID2 ≠ anonymous ID3, and so on); this can apply to all examples. In particular cases, when information needs to be aggregated (statistics built) for a particular client for a particular security system, the transformation will be identical for substructures from the same client (e.g., client ID → anonymous ID1 = anonymous ID2 = anonymous ID3, and so on).

Finally, in step 230, the substructures obtained from the client are combined into one structure 231 (structure'). The formed structure (structure') differs from the original structure because at least one substructure has been transformed by the anonymization module 108. The resulting structure 231 is stored in a database and used by the server-side infrastructure. The infrastructure and databases are omitted from the figure for clarity of illustration. Various infrastructure elements, such as the request handler 302 and the attack detection module 602, are indicated in the other figures. The anonymization module 108 transforms the substructure and/or the data of the substructure in such a way as to exclude the possibility of an inverse transformation by any means other than the network node 106 having the anonymization module 108.
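The two behaviours described for step 220 (a different anonymous identifier each time versus an identical one for the same client) can be contrasted in a short sketch. The technique shown is an assumption made for illustration: a random token stands in for the per-message case and a keyed hash (HMAC-SHA-256) for the stable case, whereas the specification only says that the identifier may be tokenized or encrypted. The names `NODE_SECRET`, `one_time_anonymous_id`, and `stable_anonymous_id` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key held only by the node with the anonymization module.
NODE_SECRET = secrets.token_bytes(32)


def one_time_anonymous_id():
    """Fresh random token per substructure: tokens from one client cannot be linked."""
    return secrets.token_hex(16)


def stable_anonymous_id(client_id, key=NODE_SECRET):
    """Keyed hash of the client ID: the same client always maps to the same token."""
    return hmac.new(key, client_id.encode("utf-8"), hashlib.sha256).hexdigest()


token_a = stable_anonymous_id("client-42")
token_b = stable_anonymous_id("client-42")
```

Because a keyed hash is one-way, a node using it would still need to store the token-to-client mapping if responses have to be routed back to the client.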

Fig. 3 illustrates, in a specific example, a routing method for executing a client request 301 to a server. In step 300, a request generated on the client side is dispatched from the client 102 to the server 104 by the modification module 110 over a path that includes the network node 106 with the anonymization module 108, the node 106 being located in a regional network different from the network in which the server is located and not in the same intranet as the server or the client. In a specific example, some of the request data (not containing confidential data) may be transformed by the modification module 110 on the client side, and the transformation may be done so that the anonymization module 108 cannot perform the inverse transformation (step 311 in Fig. 4) and only the server 104 can perform it (step 325 in Fig. 4). For example, the transformation and inverse transformation may be performed using asymmetric encryption, where the client 102 has the public key and the server 104 has the private key. As used herein, unless otherwise indicated, the term "transformation" refers to a forward transformation.

Next, in step 310, the anonymization module 108 identifies substructures in the data structure of the request dispatched to the server according to criteria, one of which may be the presence of PD, and as a result of the identification obtains a substructure that contains PD (in Fig. 3, substructure 1, by analogy with the previous example) and a substructure that does not contain PD (in Fig. 3, substructure 2). In step 320, the anonymization module 108 performs a transformation (a forward transformation, from the original to the transformed form) of the data substructure containing PD (and/or of the data in that substructure), and the resulting data structure of the request, with the transformed PD-containing substructure, is dispatched to the server by the anonymization module 108 (step 321).

In response to the received request, the server generates a response 323 using the request handler 302 in step 330. For request data that may have been transformed by the client 102 in a particular instance, the server 104 first performs an inverse transformation (step 325 in Fig. 4, described below). In the example using PD, the data structure 323 of the response to the request will contain the following substructures: (1) at least one substructure containing the PD transformed by the anonymization module 108 (substructure 1', extracted from the request structure); and (2) at least one substructure that does not contain PD (substructure 3, which contains the body of the response to the request, i.e., the payload of the response).

In step 340, the data that does not contain PD (substructure 3) may be transformed (forward transformation) by the server into substructure 3' in such a way that the anonymization module 108 cannot perform the inverse transformation. The inverse transformation of this data may be performed only by the modification module 110 of the client (e.g., using asymmetric encryption, where the server has the public key and the client has the private key). In step 350, the resulting data structure 324 of the response to the request is dispatched from the server to the network node with the anonymization module 108. In step 360, the anonymization module 108 performs the inverse transformation of the PD-containing data substructure (substructure 1') of the response 324; that is, the data transformed in step 320 is restored to the original data contained in the request from the client. The obtained data structure is forwarded to the client (step 370), and in step 380 the modification module 110 of the client performs the inverse transformation of the PD-free data substructure of the response that was transformed by the server in step 340. Thus, the client 102 obtains a data structure 381 that contains the PD-free data substructure of the response, restored from the form into which the server transformed it.
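The forward transformation at the node (step 320) and its inverse on the way back (step 360) can be sketched as a token table kept only at the node. This is a simplified model under stated assumptions: `AnonymizationNode`, `server_handle`, and the field names are hypothetical, a random token stands in for whatever tokenization or encryption the node actually uses, and the optional end-to-end transformation of the payload (steps 340 and 380) is left out.

```python
import secrets


class AnonymizationNode:
    """Toy model of the node: tokenizes the client ID and can reverse it."""

    def __init__(self):
        self._token_to_client = {}  # mapping kept only at the node

    def forward(self, request):
        """Replace the client ID with an anonymous token (cf. step 320)."""
        token = secrets.token_hex(8)
        self._token_to_client[token] = request["client_id"]
        transformed = {k: v for k, v in request.items() if k != "client_id"}
        transformed["anonymous_id"] = token
        return transformed

    def inverse(self, response):
        """Restore the client ID so the response can be routed (cf. step 360)."""
        restored = {k: v for k, v in response.items() if k != "anonymous_id"}
        restored["client_id"] = self._token_to_client.pop(response["anonymous_id"])
        return restored


def server_handle(request):
    """Hypothetical server: copies the token into the response and adds a payload."""
    return {"anonymous_id": request["anonymous_id"], "payload": "response-body"}


node = AnonymizationNode()
outgoing = node.forward({"client_id": "client-42", "query": "lookup"})
reply = node.inverse(server_handle(outgoing))
```

The server only ever sees `anonymous_id`; the client identifier reappears solely at the node, where it is needed to pick the recipient of the response.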

Fig. 4 shows a variant of the method shown in Fig. 3 in which step 310, the identification of the substructures, is performed not by the anonymization module 108 but by the modification module 110 of the client, followed by the transformation of a substructure in step 311. By analogy with the variant in Fig. 3, it is the substructure that does not contain PD (substructure 2) that is subjected to the transformation. Thus, step 300' in Fig. 4 differs from the similar step 300 of the method in Fig. 3 in that, instead of the original data structure of the request, the transformed structure 412 obtained after performing steps 310 and 311 is sent to the node with the anonymization module 108. Accordingly, in this variant a step 325 is added before step 330, in which the inverse transformation of the substructure transformed in step 311 (in this example, substructure 2', which does not contain PD) is first performed.

Fig. 5 shows a variant of the method of data routing in a client-server architecture, in which steps 200 to 230 are similar to the steps of the method shown in fig. 2, and steps 300 to 380 are similar to the steps of the method shown in fig. 3. In a specific case, by analogy with step 311 in fig. 4, the transformation can be performed first before the sub-structure 2 is directly dispatched to the server, so that in addition to step 311, a step 325 is added in the diagram of the method.

In a specific example, in all aspects of the methods shown in Figs. 3-5, the data structure dispatched to the client 102 by the node 106 with the anonymization module 108 in step 370 does not contain the data substructure with PD (substructure 1 in these examples). This substructure needs to be kept up to step 370 in order to determine the recipient of the response; after that, it is no longer needed in this particular example.

Fig. 6A illustrates exemplary operations of the method illustrated in Fig. 2. The client 102 is communicatively connected to a system for remote detection of targeted attacks, such as the attack detection module 602, located on the server side. For the attack detection module 602 to operate fully, it may be necessary to obtain information from the client 102 about files with malicious code (malicious files) detected at different times and to build statistics based on the obtained information (typically, this must still be done anonymously in accordance with national legislation on personal data). When a plurality of such malicious files is detected based on information received from the client, the server side concludes that a targeted attack on the client has been detected.

To transmit information about a detected malicious file to the server, client 102 generates data structure 601, which data structure 601 includes a client identifier ("client ID") and information about the detected malicious file ("MD 5"). In step 200, the modification module 110 divides the generated structure 601 for assignment to the server into substructures, and obtains a substructure containing the client ID and a substructure containing the MD5 of the file as a result of the division. In order to know to which structure these substructures belong, identifiers (in the figure, the identifiers are denoted as structure IDs) are assigned to these substructures. In step 210, the modification module 110 of the client transmits the obtained substructure to the server 104, the transmission taking place over different paths (path a and path B), wherein one of the paths (path a) comprises a network node 106 with an anonymization module 108, said node 106 being located in a regional network different from the network in which the server is located and not in the same intranet as the server or the client. The substructure containing the client ID is directed to the server 104 by the node 106 with anonymization module 108 (Path A). In step 220, the anonymization module 108 performs a transformation of the client ID, where the client ID is saved at the node and replaced in the sub-structure with a token-anonymous ID (in a specific example, the client ID may be encrypted). The obtained sub-structure is dispatched to the server (step 221). Finally, in step 230, the sub-structures received from the clients are combined into a structure 603. Obviously, the formed structure 603 is different from the original structure 601 because at least one sub-structure has been transformed by the anonymization module 108. 
The formed structure 603 is saved at the server 104 (or in any given database of the infrastructure to which the server belongs) and is used by the server to aggregate information (represented in the figure as statistics) about the clients 102 from which the structure was obtained. In step 240, the aggregated information is used by the attack detection module 602, and if the attack detection module 602 detects an attack, then in step 250 it generates a data structure 623 containing a substructure with the anonymous ID and a substructure containing information about the attack (represented in the figure as attack ID); the obtained structure 623 is addressed to the client to notify it of the attack.
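
The flow of steps 200 to 230 in FIG. 6A can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `split_structure`, `AnonymizationNode` and `combine`, the dictionary layout, and the use of a random hexadecimal token as the anonymous ID are all hypothetical choices made for the example.

```python
import secrets
import uuid


def split_structure(structure):
    """Step 200 (client side): split into a client-ID part and an MD5 part.

    Both parts carry the same structure ID so the server can pair them again.
    """
    structure_id = str(uuid.uuid4())
    id_part = {"structure_id": structure_id, "client_id": structure["client_id"]}
    md5_part = {"structure_id": structure_id, "md5": structure["md5"]}
    return id_part, md5_part


class AnonymizationNode:
    """Hypothetical stand-in for node 106 with anonymization module 108."""

    def __init__(self):
        self._token_to_client = {}  # this mapping never leaves the node

    def transform(self, id_part):
        """Step 220: save the client ID and replace it with a token."""
        token = secrets.token_hex(16)
        self._token_to_client[token] = id_part["client_id"]
        return {"structure_id": id_part["structure_id"], "anonymous_id": token}


def combine(anon_part, md5_part):
    """Step 230 (server side): merge the parts that share a structure ID."""
    if anon_part["structure_id"] != md5_part["structure_id"]:
        raise ValueError("substructures belong to different structures")
    return {"anonymous_id": anon_part["anonymous_id"], "md5": md5_part["md5"]}


# Placeholder values; the MD5 string is only an example hash.
structure_601 = {"client_id": "client-42", "md5": "d41d8cd98f00b204e9800998ecf8427e"}
id_part, md5_part = split_structure(structure_601)   # step 200
node = AnonymizationNode()
anon_part = node.transform(id_part)                  # path A, steps 220 and 221
structure_603 = combine(anon_part, md5_part)         # path B joins at step 230
```

In this sketch the server only ever sees `structure_603`, which holds the token and the MD5 but not the client ID; the token-to-client mapping exists only inside the node object.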

An example of the dispatch method is shown in FIG. 6B, with steps 340 through 380 being similar to those of the example shown in FIG. 8. In a particular example, information about an attack may not be transformed but dispatched in open form; in this case, the example would lack steps 340 and 380. In FIG. 6B and in aspects shown in other figures of the present invention, optional and alternative aspects are depicted in dashed outline or in light italic font, such as the client ID area in the response obtained in step 370.

FIG. 7 illustrates another exemplary operation of the present invention. The client device 102 has detected a new file that needs to be scanned by the server 104 to determine whether malicious code is present. To this end, information about the file (in this example, the MD5 of the file) needs to be dispatched to the server, for which the client generates the request data structure 701. To inform the server to whom the response should be dispatched, modification module 110 (e.g., executing at client 102) inserts the client ID into request data structure 701, so that request data structure 701 includes the client ID and the MD5 of the file. In step 300, the generated request is dispatched by the modification module 110 to the server over a path that includes the network node 106 with the anonymization module 108, which is located in a regional network different from the network in which the server is located and is not in the same intranet as the server or the client. Next, in step 310, anonymization module 108 identifies the substructures in structure 701 intended for dispatch to the server, obtaining a substructure containing the client ID and a substructure containing the MD5 of the file. In step 320, the anonymization module 108 transforms the client ID: the client ID is saved at the node 106 and replaced in the substructure with a token, the anonymous ID (in a specific example, the client ID may instead be encrypted). The request data structure with the transformed substructure is dispatched to the server (step 321). In step 330, a response 723 to the received request is generated by the request handler 302 of the server 104. The request handler 302 extracts the MD5 of the file from the structure and issues a verdict indicating that the file being analyzed at the client is malicious (e.g., "MD5-BAD").
The data structure 723 for the response to the request contains the following substructures: (1) at least one substructure containing the token, i.e., the anonymous ID (or the client ID encrypted by anonymization module 108); and (2) at least one substructure containing the verdict for the file (MD5-BAD).

The verdict is then transformed by the server 104 in step 340 in a way that cannot be reversed by the anonymization module 108: for example, the verdict is encrypted with a public key whose private key is kept at the client (the transformed verdict is denoted as EncryptedVer in the figure), so that the inverse transformation can be performed only by the modification module 110 of the client. In step 350, the resulting data structure 724 of the response to the request is dispatched from the server to the network node 106 with the anonymization module 108. In step 360, anonymization module 108 performs the inverse transformation of the substructure of response data structure 724 that contains the anonymous ID: in the case of a token, the token is replaced by the previously saved client ID, and in the case of an encrypted client ID, the client ID is decrypted. Thus, the inverse transformation is applied to the data transformed in step 320. The obtained data structure is forwarded to the client (step 370), and in step 380 the modification module 110 of the client performs the inverse of the transformation applied to the verdict by the server in step 340; in this example, the verdict is decrypted by means of the private key. In a specific example, the anonymous ID assigned to the same client ID differs from one transmission to the next.
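
The request-response round trip of FIG. 7 (steps 320, 330 and 360) might look like the sketch below. It deliberately omits the encryption of the verdict in steps 340 and 380, and every name in it (`AnonymizationNode`, `forward_request`, `forward_response`, `handle_request`, the field names, the `known_bad` set) is an assumption introduced for illustration only.

```python
import secrets


class AnonymizationNode:
    """Hypothetical model of node 106: tokenises requests, restores responses."""

    def __init__(self):
        self._saved = {}

    def forward_request(self, request):
        # Step 320: keep the client ID at the node, send a fresh token instead.
        token = secrets.token_hex(16)
        self._saved[token] = request["client_id"]
        return {"anonymous_id": token, "md5": request["md5"]}

    def forward_response(self, response):
        # Step 360: inverse transformation, token -> previously saved client ID.
        client_id = self._saved.pop(response["anonymous_id"])
        return {"client_id": client_id, "verdict": response["verdict"]}


def handle_request(request, known_bad):
    """Step 330 (server side): the handler sees only the token and the MD5."""
    verdict = "MD5-BAD" if request["md5"] in known_bad else "MD5-UNKNOWN"
    return {"anonymous_id": request["anonymous_id"], "verdict": verdict}


node = AnonymizationNode()
# Placeholder request; the hash value is only an example.
request_701 = {"client_id": "client-42", "md5": "44d88612fea8a8f36de82e1278abb02f"}
seen_by_server = node.forward_request(request_701)
response_723 = handle_request(
    seen_by_server, known_bad={"44d88612fea8a8f36de82e1278abb02f"}
)
delivered = node.forward_response(response_723)
```

Because each request gets a new random token in this sketch, two requests from the same client are not linkable on the server side, which matches the specific example in which the anonymous ID differs between transmissions.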

FIG. 8 shows a modification of the example shown in FIG. 7. In this variant, step 310, the identification of the substructures, is performed not by the anonymization module 108 but by the modification module 110 of the client 102, and is followed by a transformation (step 311) of the substructure holding the information about the file (the MD5 of the file) by encryption with a public key (in the figure, the transformed information about the file is denoted as encrypted MD5); the private key is kept at the server, so the inverse transformation can be performed only at the server. Thus, step 300' of the example in FIG. 8 differs from the corresponding step of the example in FIG. 7 in that, instead of the original structure of the request (e.g., 801), the transformed structure (data structure 812) obtained after performing step 310 and step 311 is sent to the node with anonymization module 108. Accordingly, step 325 is added, in which the encrypted information about the file is inverse-transformed by decrypting it with the private key before step 330 is performed.

FIG. 9 shows an example of data routing in a client-server architecture, where steps 200 to 230 are similar to those of the example shown in FIG. 6A, and steps 330 to 380 are similar to those of the example shown in FIG. 7. In a specific example, by analogy with step 311 in the example of FIG. 8, the information about the file may first be transformed before being dispatched directly to the server; in that case, step 325 is added to this example in addition to step 311.

The modification module 110 of the client intercepts the structure 901 intended for dispatch to the server, partitions it according to established rules, and selects paths for the substructures according to the same rules. In a particular example, the rules under which the modification module 110 operates are established according to one or more information technology policies configured to comply with existing regulations and laws within the jurisdiction in which the client device 102 (source) is operating. Thus, to apply the rules, the client's modification module 110 determines the location of the device (source), the type of data in the formed data structure 901, the purpose of the data structure (e.g., the type of transmission: request, or statistics, where data is dispatched to the server to compile statistics on the server side), and the location of the data recipient. Based on this, the modification module 110 selects a path for the data, a partition variant, and a transformation method on the client side according to the rules. One variant of the formalized rules is presented in Table 1, shown in FIG. 15, where the "methods" column indicates the method used for the transformation, which may be one of the following: "method 1" comprises partitioning the data structure on the client side (see FIG. 2); "method 2" comprises identifying the substructures of the data structure at the node with anonymization module 108 (see FIG. 3); "method 3" comprises identifying the substructures of the data structure on the client side (see FIG. 4).

As noted above, these rules may be dictated by the requirements of laws and regulations (such as the GDPR) and, like any legal norm, comprise a condition and a prescribed consequence, which correspond to an "if-then" construct in an algorithmic language. Table 1 therefore formalizes the rules in the following format:

IF [type, source, receiver, personal data (yes/no)], THEN [method, location of anonymization node, transformation method for the data].

Listing 1: Example rule format

Consider an example data structure for which modification module 110 determines that the type of transmission is a request, the source (client) is in Germany, the receiver (server) is in the Russian Federation, and the structure contains personal data (PD). According to the rules, the modification module 110 identifies on the client side the substructure with the PD (as in step 310 of FIG. 4, i.e., method 3) and dispatches that substructure via an anonymization node located in the USA, encrypts the substructure without the PD with the public key (as in step 311 of FIG. 4), and has the personally identifiable information transformed by the anonymization module 108 using encryption.
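
The IF-THEN rule format lends itself to a simple lookup table. The sketch below is an assumption-laden illustration: the `Condition` and `Action` types, the field names, and the single rule entry (modeled on the Germany to Russian Federation example above) are hypothetical, and the actual contents of Table 1 in FIG. 15 are not reproduced.

```python
from typing import NamedTuple, Optional


class Condition(NamedTuple):
    transfer_type: str    # "request" or "statistics"
    source: str           # location of the client
    receiver: str         # location of the server
    personal_data: bool   # does the structure contain personal data?


class Action(NamedTuple):
    method: str           # where the substructures are identified or divided
    node_location: str    # location of the anonymization node
    transformation: str   # how the personal data is transformed


# Hypothetical rule table with one illustrative entry.
RULES = {
    Condition("request", "Germany", "Russian Federation", True): Action(
        method="identify substructures on the client (FIG. 4)",
        node_location="USA",
        transformation="encryption",
    ),
}


def select_action(condition: Condition) -> Optional[Action]:
    """IF the condition matches a rule THEN return its action, else None."""
    return RULES.get(condition)


action = select_action(Condition("request", "Germany", "Russian Federation", True))
no_rule = select_action(Condition("statistics", "Germany", "Russian Federation", True))
```

A dictionary keyed by the full condition tuple keeps the lookup exact; a real policy engine would presumably also need defaults and wildcard matching, which this sketch leaves out.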

FIG. 10 shows a variant system 1000 of anonymous data exchange in a client-server architecture, similar to the system shown in FIG. 1, except that the system 1000 comprises a network node 1002 with a storage module 1004. The storage module 1004 may include one or more storage devices. The network node 1002 with the storage module 1004 is located in a regional network 107 different from the regional network in which the server is located and is not in the same intranet as the server or client. In a particular example, the network node 1002 with the storage module 1004 may be in the same regional network (such as the network indicated as "regional network N" in FIG. 10) as the network node 106 with the anonymization module 108. The purpose of the network node 1002 with the storage module 1004 is to hide the external IP address of the client 102 from the server 104 and to reduce the load on the node 106 hosting the anonymization module 108 by reducing the amount of traffic passing through it. The network node 1002 with storage module 1004 is an intermediate repository for data exchanged by clients and servers.

The system 1000 shown in fig. 10 is used for anonymous exchange of data between clients and servers, including for transmission of data from clients for building statistics, and for "request-response" type client-server interactions. Fig. 11 illustrates a method of anonymous exchange of data between a client and a server, in a specific example for obtaining data from the client for building statistics on the server side.

Steps 200, 221, 220, 230 are similar to those shown in FIG. 2. Step 210' differs from the corresponding step 210, and step 222 is added. In FIG. 2, path B goes directly from the client to the server, but in the aspect depicted in FIG. 11, the path is split: the client dispatches substructure 2 not to the server but to the node with storage module 1004. The substructure is then received by the server in step 222. The transmission of the substructure to the server in step 222 may be initiated either by the node 1002 with the storage module 1004, or by the server 104, which, upon receiving substructure 1' via path A, downloads substructure 2 on demand using the identifier of substructure 2 saved by the network node 1002 with the storage module 1004.

FIG. 12 illustrates, in a specific example, a method of data exchange in which a client sends a request to a server.

Steps 200, 221, 220, 230 are similar to those shown in FIG. 2, steps 210' and 222 are similar to those shown in FIG. 11, and step 330 is similar to the same step in FIG. 3. Thus, dispatching a request to the server is analogous to dispatching data to the server to build statistics, as shown in FIG. 11; the difference from the description above lies in how the response prepared in step 330 is dispatched. The structure of the response to the request generated in step 330 is decomposed in step 331 into at least two substructures: (1) at least one substructure containing PD transformed by anonymization module 108 (e.g., substructure 1' extracted from the request structure); and (2) at least one substructure that does not contain PD (substructure 3, which contains the body, or payload, of the response to the request).

In step 350a, the substructure containing the PD is dispatched from the server 104 to the node 106 with anonymization module 108, where in step 360 a transformation is performed that is the inverse of the transformation performed in step 220. The substructure that does not contain PD (substructure 3 in FIG. 12) is dispatched to the network node 1002 with the storage module 1004 in step 350b. Next, the substructure that does not contain PD is sent to the client in step 371. The way in which the client receives the substructure in step 371 may vary. If step 350a is performed, then after the transformation in step 360 the node with anonymization module 108 dispatches a notification (message) of response readiness to the client in step 370a; thereafter, the client accesses the node with the storage module 1004 and receives from it the substructure not containing PD. The notification in step 370a may, for example, include the unique identifier assigned to substructure 3 when the structure of the response to the request was partitioned in step 331; the client requests the substructure with this identifier from the network node 1002 with the storage module 1004. In a specific example, steps 350a, 360, 370a may not be performed. In this case, the identifier assigned to the substructure during the partitioning in step 200 is the same as the identifier assigned in step 331, and in step 371 the client obtains substructure 3 by periodically polling the node with the storage module 1004 for the arrival of the substructure with the corresponding identifier. If steps 350a, 360, 370a are not performed, the structure of the response to the request is the same as the substructure without PD (substructure 3) to which the unique identifier is assigned. In another embodiment, in step 371, the node with storage module 1004 itself dispatches substructure 3 to the client; in this case, the identifier of the session established between the client and the node with the storage module 1004 to perform step 210 is used, and the unique identifiers assigned to the substructures in step 200 and step 331 are identical to each other and to the session identifier. When the node receives substructure 3 in step 350b, it reads the identifier of substructure 3 and forwards the substructure to the client whose session has the same identifier; the main condition for this variant is that the session between the client and the node with the storage module 1004 is maintained while the request is executed and the response dispatched, until the data exchange between the client and the server is over.
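
The polling variant of step 371 can be pictured as a keyed mailbox at the storage node. The following is a minimal sketch under assumptions: the `StorageNode` class, its `put` and `poll` methods, and the identifier string are hypothetical, and networking, sessions and expiry are left out entirely.

```python
class StorageNode:
    """Hypothetical model of node 1002 with storage module 1004: a keyed mailbox."""

    def __init__(self):
        self._box = {}

    def put(self, structure_id, substructure):
        # Step 350b: the server deposits the response part that contains no PD.
        self._box[structure_id] = substructure

    def poll(self, structure_id):
        # Step 371 (polling variant): hand over and remove the part, or None if absent.
        return self._box.pop(structure_id, None)


storage = StorageNode()
structure_id = "req-0001"  # assumed to be the identifier used in both step 200 and step 331

first_poll = storage.poll(structure_id)              # response not ready yet
storage.put(structure_id, {"verdict": "MD5-BAD"})    # server side, step 350b
second_poll = storage.poll(structure_id)             # client receives substructure 3
third_poll = storage.poll(structure_id)              # nothing left after delivery
```

The client never contacts the server in this exchange, so the server does not learn the client's external IP address; the storage node, in turn, sees the address but only a PD-free payload.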

In a specific example, the scheme depicted in FIG. 12 may operate in an asynchronous mode; in this case, step 330 is performed without first performing step 230, using only the data of substructure 2, step 331 is omitted, and the obtained substructure 3 is dispatched to the node with the storage module 1004 (step 350b). Step 230 is performed independently of step 330. This mode increases the response speed of the server and is used when only the data contained in the substructure that does not contain confidential data (CD) is needed to process the request. In this case, the combining of the substructures (step 230) is required only to build the statistical information, as in the example shown in FIG. 12A.

FIG. 13 shows an example of the use of the method shown in FIG. 12 to obtain from the server a verdict (dangerous/malicious or safe) for a file detected on the client side. To transmit information about the detected file (in this example, the MD5 of the file) to the server, a data structure is generated that includes the client ID and the MD5 of the detected file. In step 200, the modification module 110 divides the generated structure, intended for transmission to the server, into substructures, obtaining a substructure containing the client ID and a substructure containing the MD5 of the file; so that it is known which structure these substructures belong to, identifiers (denoted in the figure as structure IDs) are assigned to them. In step 210, the modification module 110 of the client dispatches the obtained substructures. The dispatch is made over different paths (path A and path B) to different recipients. One substructure is dispatched to the server via path A, which includes a network node with anonymization module 108 located in a regional network different from the network in which the server is located and not in the same intranet as the server or client. The substructure containing the client ID is sent to the server via the node with anonymization module 108 (path A). The other substructure is dispatched via path B to a network node 1002 with a storage module 1004, the node 1002 being located in a regional network different from the network in which the server is located and not in the same intranet as the server or client. The substructure containing the MD5 of the file is sent to network node 1002 with storage module 1004 (path B).
In step 220, the client ID is transformed by the anonymization module 108: the client ID is saved at the node and replaced in the substructure with a token, the anonymous ID (in a specific example, the client ID may instead be encrypted). The obtained substructure is dispatched to the server (step 221). In step 222, the substructure with the MD5 of the file is received by the server. If the method is performed in synchronous mode, the substructures obtained by the server in steps 221 and 222 are combined in step 230 and the request is processed in step 330. In this example, the MD5 is checked against a database of malicious and safe files, and the result of the check yields a verdict, from which a response to the request is generated (in the given example, the file proves to be malicious: MD5-BAD). The generated response to the request is divided into two substructures in step 331, yielding a substructure containing the client ID and a substructure containing the verdict (MD5-BAD); identifiers (denoted as structure IDs in the figure) are assigned to these substructures so that it is known which structure they belong to; in a specific example, the identifier may be the same as the identifier assigned to the substructures in step 200. In step 350b, the substructure with the verdict is dispatched to the network node 1002 with the storage module 1004, and in step 371 the network node 1002 either forwards the substructure to the client (if the structure ID corresponds to the ID of the session between the node and the client established in step 210) or saves the substructure until it is requested. The client may request this substructure after receiving a notification from the node with the anonymization module 108 as a result of the execution of steps 350a, 360 and 370a. Alternatively, the client may continually poll the network node 1002 with the storage module 1004 as to whether a response substructure is present at the node (in which case the structure IDs assigned to the substructures in steps 200 and 331 should be identical). In step 372, the client processes the response. If the method is performed in asynchronous mode (FIG. 13A), step 230 and step 330 are performed independently. The structure ID in step 330 is unchanged and the same as in step 200, and in a specific example it is identical to the ID of the session between the client and the node with the storage module 1004 established in step 210, within which session the transmission of the substructure in step 371 also takes place.

Aspects of the present invention enable the dispersion of data from a client, which provides anonymity to the user whose device is the client; data exchanged by a client with a server cannot be associated with the client by anyone with access to the server. Some data is known only to the server and some only to the network node with anonymization module 108, and the data cannot be de-anonymized without simultaneous access to both of these system components. Simultaneous access (including by government bodies) is prevented by distributing the system components over different regional networks, which differ both geographically and in territorial jurisdiction. Aspects of the present invention also allow the external IP address of the client to be hidden from the server when a node with storage module 1004 is used (the server obtains a substructure not directly from the client but via the node with storage module 1004), and also reduce the load on the node with anonymization module 108.

In some cases, after the data structure has been divided into two data substructures (one of which contains confidential data), it is desirable to divide that substructure further. In one embodiment, this is done when the data is critical only in combination; for example, an IP address and a timestamp are personal data only when taken together. When such a combined substructure is divided into a substructure with the IP address and a substructure with the timestamp, the data loses its personal nature and can be processed by nodes that are unable to combine these substructures, without the legal constraints that apply to the processing of critical data (personal data in the given case). In this case, however, the mechanism for sending data to the server is more complex.

FIG. 14 illustrates a method of transmitting critical data in a client-server architecture, in a specific example for obtaining data from a client for building statistics. It should be understood that certain individual infrastructure elements (e.g., request processors, attack detection modules, databases) indicated in other figures have been omitted from fig. 14 for clarity only.

In step 200, the modification module 110 (e.g., executing on the client 102) partitions the structure intended for transmission to the server 104 according to criteria; one such criterion may be the presence of critical data (e.g., confidential data, personal data) in the structure. As a result of the partitioning, a first data substructure is obtained that contains critical data (substructure 1 in FIG. 14) and a second data substructure that does not contain such data (substructure 2 in FIG. 14). In step 201, the modification module 110 additionally divides the substructure containing the critical data into at least two substructures (substructure 3 and substructure 4 in FIG. 14). In step 210, the modification module 110 sends substructure 2 to the server via path B. In step 211, the substructures obtained by dividing the substructure containing critical data are sent sequentially over a path other than path B, where this alternative path (path A in the example of FIG. 14) includes a network node with a transformation module, which in a specific example is located in a regional network different from the network in which the server is located and is not in the same intranet as the server or the client.

Next, in step 220, the substructures passing through the node 106 with the transformation module are transformed by that module and transmitted onward to the server in transformed form (step 223). In general, substructures from the same client may be transformed differently at different times (e.g., client ID -> anonymous ID1 ≠ anonymous ID2 ≠ anonymous ID3, and so on). This applies to all examples, but in a specific instance, when information about a particular client needs to be collected for a particular security system (to build statistics), the transformation of substructures from that same client is identical (e.g., client ID -> anonymous ID1 = anonymous ID2 = anonymous ID3, and so on). Finally, in step 230, the substructures obtained from the client are combined into one data structure (structure'). The final data structure (structure') differs from the original data structure because at least two substructures have been transformed by the anonymization module 108. The final structure in the database is also used by the server-side infrastructure modules to build, for example, a configuration file. The transformation of the substructures and/or of their data by the transformation module is performed by a method that excludes the possibility of inverse transformation by any module other than the transformation module of the network node.

FIG. 14A illustrates an example of an implementation of the method of transmitting critical data. On the client side, a structure is generated for sending to the server, containing the client's IP address, a timestamp (TimeStamp) and the MD5 of a particular file. In step 200, the modification module 110 divides the structure intended for sending to the server, obtaining a substructure containing the IP address and the timestamp, and a substructure containing the MD5 of the file. In step 201, the modification module 110 further divides the substructure containing the IP address and the timestamp into two substructures (in FIG. 14A, the substructure with the IP address and the substructure with the timestamp). So that it is known which substructure containing the MD5 is associated with the IP substructure and the timestamp substructure, the IP substructure and the timestamp substructure are assigned identifiers (denoted in the figure as structure ID1 and structure ID2), and these same identifiers are placed in the MD5 substructure. In step 210, the modification module sends the substructure with the MD5 to the server over path B, and in step 211 it sends the substructure with the IP address and the substructure with the timestamp sequentially over a path other than path B, where this alternative path (path A in the example of FIG. 14A) includes a network node 106 with a transformation module 108, which in a specific instance is located in a regional network different from the network in which the server is located and is not in the same intranet as the server or the client. Then, in step 220, the substructure with the IP address and the substructure with the timestamp are transformed and sent onward to the server in transformed form (step 223). Each substructure is transformed upon receipt.
Finally, in step 230, the substructures received from the client are combined into one structure comprising the transformed IP address, the transformed timestamp, and the MD5.
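
The double split of FIG. 14A and the recombination in step 230 can be sketched as follows. The function names, the `refs` field that carries structure ID1 and structure ID2 inside the MD5 substructure, and the use of an HMAC-SHA256 keyed hash as the node's transformation are all assumptions made for this illustration; the source does not specify the transformation, and a keyed hash is a one-way stand-in.

```python
import hashlib
import hmac
import uuid


def split_critical(structure):
    """Steps 200 and 201: isolate the MD5, then separate IP address and timestamp."""
    id1, id2 = str(uuid.uuid4()), str(uuid.uuid4())
    ip_part = {"structure_id": id1, "ip": structure["ip"]}
    ts_part = {"structure_id": id2, "timestamp": structure["timestamp"]}
    # The MD5 part references both identifiers so the server can rejoin the parts.
    md5_part = {"md5": structure["md5"], "refs": [id1, id2]}
    return ip_part, ts_part, md5_part


def transform(part, field, node_secret):
    """Step 220 stand-in: keyed hash of one field; the key stays at the node."""
    digest = hmac.new(node_secret, str(part[field]).encode(), hashlib.sha256).hexdigest()
    return {"structure_id": part["structure_id"], field: digest}


def combine(md5_part, transformed_parts):
    """Step 230: rebuild one structure from the parts named in `refs`."""
    by_id = {p["structure_id"]: p for p in transformed_parts}
    result = {"md5": md5_part["md5"]}
    for ref in md5_part["refs"]:
        part = by_id[ref]
        result.update({k: v for k, v in part.items() if k != "structure_id"})
    return result


# Placeholder input values.
structure = {"ip": "203.0.113.7", "timestamp": 1600905600,
             "md5": "d41d8cd98f00b204e9800998ecf8427e"}
ip_part, ts_part, md5_part = split_critical(structure)
secret = b"kept-only-at-node-106"
sent_via_path_a = [transform(ip_part, "ip", secret),
                   transform(ts_part, "timestamp", secret)]
structure_prime = combine(md5_part, sent_via_path_a)
```

Since the IP substructure and the timestamp substructure travel separately and share no identifier with each other, an observer holding only one of them cannot pair the two; only the MD5 substructure, which reaches the server over path B, links them.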

FIG. 1 illustrates a system for data routing in a client-server architecture. FIG. 1A shows the same system, except that it includes an additional network node 114 with an authentication module 116. In this system, as in the system of FIG. 10, there may also be a storage module. The authentication module 116 may be used to generate an encryption key for a one-time transformation of confidential data on multiple clients 102. In one aspect, if an asymmetric encryption scheme is used for the transformation (described below), a separate trusted authentication module 116 is used to create a key pair for each client 102, the key pair including a public key and a private key. In an aspect, the authentication module 116 is configured to communicate the public key to the client 102 and to escrow the private key. The disclosed anonymization method encrypts an identifier sent by the user's device (client) 102 using a public key generated by the trusted authentication module 116. This ensures that no one at the node with the anonymization module 108 or at the server 104 can discover the true identifier of the user of the client 102. The true user identifier cannot be accessed without the private key of the key pair, which is kept by the authentication module 116 and is not disclosed to anyone. In an aspect, the network node 114 hosting the authentication module 116 may be located on a different intranet than the client 102, the server 104, and the anonymization module 108. In certain instances, the node 114 is located in a regional network different from the regional network of the server 104, and/or of the node 106 with the anonymization module 108, and/or of the client 102. In yet another aspect, the network node 114 and/or its authentication module 116 may be operated by an independent and trusted certification authority that is neither part of, nor under the control of, the operator of the anonymization module 108 and/or the server 104.

FIG. 16 illustrates an exemplary method for communicating critical data in a client-server architecture using authentication module 116. In step 410 (not shown), an encryption key is generated by authentication module 116. Next, in step 420, the authentication module 116 sends the public encryption key to the client 102. The client 102 then performs initial data encryption using the received key. For example, the client may encrypt confidential data sent from the client 102, such as, but not limited to, an identifier, an IP address, an email address, a link to a social network profile, a timestamp, a phone number, and so forth. In step 200, the modification module 110 of the client divides the structure intended to be sent to the server 104 according to predetermined criteria. One such criterion may be the presence of critical data in the structure. Particular cases of critical data are confidential data (sensitive data) and personal data. As a result of this division, the original structure is split into a substructure containing critical data (e.g., substructure 1 in FIG. 16) and a substructure containing no such data (substructure 2 in FIG. 16). In optional step 201, the modification module 110 may further divide the substructure containing the critical data into at least two substructures containing different types of critical data (substructure 3 and substructure 4 in FIG. 16). In step 202, the modification module 110 encrypts substructure 3 and substructure 4 using the received public key to obtain substructure 3' and substructure 4'.
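
The key handling in steps 410 to 202 (key pair created by the authentication module, public key given to the client, private key withheld) can be illustrated with a toy example. The sketch below uses textbook RSA with tiny fixed primes purely to show who holds which key; it is not secure, it is not the scheme the source specifies, and the names `encrypt`, `decrypt`, `sub_3` and the sample value are assumptions.

```python
# Textbook RSA with tiny fixed primes: for illustration only, NOT secure.
P, Q = 61, 53
N = P * Q      # modulus (3233)
E = 17         # public exponent: sent to the client in step 420
D = 2753       # private exponent: escrowed by the authentication module

public_key = (E, N)
private_key = (D, N)


def encrypt(text, key):
    """Encrypt each UTF-8 byte separately with the public key (toy scheme)."""
    e, n = key
    return [pow(byte, e, n) for byte in text.encode("utf-8")]


def decrypt(blocks, key):
    """Invert `encrypt`; requires the private key."""
    d, n = key
    return bytes(pow(block, d, n) for block in blocks).decode("utf-8")


sub_3 = "198.51.100.23"                          # placeholder critical data
sub_3_prime = encrypt(sub_3, public_key)         # step 202, performed on the client
recovered = decrypt(sub_3_prime, private_key)    # possible only with the escrowed key
```

In this arrangement neither the anonymization node nor the server holds `private_key`, so both handle `sub_3_prime` without being able to read it. A production system would use a vetted cryptographic library with proper padding rather than anything resembling this toy code.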

In step 210, modification module 110 sends substructure 2 to server 104 via path B. In step 211, the substructures 3' and 4', obtained by splitting the substructure containing the critical data and applying the one-time transformation (encryption), are transmitted sequentially along an alternative path different from path B. In an aspect, the alternative path includes network node 106 with anonymization module 108 (path A in the example of FIG. 16). In an aspect, the network node 106 is located in a regional network different from the network hosting the server 104, and/or is not in the same intranet as the server 104 or the client 102. Next, in step 220, the substructures passing through the node 106 are transformed again by anonymization module 108 (in the example of FIG. 16, into substructure 3'' and substructure 4''). Anonymization module 108 then transmits the substructures in transformed form to server 104 (step 223).

In general, sub-structures from the same client 102 may be transformed differently at different points in time. For example, a sub-structure with a client identifier sent at a first time is transformed to include an anonymous identifier (anonymous ID1) that differs from the anonymous identifier (anonymous ID2) of a sub-structure sent at a second time, even though both sub-structures come from the same client 102 and carry the same client identifier (i.e., client ID' → anonymous ID1 ≠ anonymous ID2 ≠ anonymous ID3, etc.). This may apply to all examples. In certain cases, when information about a particular client needs to be aggregated (statistics built) for a particular security system, the transformation is identical for all sub-structures from the same client 102 (i.e., client ID' → anonymous ID1 = anonymous ID2 = anonymous ID3, and so on). Finally, in step 230, the sub-structures received from the client 102 are combined into one structure (structure') by the server 104.
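The two transformation modes can be illustrated with a minimal sketch. It assumes the node derives the anonymous identifier with a keyed hash (HMAC-SHA256). That technique is swapped in for illustration, since the description does not fix the transformation, and `NODE_SECRET` and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret held only by the network node with the anonymization module.
NODE_SECRET = secrets.token_bytes(32)


def rotating_anonymous_id(client_id: str) -> str:
    """Yield a different identifier on every call, even for the same client."""
    salt = secrets.token_bytes(16)  # fresh randomness per sub-structure
    return hmac.new(NODE_SECRET, salt + client_id.encode(), hashlib.sha256).hexdigest()


def stable_anonymous_id(client_id: str) -> str:
    """Yield the same identifier for the same client, so records can be aggregated."""
    return hmac.new(NODE_SECRET, client_id.encode(), hashlib.sha256).hexdigest()
```

With the rotating variant the server cannot link two sub-structures to one client. With the stable variant it can build per-client statistics without ever seeing the original identifier.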

The final data structure (structure') differs from the original data structure because at least two sub-structures have been transformed by anonymization module 108. The final structure in the database will also be used by the infrastructure module on the server side, e.g. to build a configuration file. The transformation of a sub-structure and/or of its data by anonymization module 108 may be performed by a method that excludes the possibility of inverse transformation by any module other than the network node 106 having the anonymization module 108. The transformation of a sub-structure and/or of its data by the anonymization module 108 using the encryption key from the authentication module 116 may likewise be performed by a method that excludes the possibility of inverse transformation by any module other than the network node 106 having the anonymization module 108 (or, in one aspect, the client 102). In another aspect, the inverse transformation cannot be performed by any means.

FIG. 16A illustrates another example of a method for transmitting critical data in a client-server architecture using an authentication module. In step 410, authentication module 116 generates a plurality of public and private key pairs. In step 411, the authentication module 116 selects a public and private key pair for the particular client 102 from the generated set of keys. In step 420, the authentication module 116 sends the public key to the client 102 and stores the private key. The client 102 generates a structure intended to be sent to the server 104. The generated structure may contain the client's IP address, a timestamp, and the MD5 hash of the detected malicious file. In step 200, the modification module 110 divides the structure intended to be sent to the server 104, obtaining as a result of the division a sub-structure containing the IP address and the timestamp, and a sub-structure containing the MD5 of the file. In step 201, the modification module 110 further divides the sub-structure containing the IP address and the timestamp into two sub-structures (in FIG. 16A, the sub-structure with the IP address and the sub-structure with the timestamp). So that the structure to which the sub-structure containing the IP address and the sub-structure containing the timestamp belong can be determined, they are assigned identifiers (represented in FIG. 16A as structure IDs).
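The assignment of structure IDs can be sketched as below. The sketch assumes sub-structures are dictionaries and that a random UUID serves as the structure ID. The description does not say how the identifier is generated, and `tag_sub_structures` is a hypothetical helper.

```python
import uuid
from typing import Any, Dict, List, Tuple


def tag_sub_structures(
    sub_structures: List[Dict[str, Any]],
) -> Tuple[str, List[Dict[str, Any]]]:
    """Attach one shared structure ID to every sub-structure of a structure."""
    structure_id = uuid.uuid4().hex  # assumption: a random ID unrelated to the client
    tagged = [dict(sub, structure_id=structure_id) for sub in sub_structures]
    return structure_id, tagged


# Illustrative values only.
sid, parts = tag_sub_structures([
    {"ip_address": "192.0.2.10"},
    {"timestamp": 1600905600},
    {"md5": "d41d8cd98f00b204e9800998ecf8427e"},
])
```

Because every part carries the same `structure_id`, the parts can be matched again even though they travel along different paths.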

In step 202, the modification module 110 encrypts the data of the sub-structure containing the IP address and the data of the sub-structure containing the timestamp using the public key obtained from the authentication module 116. In step 210, modification module 110 sends the sub-structure with the MD5 to server 104 via path B. In step 211, the modification module 110 sequentially sends the sub-structure with the IP address and the sub-structure with the timestamp through an alternative path different from path B, where the alternative path includes the network node 106 having the anonymization module 108 (path A in the example of FIG. 16A). In a particular instance, the node 106 is located in a different area network from the network in which the server 104 is located, and is not in the same intranet as the server 104 or the client 102. Then, in step 220, the sub-structure with the encrypted IP address and the sub-structure with the encrypted timestamp are transformed (e.g., anonymized) and forwarded to the server 104 in transformed form (step 223). The transformation is performed when each sub-structure is received. Finally, in step 230, the sub-structures received from client 102 are combined into a structure containing the transformed encrypted IP address, the transformed encrypted timestamp, and the MD5.
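The combining of step 230 can be sketched as follows, assuming each sub-structure arrives at the server as a dictionary carrying a `structure_id` field. The field names and the `combine` helper are hypothetical, and the opaque strings stand in for transformed ciphertext that the server cannot read.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def combine(sub_structures: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Step 230: merge sub-structures that share a structure ID into one record."""
    combined: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for sub in sub_structures:
        fields = {k: v for k, v in sub.items() if k != "structure_id"}
        combined[sub["structure_id"]].update(fields)
    return dict(combined)


# Arrival order does not matter; path A and path B deliver independently.
received = [
    {"structure_id": "s1", "md5": "d41d8cd98f00b204e9800998ecf8427e"},  # via path B
    {"structure_id": "s1", "ip_address": "<opaque-ciphertext-1>"},      # via path A
    {"structure_id": "s1", "timestamp": "<opaque-ciphertext-2>"},       # via path A
]
records = combine(received)
```

The server stores the combined record without being able to read the IP address or the timestamp, which remain encrypted.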

The authentication module, the modification module, the anonymization module, the combination module, the request processor, the attack detection module, and the storage module refer in the present invention to real devices, systems, components, or groups of components that are implemented in hardware, such as an integrated microcircuit (Application-Specific Integrated Circuit, ASIC) or a Field-Programmable Gate Array (FPGA), or in the form of a combination of software and hardware, such as a microprocessor system and a set of program instructions, or on a neuromorphic chip (neurosynaptic chip). The functions of the components may be implemented by hardware only, or in a combination in which some functions are implemented by software and some by hardware. In certain variant aspects, the modules may be executed on a processor of a computer (such as the computer shown in FIG. 17). The database may be implemented by any possible method and may be contained on a single physical medium or on different physical media, both local and remote.

FIG. 17 is a block diagram illustrating a computer system 20 upon which aspects of the systems and methods for anonymously collecting malware-related data from client devices may be implemented. It should be noted that computer system 20 may correspond to, for example, client 102, server 104, network node 106, and network node 1002 as previously described. The computer system 20 may be in the form of multiple computing devices, or a single computing device, such as: desktop computers, notebook computers, handheld computers, mobile computing devices, smart phones, tablet computers, servers, mainframes, embedded devices, and other forms of computing devices.

As shown, computer system 20 includes a Central Processing Unit (CPU) 21, a system memory 22, and a system bus 23 that couples various system components, including the memory associated with the CPU 21. The system bus 23 may include a bus memory or bus memory controller, a peripheral bus, and a local bus capable of interacting with any other bus architecture. Examples of the buses may include PCI, ISA, PCI-Express, HyperTransport™, InfiniBand™, Serial ATA, I²C, and other suitable interconnects. The central processing unit 21 (also referred to as a processor) may include a single set or multiple sets of processors having a single core or multiple cores. The processor 21 may execute one or more computer-executable codes that implement the techniques of the present invention. The system memory 22 may be any memory for storing data used herein and/or computer programs executable by the processor 21. The system memory 22 may include volatile memory (such as Random Access Memory (RAM) 25) and non-volatile memory (such as Read-Only Memory (ROM) 24, flash memory, etc.) or any combination thereof. A Basic Input/Output System (BIOS) 26 may store the basic procedures for transferring information between elements of the computer system 20, such as those used when the operating system is loaded using ROM 24.

The computer system 20 may include one or more storage devices, such as one or more removable storage devices 27, one or more non-removable storage devices 28, or a combination thereof. The one or more removable storage devices 27 and the one or more non-removable storage devices 28 are connected to the system bus 23 by a memory interface 32. In one aspect, the storage devices and corresponding computer-readable storage media are power-independent modules that store computer instructions, data structures, program modules, and other data for the computer system 20. A wide variety of computer-readable storage media may be used for system memory 22, removable storage devices 27, and non-removable storage devices 28. Examples of computer-readable storage media include: machine memory such as cache, Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), zero-capacitor RAM, twin-transistor RAM, enhanced Dynamic Random Access Memory (eDRAM), Extended Data Output Random Access Memory (EDO RAM), Double Data Rate Random Access Memory (DDR RAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Nanotube Random Access Memory (NRAM), Resistive Random Access Memory (RRAM), Silicon-Oxide-Nitride-Oxide-Silicon (SONOS) memory, and Phase-change Random Access Memory (PRAM); flash memory or other storage technologies, such as in Solid State Drives (SSDs) or flash drives; magnetic tape cartridges, magnetic tape, and magnetic disk storage, such as in a hard disk drive or floppy disk drive; optical storage, such as in a compact disc (CD-ROM) or Digital Versatile Disc (DVD); and any other medium which can be used to store the desired data and which can be accessed by computer system 20.

System memory 22, removable storage devices 27 and non-removable storage devices 28 of computer system 20 may be used to store an operating system 35, additional application programs 37, other program modules 38 and program data 39. The computer system 20 may include a peripheral interface 46 for communicating data from an input device 40, such as a keyboard, mouse, stylus, game controller, voice input device, touch input device, or other peripheral device, such as a printer or scanner via one or more I/O ports, such as a Serial port, parallel port, Universal Serial Bus (USB), or other peripheral interface. A display device 47, such as one or more monitors, projectors or integrated displays, may also be connected to the system bus 23 via an output interface 48, such as a video adapter. In addition to the display device 47, the computer system 20 may be equipped with other peripheral output devices (not shown), such as speakers and other audiovisual devices.

The computer system 20 may operate in a networked environment using network connections to one or more remote computers 49. The one or more remote computers 49 may be local computer workstations or servers including most or all of the elements described above in describing the nature of the computer system 20. Other devices may also be present in the computer network, such as, but not limited to, routers, network stations, peer devices, or other network nodes. The computer system 20 may include one or more network interfaces 51 or network adapters for communicating with the remote computers 49 via one or more networks, such as a Local-Area Network (LAN) 50, a Wide-Area Network (WAN), an intranet, and the Internet. Examples of the network interface 51 may include an Ethernet interface, a Frame Relay interface, a SONET interface, and a wireless interface.

Aspects of the present invention may be systems, methods and/or computer program products. The computer program product may include one or more computer-readable storage media having computer-readable program instructions thereon for causing a processor to perform various aspects of the present invention.

A computer-readable storage medium may be a tangible device that can hold and store program code in the form of instructions or data structures that can be accessed by a processor of a computing device, such as computer system 20. The computer-readable storage medium may be an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination thereof. By way of example, such computer-readable storage media may comprise Random Access Memory (RAM), Read-Only Memory (ROM), EEPROM, portable compact disc read-only memory (CD-ROM), Digital Versatile Discs (DVD), flash memory, a hard disk, a portable computer diskette, a memory stick, a floppy disk, or even a mechanically encoded device such as a punch card or raised structures in a groove having instructions recorded thereon. As used herein, a computer-readable storage medium should not be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or transmission medium, or an electrical signal transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a variety of computing devices, or to an external computer or external storage device via a network (e.g., the internet, a local area network, a wide area network, and/or a wireless network). The network may include copper transmission cables, optical transmission fibers, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. A network interface in each computing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium within the respective computing device.

The computer-readable program instructions for carrying out operations of the present invention may be assembly instructions, Instruction-Set-Architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language and a conventional procedural programming language. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a LAN or a WAN, or the connection may be made to an external computer (for example, through the Internet). In some embodiments, an electronic circuit, including, for example, a programmable logic circuit, a Field-Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions by utilizing state information of the computer-readable program instructions to personalize the electronic circuit, in order to perform aspects of the invention.

In various aspects, the systems and methods described in this disclosure may be addressed in terms of modules. The term "module" as used herein refers to a real-world device, component, or arrangement of components implemented using hardware, for example, through an Application-Specific Integrated Circuit (ASIC) or Field-Programmable Gate Array (FPGA), or as a combination of hardware and software, for example, a combination of a microprocessor system and a set of instructions implementing the functionality of the module, which when executed convert the microprocessor system into a special-purpose device. A module may also be implemented as a combination of the two, with certain functions facilitated by hardware alone and other functions facilitated by a combination of hardware and software. In some implementations, at least a portion of the modules (and in some cases all of the modules) may run on a processor of a computer system (e.g., the computer system described in more detail above in FIG. 17). Thus, each module may be implemented in various suitable configurations and should not be limited to any particular implementation illustrated herein.

In the interest of clarity, not all of the routine features of the various aspects are disclosed herein. It will of course be appreciated that in the development of any such actual implementation of the invention, numerous implementation-specific decisions must be made in order to achieve the developer's specific goals, and that these specific goals will vary from one implementation to another and from one developer to another. It will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking of engineering for those of ordinary skill in the art having the benefit of this disclosure.

Further, it is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance presented herein, in combination with the knowledge of one skilled in the relevant art(s). Furthermore, it is not intended that any term in this specification or claims be ascribed an uncommon or special meaning unless explicitly set forth as such.

Various aspects disclosed herein include present and future known equivalents to the known modules referred to herein by way of illustration. Further, while various aspects and applications have been shown and described, it will be apparent to those skilled in the art having the benefit of this disclosure that many more modifications than mentioned above are possible without departing from the inventive concepts disclosed herein.

Claims

1. A computer-implemented method for anonymously collecting malware-related data from a client device, the method comprising:

receiving, by a network node, a first data structure from a client device, wherein the first data structure contains an identifier of the client device and encrypted data, the encrypted data comprising an identifier of a user of the client device and/or personal data of the user, and the encrypted data being encrypted by the client device using a public key of the client device, wherein the public key is provided to the client device by an independent certificate authority;

transforming, by the network node, the received first data structure by replacing the identifier of the client device with an anonymous identifier and sending a transformed first data structure containing the anonymous identifier and the encrypted data to a server;

receiving, by the server, the transformed first data structure from the network node;

receiving, by the server, a second data structure from the client device, wherein the second data structure contains malware-related data obtained on the client device; and

combining, by the server, the transformed first data structure with the second data structure and storing the combined data structure on the server without the server having access to and/or viewing (i) an identifier of the client device and (ii) an identifier of the user of the client device and/or personal data of the user stored in the combined data structure.

2. The method of claim 1, wherein the anonymous identifier comprises an encrypted identifier of the client device.

3. The method of claim 1, wherein the client device is located in a first area network, the network node is located in a second area network different from the first area network, and the server is located in a third area network different from the first area network and the second area network.

4. The method of claim 3, wherein the first area network and the third area network are located within different legal jurisdictions.

5. The method of claim 1, wherein the data related to malware comprises a hash of a malicious file.

6. The method of claim 1, wherein the network node is not located in the same intranet as the server and the client device.

7. A system for anonymously collecting malware-related data from a client device, the system comprising:

a network node having a hardware processor configured to:

receive a first data structure from a client device, wherein the first data structure contains an identifier of the client device and encrypted data, the encrypted data comprising an identifier of a user of the client device and/or personal data of the user, and the encrypted data is encrypted by the client device using a public key of the client device, wherein the public key is provided to the client device by an independent certificate authority;

transform the received first data structure by replacing the identifier of the client device with an anonymous identifier and send the transformed first data structure containing the anonymous identifier and the encrypted data to a server; and

a server having a hardware processor configured to:

receive the transformed first data structure from the network node;

receive a second data structure from the client device, wherein the second data structure contains data relating to malware obtained on the client device; and

combine the transformed first data structure with the second data structure and store the combined data structure on the server without the server having access to and/or viewing (i) an identifier of the client device and (ii) an identifier of the user of the client device and/or personal data of the user stored in the combined data structure.

8. The system of claim 7, wherein the anonymous identifier comprises an encrypted identifier of the client device.

9. The system of claim 7, wherein the client device is located in a first area network, the network node is located in a second area network different from the first area network, and the server is located in a third area network different from the first area network and the second area network.

10. The system of claim 9, wherein the first area network and the third area network are located within different legal jurisdictions.

11. The system of claim 7, wherein the data related to malware comprises a hash of a malicious file.

12. The system of claim 7, wherein the network node is not located in the same intranet as the server and the client device.

13. A non-transitory computer-readable medium comprising computer-executable instructions for anonymously collecting data related to malware from a client device, the computer-executable instructions comprising instructions for:

14. The non-transitory computer-readable medium of claim 13, wherein the anonymous identifier comprises an encrypted identifier of the client device.

15. The non-transitory computer-readable medium of claim 13, wherein the client device is located in a first area network, the network node is located in a second area network different from the first area network, and the server is located in a third area network different from the first area network and the second area network.

16. The non-transitory computer-readable medium of claim 15, wherein the first area network and the third area network are located within different legal jurisdictions.

17. The non-transitory computer-readable medium of claim 13, wherein the data related to malware comprises a hash of a malicious file.

18. The non-transitory computer-readable medium of claim 13, wherein the network node is not located in the same intranet as the server and the client device.