CN110782985A - Feature processing method and related equipment - Google Patents

Feature processing method and related equipment

Info

Publication number
CN110782985A
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911025094.7A
Other languages
Chinese (zh)
Other versions
CN110782985B (en)
Inventor
钱宇秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201911025094.7A
Publication of CN110782985A
Application granted
Publication of CN110782985B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2133 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on naturality criteria, e.g. with non-negative factorisation or negative correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs

Abstract

The embodiments of the present application disclose a feature processing method and related equipment. After obtaining the user identifiers corresponding to the original features sent by different nodes, the data processing device determines a same user identifier set from these identifiers and sends it to the nodes. If the device is in a first feature processing mode and receives the first sub-features obtained by each node after the i-th round of feature processing on its feature to be processed, it aligns the first sub-features to generate a first synchronization sub-feature for the (i + 1)-th round of feature processing and sends that sub-feature to the nodes. The feature to be processed of each node is the part of its original features that corresponds to all users in the same user identifier set. At no point does the method obtain any node's original features, features to be processed, or complete set of processed sub-features, so data security between different nodes is ensured.

Description

Feature processing method and related equipment
Technical Field
The present application relates to the field of data processing, and in particular, to a feature processing method and related equipment.
Background
In some feature processing scenarios, the features held by different nodes must each be decomposed into two or more sub-features, with the constraint that one of the resulting sub-features is identical across all nodes. For example, several organizations (nodes) hold data (features) about common users or common products and want to derive the user portraits or product portraits behind that data by non-negative matrix factorization for recommendation. Alternatively, several hospitals (nodes) hold diagnosis data of different patients with the same disease and want to study these data (features) jointly, again decomposing them by non-negative matrix factorization.
For security, no node wants its own features to be inferred by others, and no node can rule out that all the other nodes combine their information against it.
However, the decomposition typically requires knowledge of every node's features, so those features leak between nodes and data security between different nodes cannot be guaranteed.
Disclosure of Invention
To solve this technical problem, the present application provides a feature processing method and related equipment that ensure the security of data between different nodes.
The embodiment of the application discloses the following technical scheme:
in a first aspect, an embodiment of the present application provides a feature processing method, which is applied to a data processing device, and the method includes:
acquiring user identifiers corresponding to original features sent by different nodes;
determining a same user identifier set according to the user identifiers, and sending the same user identifier set to the different nodes;
if the data processing device is in a first feature processing mode, receiving the first sub-feature obtained by each node after the i-th round of feature processing on its feature to be processed, aligning the first sub-features to generate a first synchronization sub-feature for the (i + 1)-th round of feature processing, and sending the first synchronization sub-feature to the different nodes, wherein the feature to be processed is the feature that each node determines from its original features and that corresponds to all users in the same user identifier set.
In a second aspect, an embodiment of the present application provides a feature processing method, which is applied to a node, and the method includes:
receiving the same user identifier set sent by the data processing device, and determining the feature to be processed from the original features according to the same user identifier set, the feature to be processed corresponding to the users in the same user identifier set;
sending the first sub-feature obtained after the i-th round of feature processing to the data processing device;
receiving the first synchronization sub-feature for the (i + 1)-th round of feature processing sent by the data processing device, and performing the (i + 1)-th round of feature processing;
in the (i + 1)-th round of feature processing, determining a second sub-feature according to the feature to be processed and the first synchronization sub-feature, and determining a first sub-feature according to the feature to be processed and the second sub-feature.
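If the decomposition is read as non-negative matrix factorization with a squared Frobenius loss (an assumption; the aspect itself names no loss or solver), round i + 1 on a node amounts to two alternating non-negative least-squares problems. Here M is the node's feature to be processed, \bar{U}^{(i+1)} is the first synchronization sub-feature, V^{(i+1)} is the second sub-feature and U^{(i+1)} is the new first sub-feature, laid out so that M ≈ U^T V. The ≈ before each arg min signals that in practice each problem is typically solved only approximately, for example with a few multiplicative-update steps.

```latex
\begin{aligned}
V^{(i+1)} &\approx \operatorname*{arg\,min}_{V \ge 0}
  \bigl\lVert M - (\bar{U}^{(i+1)})^{\top} V \bigr\rVert_F^2 \\
U^{(i+1)} &\approx \operatorname*{arg\,min}_{U \ge 0}
  \bigl\lVert M - U^{\top} V^{(i+1)} \bigr\rVert_F^2
\end{aligned}
```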
In a third aspect, an embodiment of the present application provides a feature processing apparatus, where the apparatus includes:
the acquiring unit is configured to acquire user identifiers corresponding to original features sent by different nodes;
the first determining unit is configured to determine a same user identifier set according to the user identifiers and send the same user identifier set to the different nodes;
the first generating unit is configured to, if the apparatus is in a first feature processing mode and has received the first sub-feature obtained by each node after the i-th round of feature processing on its feature to be processed, align the first sub-features and generate a first synchronization sub-feature for the (i + 1)-th round of feature processing;
the first sending unit is configured to send the first synchronization sub-feature to the different nodes, the feature to be processed being the feature that each node determines from its original features and that corresponds to all users in the same user identifier set.
In a fourth aspect, an embodiment of the present application provides a feature processing apparatus, including:
the receiving unit is configured to receive the same user identifier set sent by the data processing device and determine the feature to be processed from the original features according to the same user identifier set, the feature to be processed corresponding to the users in the same user identifier set;
the second sending unit is configured to send the first sub-feature obtained after the i-th round of feature processing to the data processing device;
the receiving unit is further configured to receive the first synchronization sub-feature for the (i + 1)-th round of feature processing sent by the data processing device, and to perform the (i + 1)-th round of feature processing;
the second determining unit is configured to, in the (i + 1)-th round of feature processing, determine a second sub-feature according to the feature to be processed and the first synchronization sub-feature, and determine a first sub-feature according to the feature to be processed and the second sub-feature.
In a fifth aspect, an embodiment of the present application provides an apparatus for feature processing, where the apparatus includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform, according to instructions in the program code, the feature processing method according to the first aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium for storing program codes, where the program codes are used to execute the feature processing method according to the first aspect.
According to the above technical solution, the method is applied to a data processing device. After obtaining the user identifiers corresponding to the original features sent by different nodes, the device determines a same user identifier set from these identifiers alone and sends it to the nodes, which then determine their features to be processed from it. Next, if the device is in the first feature processing mode and receives the first sub-feature obtained by each node after the i-th round of feature processing on its feature to be processed, it aligns the first sub-features, generates a first synchronization sub-feature for the (i + 1)-th round of feature processing, and sends it to the nodes. The feature to be processed is the feature that each node determines from its original features and that corresponds to all users in the same user identifier set. In other words, the device receives only one of the sub-features each node obtains from its own decomposition, and only for synchronization. Because the device never obtains any node's original features, features to be processed, or complete set of processed sub-features, data security between different nodes is ensured.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for that description are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic view of an application scenario of a feature processing method according to an embodiment of the present application;
Fig. 2 is a signaling interaction diagram of a feature processing method according to an embodiment of the present application;
Fig. 3 is a signaling interaction diagram of another data processing method according to an embodiment of the present application;
Fig. 4 is a signaling interaction diagram of another data processing method according to an embodiment of the present application;
Fig. 5 is a diagram of the overall architecture of a synchronous technique for non-negative matrix factorization across different organizations according to an embodiment of the present application;
Fig. 6 is a diagram of the overall architecture of an asynchronous technique for non-negative matrix factorization across different organizations according to an embodiment of the present application;
Fig. 7 is a flowchart of a feature processing method according to an embodiment of the present application;
Fig. 8 is a graph of relative error versus feature processing rounds for the synchronous technique according to an embodiment of the present application;
Fig. 9 is a graph of relative error versus feature processing rounds for the asynchronous technique according to an embodiment of the present application;
Fig. 10 is a structural diagram of a feature processing apparatus according to an embodiment of the present application;
Fig. 11 is a structural diagram of another feature processing apparatus according to an embodiment of the present application;
Fig. 12 is a diagram of a device for feature processing according to an embodiment of the present application;
Fig. 13 is a structural diagram of a server according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described below with reference to the accompanying drawings.
Currently, when the features of different nodes are processed (for example, when the data of multiple organizations is decomposed by non-negative matrix factorization), the features of these nodes usually have to be known to carry out the decomposition, so data security between different nodes cannot be guaranteed.
Therefore, the embodiments of the present application provide a feature processing method with which the features of different nodes can be processed without any node's complete features before processing or complete sub-features after processing becoming known, ensuring data security between different nodes.
First, an application scenario of the embodiments is described. The feature processing method provided by the present application may be applied to a data processing device, such as a terminal device or a server. The terminal device may be, for example, a smart terminal, a computer, a personal digital assistant (PDA), or a tablet computer. The server may be an independent server or a server in a cluster.
In order to facilitate understanding of the technical solution of the present application, the following describes a feature processing method provided in the embodiments of the present application with reference to an actual application scenario.
Refer to Fig. 1, which is a schematic view of an application scenario of a feature processing method according to an embodiment of the present application. As shown in Fig. 1, the scenario includes a feature processing system in which server 101 serves as the data processing device and executes the feature processing method provided in the embodiments of the present application. The system also includes server 102 and server 103, which are node devices. The feature processing system may be a distributed system whose members are connected through network communication, for example a blockchain, in which case server 101 (the data processing device) and servers 102 and 103 (node devices) may each be a node in the blockchain. A node in the blockchain may be any form of computing device in the access network, such as a server or a user terminal, and may be a cloud server or a cloud terminal.
A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. In essence it is a decentralized database: a chain of data blocks linked by cryptographic methods, in which each block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
In this embodiment, the nodes, such as server 102 and server 103, may hold original features. A node's original features may span at least two dimensions and may be decomposed into at least two sub-features, each of which characterizes one of those dimensions. For example, a node's original feature may be the raw data of an organization over the two dimensions users and products, represented as a matrix M in which each row corresponds to a user, each column to a product, and each element to the number of times that user purchased that product. The matrix M may be decomposed into two sub-matrices (sub-features) U and V: U is the sub-feature of the user dimension (user portrait), each row of which is a low-dimensional representation of one user, and V is the sub-feature of the product dimension (product portrait), each row of which is a low-dimensional representation of one product.
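To make the row semantics concrete, the sketch below rebuilds a matrix from given portraits as M ≈ U V^T: entry (i, j) is the dot product of user i's row of U with product j's row of V, that is, the model's estimate of how often user i bought product j. The portrait values and the function name `reconstruct` are hypothetical; here U and V happen to reproduce M exactly, whereas a real decomposition only approximates it.

```python
def reconstruct(u, v):
    """Return U V^T for matrices stored as lists of rows.

    Entry (i, j) is the dot product of user i's row in U with product j's
    row in V, i.e. the estimated purchase count of user i for product j."""
    return [[sum(a * b for a, b in zip(user_vec, prod_vec)) for prod_vec in v]
            for user_vec in u]


# Hypothetical 2-dimensional portraits for 3 users and 4 products.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0], [1.0, 2.0]]
M = reconstruct(U, V)
# M == [[2.0, 1.0, 0.0, 1.0], [0.0, 1.0, 3.0, 2.0], [2.0, 2.0, 3.0, 3.0]]
```

Note that the third user's row of M is the sum of the first two, because that user's portrait [1, 1] is the sum of theirs.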
Server 102 and server 103 may each determine the user identifiers corresponding to their original features and send them to server 101; a user identifier identifies the user to which it corresponds.
From the user identifiers sent by the different nodes, server 101 may determine the user identifiers that every node has, that is, the intersection of the nodes' identifiers, as the same user identifier set, and send this set to server 102, server 103, and the other nodes. Server 102 and server 103 may then each take, from their original features, the features corresponding to all users in the same user identifier set as their features to be processed.
Server 101 supports a first feature processing mode, which corresponds to all nodes processing their features to be processed synchronously: in this mode, the sub-features that all nodes send after a round of feature processing are aligned only once all of them have been received.
If server 101 is in the first feature processing mode and receives the first sub-features obtained by server 102, server 103, and the other nodes after the i-th round of feature processing on their features to be processed, it may align these first sub-features to generate a first synchronization sub-feature and send it to the nodes, which use it as the synchronization sub-feature in the (i + 1)-th round of feature processing.
The feature processing method is illustrated below with an example. Assume the data processing system comprises data processing device 1, node 1, and node 2. The original feature X of node 1 corresponds to user A and user B (user identifiers A and B), and the original feature Y of node 2 corresponds to users A, B, and C (user identifiers A, B, and C). Node 1 and node 2 send the user identifiers corresponding to their original features to data processing device 1, which determines the same user identifier set from them and sends it to node 1 and node 2. In this example, the same user identifier set is {A, B}.
Each node then uses the same user identifier set to take the features corresponding to users A and B from its original features as its feature to be processed. For node 1, the feature to be processed X' corresponds to users A and B and is identical to the original feature X. For node 2, the feature to be processed Y' also corresponds to users A and B but, compared with the original feature Y, no longer includes the part corresponding to user C.
When data processing device 1 is in the first feature processing mode and has received the first sub-features x1 and y1 that node 1 and node 2 respectively obtained after the i-th round of feature processing on their features to be processed, it may align x1 and y1 to generate a first synchronization sub-feature z and send z to node 1 and node 2. Each node uses z as its synchronization sub-feature in the (i + 1)-th round of feature processing.
In this method, the same user identifier set is determined only from the user identifiers and sent to the nodes, which use it to determine their features to be processed, and only one of the sub-features each node obtains from its own decomposition is received for synchronization. The original features, the features to be processed, and the complete set of processed sub-features of each node are never obtained, so data security between different nodes is ensured.
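The node 1 / node 2 example can be sketched in a few lines. The feature values and function names are hypothetical, and so is the choice to return the same user identifier set in sorted order: the source does not say how rows are ordered, but the nodes must agree on one order so that row r of every feature to be processed refers to the same user.

```python
# Hypothetical toy data: each node maps a user identifier to that user's feature row.
node1_features = {"A": [3.0, 0.0, 1.0], "B": [0.0, 2.0, 4.0]}                 # original feature X
node2_features = {"A": [1.0, 1.0], "B": [0.0, 5.0], "C": [2.0, 2.0]}          # original feature Y


def same_identifier_set(*identifier_lists):
    """Data processing device: keep only identifiers reported by every node."""
    common = set(identifier_lists[0])
    for ids in identifier_lists[1:]:
        common &= set(ids)
    return sorted(common)  # one fixed order so all nodes align their rows identically


def features_to_process(original, common_ids):
    """Node: keep only the rows of users in the same user identifier set."""
    return [original[uid] for uid in common_ids]


common = same_identifier_set(node1_features.keys(), node2_features.keys())
x_prime = features_to_process(node1_features, common)  # X' equals X here
y_prime = features_to_process(node2_features, common)  # Y' drops user C
```

Only the identifier lists reach `same_identifier_set`; the feature rows stay inside each node's own call to `features_to_process`.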
Referring to fig. 2, this figure shows a signaling interaction diagram of a feature processing method provided in an embodiment of the present application, where the method includes:
S201: Node 1 sends the user identifiers corresponding to its original features to the data processing device.
S202: Node 2 sends the user identifiers corresponding to its original features to the data processing device.
In a specific implementation, all nodes may apply the same mapping to the identity card number (ID) of each user covered by their original features to obtain the corresponding user identifiers. The user identifiers may therefore be encrypted data, and because all nodes use the same mapping, identical user identifiers indicate identical users.
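The source requires only that all nodes apply the same mapping. The sketch below assumes a keyed hash, HMAC-SHA256 from Python's standard library, with a key shared among the nodes. A keyed mapping is chosen because ID numbers come from a small, structured space, so an unkeyed hash could be reversed by enumerating candidate IDs. The function `map_user_id`, the normalization rule, and the key are all hypothetical.

```python
import hashlib
import hmac


def map_user_id(raw_id: str, shared_key: bytes) -> str:
    """Map a raw ID number to an opaque user identifier (64-char hex string).

    Every node must use the same normalization, key and algorithm so that
    equal raw IDs yield equal identifiers on every node."""
    normalized = raw_id.strip().upper()
    return hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key agreed among the nodes but not given to the data processing device.
SHARED_KEY = b"hypothetical-key-known-only-to-nodes"

node1_ids = [map_user_id(i, SHARED_KEY) for i in ["ID-0001", "ID-0002"]]
node2_ids = [map_user_id(i, SHARED_KEY) for i in [" id-0002", "ID-0003"]]
overlap = set(node1_ids) & set(node2_ids)  # one shared user: ID-0002
```

The data processing device can intersect `node1_ids` and `node2_ids` without ever seeing a raw ID number.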
It should be noted that this embodiment does not limit the order of S201 and S202: S201 may be performed before S202, S202 before S201, or both simultaneously.
S203: The data processing device acquires the user identifiers corresponding to the original features sent by the different nodes.
S204: The data processing device determines the same user identifier set according to the user identifiers.
S205: The data processing device sends the same user identifier set to node 1.
S206: The data processing device sends the same user identifier set to node 2.
It should be noted that this embodiment does not limit the order of S205 and S206: S205 may be performed before S206, S206 before S205, or both simultaneously.
S207: Node 1 receives the same user identifier set sent by the data processing device and determines its feature to be processed from its original features according to the set.
S208: Node 2 receives the same user identifier set sent by the data processing device and determines its feature to be processed from its original features according to the set.
Because a node's original features may also cover users outside the same user identifier set, each node determines its feature to be processed from its original features according to the set. In this way, the features to be processed of all nodes correspond to the same users.
It should be noted that this embodiment does not limit the order of S207 and S208: S207 may be performed before S208, S208 before S207, or both simultaneously.
S209: Node 1 determines a second sub-feature according to its feature to be processed and the first synchronization sub-feature, and determines a first sub-feature according to its feature to be processed and the second sub-feature.
S2010: Node 2 determines a second sub-feature according to its feature to be processed and the first synchronization sub-feature, and determines a first sub-feature according to its feature to be processed and the second sub-feature.
In this embodiment, each node processes its own feature to be processed locally, which keeps that feature secure.
In the feature processing system, a node may process its feature to be processed over multiple rounds. In the i-th round, the node determines the second sub-feature from the feature to be processed and the first synchronization sub-feature, and then determines the first sub-feature from the feature to be processed and the second sub-feature.
It should be noted that, in the first round of feature processing, each node may initialize the first sub-feature and the second sub-feature according to its feature to be processed and use the initialized sub-features as the first and second sub-features obtained in the first round.
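The source only says the sub-features are initialized "according to the feature to be processed". One common heuristic, the one scikit-learn's NMF uses for `init='random'`, draws absolute Gaussian values scaled by sqrt(mean(M) / k). The sketch below assumes that heuristic and the row layout of the matrix M example (U: users × k, V: products × k); `init_sub_features` and the rank `k` are hypothetical names.

```python
import math
import random


def init_sub_features(m, k, seed=None):
    """Initialize the first sub-feature U (users x k) and the second
    sub-feature V (products x k) from the feature to be processed M.

    Entries are |N(0, 1)| * sqrt(mean(M) / k), so the entries of U V^T start
    on roughly the same scale as the entries of M, and all are non-negative."""
    rng = random.Random(seed)
    n_users, n_products = len(m), len(m[0])
    mean = sum(sum(row) for row in m) / (n_users * n_products)
    scale = math.sqrt(mean / k)
    u = [[scale * abs(rng.gauss(0.0, 1.0)) for _ in range(k)] for _ in range(n_users)]
    v = [[scale * abs(rng.gauss(0.0, 1.0)) for _ in range(k)] for _ in range(n_products)]
    return u, v
```

Passing a per-node `seed` makes a node's first round reproducible without sharing any data.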
It should be noted that this embodiment does not limit the order of S209 and S2010: S209 may be performed before S2010, S2010 before S209, or both simultaneously.
S2011: Node 1 sends the first sub-feature obtained after the i-th round of feature processing to the data processing device.
S2012: Node 2 sends the first sub-feature obtained after the i-th round of feature processing to the data processing device.
It should be noted that this embodiment does not limit the order of S2011 and S2012: S2011 may be performed before S2012, S2012 before S2011, or both simultaneously.
S2013: The data processing device receives the first sub-features obtained by each node after the i-th round of feature processing on its feature to be processed, aligns them, and generates the first synchronization sub-feature for the (i + 1)-th round of feature processing.
In a specific implementation, the alignment may be done by averaging: the data processing device computes the average of the first sub-features sent by the nodes and uses this average sub-feature as the first synchronization sub-feature.
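A minimal sketch of the averaging alignment, assuming every first sub-feature is a matrix of the same shape whose rows refer to the same users in the same order (which the same user identifier set is meant to make possible). `align_by_average` is a hypothetical name.

```python
def align_by_average(first_sub_features):
    """Element-wise mean of the first sub-features reported by the nodes.

    All matrices must have the same shape (same users, same order, same k)."""
    count = len(first_sub_features)
    rows, cols = len(first_sub_features[0]), len(first_sub_features[0][0])
    for sub in first_sub_features:
        if len(sub) != rows or any(len(r) != cols for r in sub):
            raise ValueError("all first sub-features must share one shape")
    return [[sum(sub[i][j] for sub in first_sub_features) / count for j in range(cols)]
            for i in range(rows)]
```

For example, averaging `[[1, 2], [3, 4]]` and `[[3, 2], [1, 0]]` gives `[[2.0, 2.0], [2.0, 2.0]]`; the result is what would be sent back to every node as the first synchronization sub-feature.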
S2014: The data processing device sends the first synchronization sub-feature to node 1.
S2015: The data processing device sends the first synchronization sub-feature to node 2.
It should be noted that this embodiment does not limit the order of S2014 and S2015: S2014 may be performed before S2015, S2015 before S2014, or both simultaneously.
S2016: Node 1 receives the first synchronization sub-feature for the (i + 1)-th round of feature processing sent by the data processing device and performs the (i + 1)-th round of feature processing according to it.
S2017: Node 2 receives the first synchronization sub-feature for the (i + 1)-th round of feature processing sent by the data processing device and performs the (i + 1)-th round of feature processing according to it.
It should be noted that this embodiment does not limit the order of S2016 and S2017: S2016 may be performed before S2017, S2017 before S2016, or both simultaneously.
In this embodiment, the data processing device may stop executing the method once the first sub-features of a round have been synchronized, that is, once the first synchronization sub-feature of the current round has been obtained from the first sub-features produced by the nodes in the previous round (corresponding to S2013). Thus, in S2014 and S2015, each node may use the first synchronization sub-feature just obtained as the first sub-feature resulting from processing its feature to be processed, and the second sub-feature obtained in the previous round as the resulting second sub-feature. The features to be processed of the different nodes are thereby decomposed into a first sub-feature that is the same for every node and a second sub-feature that may differ from node to node.
In a specific implementation, the feature to be processed of each node may be the matrix data Mi of an organization along two dimensions, users and products. Performing non-negative matrix factorization on Mi yields a user portrait matrix Ui and a product portrait matrix Vi, according to $M_i \approx U_i^{T} V_i$.
The user portrait matrix can be used as a first sub-feature, and the product portrait matrix can be used as a second sub-feature; thus, by executing the feature processing method provided by the embodiment of the present application, the user portrait matrices can be synchronized with respect to the matrix data Mi of different organizations (i.e., the matrix data Mi of different organizations are decomposed into the same user portrait matrix). Alternatively, the product portrait matrix may be used as the first sub-feature, and the user portrait matrix may be used as the second sub-feature, and the feature processing method provided in the embodiment of the present application may be executed to synchronize the product portrait matrices with respect to the matrix data Mi of different organizations (i.e., decompose the matrix data Mi of different organizations into the same product portrait matrix).
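The factorization $M_i \approx U_i^{T} V_i$ can be illustrated with a minimal single-node sketch. The text does not name a solver; this sketch assumes the standard Lee-Seung multiplicative updates for the Frobenius loss, with the rows of Mi indexing users and the columns indexing products. The function names `nmf` and `relative_error` are hypothetical.

```python
import numpy as np


def nmf(M, k, iters=2000, seed=0, eps=1e-10):
    """Factor a non-negative M (users x products) as M ~= U.T @ V.

    U (k x users) plays the role of the user portrait matrix Ui and
    V (k x products) the role of the product portrait matrix Vi.
    Assumed solver: Lee-Seung multiplicative updates (Frobenius loss).
    """
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((k, m))
    V = rng.random((k, n))
    for _ in range(iters):
        # Update V with U fixed, then U with V fixed; both stay non-negative.
        V *= (U @ M) / (U @ U.T @ V + eps)
        U *= (V @ M.T) / (V @ V.T @ U + eps)
    return U, V


def relative_error(M, U, V):
    """Frobenius norm of the residual, normalized by that of M."""
    return np.linalg.norm(M - U.T @ V) / np.linalg.norm(M)
```

Whichever factor is chosen as the first sub-feature, the decomposition itself is the same; only which of U and V is later synchronized changes.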
It can be seen from the above technical solution that the method is applied to the data processing device. After obtaining the user identifiers corresponding to the original features sent by different nodes, the data processing device determines the same user identifier set from these identifiers and sends it to the nodes. That is, the same user identifier set is determined from the user identifiers alone, and each node uses it to determine its feature to be processed, namely the part of its original feature that corresponds to all users in the same user identifier set. Next, if the data processing device is in the first feature processing mode and receives from each node the first sub-feature obtained after the i-th round of feature processing of the feature to be processed, it aligns these first sub-features, generates the first synchronization sub-feature for the (i+1)-th round of feature processing, and sends it to the nodes. In other words, only one of the sub-features into which each node decomposes its own data is received for synchronization. Throughout the process, the data processing device never obtains any node's original feature, feature to be processed, or full set of processed sub-features, which guarantees data security between the different nodes.
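The identifier handling described above can be sketched as follows, assuming that the same user identifier set is the intersection of the identifiers reported by the nodes and that each node stores one row of its original feature matrix per user. Both function names are hypothetical.

```python
import numpy as np


def common_user_ids(id_lists):
    """Data processing device side: intersect the user identifiers of all nodes."""
    common = set(id_lists[0])
    for ids in id_lists[1:]:
        common &= set(ids)
    # A fixed (sorted) order so every node lays out the same users identically.
    return sorted(common)


def features_to_process(original, user_ids, common_ids):
    """Node side: keep only the rows of the original feature for users in the common set."""
    row_of = {uid: r for r, uid in enumerate(user_ids)}
    return original[[row_of[u] for u in common_ids]]
```

Sorting the common set is a design choice of this sketch: it ensures that row j of every node's feature to be processed refers to the same user, which the later alignment of first sub-features relies on.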
It is understood that it may be difficult for different nodes to perform each round of feature processing simultaneously, in which case feature processing cannot be carried out with S201 to S2017 described above. To this end, in one possible implementation, the embodiment of the present application provides a second feature processing mode for the data processing device. In the second feature processing mode, different nodes need not perform each round of feature processing at the same time: whenever a node completes a round and sends the first sub-feature obtained in the i-th round of feature processing to the data processing device, the data processing device synchronizes that first sub-feature for that node alone.
Thus, the method further comprises:
S2018: If the data processing device is in the second feature processing mode and receives the first sub-feature obtained by a node after the i-th round of feature processing of the feature to be processed, it generates the first synchronization sub-feature for the (i+1)-th round of feature processing according to the first reference sub-feature at the i-th round and the received first sub-feature.
Wherein the first reference sub-feature may be a reference sub-feature stored in the data processing device.
S2019: the data processing device sends the first synchronization sub-feature to the node.
In this way, different nodes are not required to perform each round of feature processing at the same time.
In one possible implementation, the method of S2018 may include:
S301: Set a reference weight corresponding to the first reference sub-feature and a first weight corresponding to the first sub-feature at the i-th round of feature processing.
In the embodiment of the present application, the data processing device may set a reference weight for the first reference sub-feature at the i-th round of feature processing and a first weight for the first sub-feature received from a node, such that the reference weight and the first weight sum to 1. In addition, when setting the reference weight, the data processing device should ensure that the reference weight set at round i+1 is higher than the reference weight set at round i.
S302: Generate the first synchronization sub-feature according to the first reference sub-feature and its reference weight, and the first sub-feature and its first weight.
The data processing device may then generate the first synchronization sub-feature from the first reference sub-feature with its reference weight and the first sub-feature with its first weight. For example, it may compute the product of the first reference sub-feature and the reference weight and the product of the first sub-feature and the first weight, and take the sum of the two products as the first synchronization sub-feature.
S303: Update the first reference sub-feature according to the first synchronization sub-feature, and use the updated first reference sub-feature as the first reference sub-feature of the (i+1)-th round.
After obtaining the first synchronization sub-feature for a node in S302, the data processing device may update the first reference sub-feature according to it and use the updated value as the first reference sub-feature of the (i+1)-th round; for example, the first synchronization sub-feature itself may be taken as the first reference sub-feature of the (i+1)-th round.
By increasing the reference weight of the first reference sub-feature round by round, the first synchronization sub-features obtained by different nodes move progressively closer to one another, which improves the accuracy with which the first sub-features of different nodes are synchronized.
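Steps S301 to S303 can be sketched as below. The reference-weight schedule w(i) = i/(i+1) is a hypothetical choice that merely satisfies the stated requirements (it increases with the round, and the first weight is 1 - w(i) so the two sum to 1). The class name `AsyncAligner` and the rule that the first sub-feature ever received seeds the reference are also assumptions.

```python
import numpy as np


class AsyncAligner:
    """Holds the first reference sub-feature on the data processing device."""

    def __init__(self):
        self.reference = None

    @staticmethod
    def reference_weight(i):
        # Hypothetical schedule, strictly increasing in the round i (i >= 1).
        return i / (i + 1)

    def synchronize(self, U_node, i):
        """Blend one node's first sub-feature from round i into the reference."""
        if self.reference is None:
            # Assumption: the first sub-feature received initializes the reference.
            self.reference = np.array(U_node, dtype=float)
            return self.reference.copy()
        w_ref = self.reference_weight(i)   # S301: reference weight
        w_node = 1.0 - w_ref               # S301: first weight, so the two sum to 1
        U_sync = w_ref * self.reference + w_node * U_node  # S302
        self.reference = U_sync            # S303: becomes the round-(i+1) reference
        return U_sync.copy()
```

Because the reference weight grows, each later submission moves the reference less, which is what pulls the synchronization sub-features returned to different nodes toward one another.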
In the embodiment of the present application, suppose the features to be processed obtained in S207 and S208 are matrices, the first sub-features in S209 and S2010 are sub-matrices obtained by performing non-negative matrix factorization on the features to be processed, and the data processing device is in the first feature processing mode. To improve the efficiency with which the nodes process the features to be processed in S209 and S2010, in one possible implementation, referring to fig. 3, which shows a signaling interaction diagram of another data processing method provided in the embodiment of the present application, the method may further include:
S401: The data processing device sends the first parameter and/or the second parameter for the (i+1)-th round of feature processing to node 1.
S402: The data processing device sends the first parameter and/or the second parameter for the (i+1)-th round of feature processing to node 2.
The first parameter is used to generate a first random matrix, and the second parameter is used to generate a second random matrix.
It should be noted that the embodiment of the present application does not limit the order of S401 and S402: S401 may be executed before S402, S402 may be executed before S401, or the two may be executed simultaneously.
In one possible implementation, if node 1 and node 2 receive the first parameter, the method further includes:
S403: Node 1 receives the first parameter and generates the first random matrix according to the first parameter.
S404: Node 2 receives the first parameter and generates the first random matrix according to the first parameter.
For convenience of description, the first random matrix is denoted by matrix S1 and the second random matrix by matrix S2.
In the embodiment of the present application, the number of rows of the first and second random matrices is smaller than the number of rows of the feature to be processed, or their number of columns is smaller than the number of columns of the feature to be processed. In addition, the first and second random matrices should be chosen so that the expected value of the product of each random matrix and its transpose is a diagonal matrix whose main-diagonal elements are all 1, that is:
$$\mathbb{E}\left[S_1 S_1^{T}\right] = I, \qquad \mathbb{E}\left[S_2 S_2^{T}\right] = I$$
where I denotes the identity matrix.
It should be noted that the data processing device sends the same first parameter, and the same second parameter, to every node.
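A minimal sketch of how a node might rebuild a random matrix from a received parameter, assuming the parameter is a (seed, shape) pair and the matrix has i.i.d. Gaussian entries; the function name and the Gaussian distribution are assumptions, not taken from the text. Scaling the entries of an r-by-c matrix to variance 1/c makes $\mathbb{E}[S S^{T}] = I_r$ hold, and because every node receives the same seed, every node generates an identical matrix. For S1 (applied on the left of Mi) the shape would be (d1, m) with d1 smaller than the number of rows m of Mi; for S2 (applied on the right) it would be (n, d2) with d2 smaller than the number of columns n.

```python
import numpy as np


def random_matrix(seed, shape):
    """Rebuild a random matrix from a (seed, shape) parameter (hypothetical format).

    Entries are i.i.d. N(0, 1/c) for an (r, c) matrix, so E[S @ S.T] = I_r.
    """
    rng = np.random.default_rng(seed)
    r, c = shape
    return rng.standard_normal((r, c)) / np.sqrt(c)
```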
It should be noted that the embodiment of the present application does not limit the order of S403 and S404: S403 may be executed before S404, S404 may be executed before S403, or the two may be executed simultaneously.
Then, the method of S209 may include:
S405: Determine the second sub-feature according to the feature to be processed, the first synchronization sub-feature and the first random matrix.
The method of S2010 may include:
S406: Determine the second sub-feature according to the feature to be processed, the first synchronization sub-feature and the first random matrix.
For convenience of description, the feature to be processed will be represented by a matrix Mi, the first synchronization sub-feature by a matrix U, the first sub-feature by a matrix Ui, and the second sub-feature by a matrix Vi.
In a specific implementation, the node may determine matrix Vi (the second sub-feature) from matrix Mi (the feature to be processed), matrix U (the first synchronization sub-feature) and matrix S1 (the first random matrix) as follows: compute the product $S_1 M_i$ of S1 and Mi, compute the product $S_1 U^{T}$ of S1 and $U^{T}$, and obtain Vi from $S_1 M_i \approx S_1 U^{T} V_i$.
In one possible implementation, if node 1 and node 2 receive the second parameter, the method further includes:
S407: Node 1 receives the second parameter and generates the second random matrix according to the second parameter.
S408: Node 2 receives the second parameter and generates the second random matrix according to the second parameter.
It should be noted that the embodiment of the present application does not limit the order of S407 and S408: S407 may be executed before S408, S408 may be executed before S407, or the two may be executed simultaneously.
The method of S209 may include:
S409: Determine the first sub-feature according to the feature to be processed, the second sub-feature and the second random matrix.
The method of S2010 may include:
S4010: Determine the first sub-feature according to the feature to be processed, the second sub-feature and the second random matrix.
In a specific implementation, the node may determine matrix Ui (the first sub-feature) from matrix Mi (the feature to be processed), matrix Vi (the second sub-feature) and matrix S2 (the second random matrix) as follows: compute the product $M_i S_2$ of Mi and S2, compute the product $V_i S_2$ of Vi and S2, and obtain Ui from $M_i S_2 \approx U_i^{T} V_i S_2$.
Because each node uses random matrices with fewer rows or columns than the feature to be processed, the computation is carried out on a reduced-dimension version of the feature to be processed, which improves the efficiency with which the node processes it.
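The sketched updates of S405 and S409 can be illustrated as follows. The text does not state how the approximate equations $S_1 M_i \approx S_1 U^{T} V_i$ and $M_i S_2 \approx U_i^{T} V_i S_2$ are solved; this sketch assumes a non-negative least-squares solve by projected gradient descent with step size 1/L, where L is the largest eigenvalue of the k-by-k Gram matrix. All function names are hypothetical.

```python
import numpy as np


def nonneg_lstsq(A, B, X0, iters=5000):
    """Projected gradient for min 0.5 * ||B - A @ X||_F^2 subject to X >= 0."""
    AtA, AtB = A.T @ A, A.T @ B
    # Step 1/L, where L = ||A^T A||_2 is the Lipschitz constant of the gradient.
    step = 1.0 / max(np.linalg.norm(AtA, 2), 1e-12)
    X = X0.copy()
    for _ in range(iters):
        X = np.maximum(X - step * (AtA @ X - AtB), 0.0)
    return X


def update_V(M, U, S1, V0):
    """S405: solve S1 M ~= (S1 U^T) V for V >= 0; S1 has fewer rows than M."""
    return nonneg_lstsq(S1 @ U.T, S1 @ M, V0)


def update_U(M, V, S2, U0):
    """S409: solve M S2 ~= U^T (V S2) for U >= 0, written as (V S2)^T U ~= (M S2)^T."""
    return nonneg_lstsq((V @ S2).T, (M @ S2).T, U0)
```

The least-squares problems are posed on the sketched matrices S1 Mi (d1 rows instead of m) and Mi S2 (d2 columns instead of n), while the unknowns Vi and Ui keep their full size.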
In addition, suppose the features to be processed obtained in S207 and S208 are matrices and the first sub-features in S209 and S2010 are sub-matrices obtained by performing non-negative matrix factorization on the features to be processed. To improve the efficiency with which the nodes process the features to be processed in S209 and S2010, in one possible implementation, referring to fig. 4, which shows a signaling interaction diagram of another data processing method provided in the embodiment of the present application, the method may further include:
S501: The data processing device sends the second parameter for the (i+1)-th round of feature processing to node 1.
S502: The data processing device sends the second parameter for the (i+1)-th round of feature processing to node 2.
The second parameter is used to generate the second random matrix.
The method descriptions of S501-S502 are as described above in S401-S402, and are not repeated here.
It should be noted that the embodiment of the present application does not limit the order of S501 and S502: S501 may be executed before S502, S502 may be executed before S501, or the two may be executed simultaneously.
In a possible implementation manner, if the node 1 and the node 2 receive the second parameter, the method further includes:
S503: Node 1 receives the second parameter and generates the second random matrix according to the second parameter.
S504: Node 2 receives the second parameter and generates the second random matrix according to the second parameter.
The method descriptions of S503-S504 are as described above for S407-S408 and are not repeated here.
It should be noted that the embodiment of the present application does not limit the order of S503 and S504: S503 may be executed before S504, S504 may be executed before S503, or the two may be executed simultaneously.
The method of S209 may include:
S505: Determine the first sub-feature according to the feature to be processed, the second sub-feature and the second random matrix.
The method of S2010 may include:
S506: Determine the first sub-feature according to the feature to be processed, the second sub-feature and the second random matrix.
The method descriptions of S505-S506 are as described above in S409-S4010 and are not repeated here.
Because each node uses a random matrix with fewer rows or columns than the feature to be processed, the computation is carried out on a reduced-dimension version of the feature to be processed, which improves the efficiency with which the node processes it.
Next, the feature processing method provided in the embodiment of the present application will be described with reference to an actual application scenario.
When multiple organizations jointly apply non-negative matrix factorization to data held by several parties, the data usually must not be disclosed between them for security reasons. Existing research on secure computation covers techniques such as secure gradient descent and secure Singular Value Decomposition (SVD). Because these studies address different problems, they cannot be applied directly to secure distributed non-negative matrix factorization.
In addition, distributed non-negative matrix factorization has been studied extensively on platforms such as Map-Reduce, Spark, X10, GPU and MPI. Because these studies do not consider security, the factorization process may leak data between organizations. One could try to solve the problem by improving these distributed techniques, but since the existing algorithms keep multiple copies of the data to improve efficiency, data security cannot be guaranteed.
Only DSANLS can be improved to ensure data security within a single data processing round. However, because non-negative matrix factorization is an iterative algorithm, improving DSANLS cannot guarantee data security over the whole solving process.
For this reason, the embodiment of the present application provides a feature processing method for performing non-negative matrix factorization on the features to be processed Mi that different organizations hold for the same users. Refer to fig. 5, which shows an overall architecture diagram of a synchronization technique for non-negative matrix factorization across different organizations provided in the embodiment of the present application. In the embodiment of the present application, the synchronization technique corresponds to the first feature processing mode.
As shown in fig. 5, the data processing device is in the first feature processing mode. After it receives the user identifiers corresponding to the original features sent by different organizations (organization 1, organization 2, …, organization n), the data M1, data M2, …, and data Mn (corresponding to the features to be processed Mi) over the common users (corresponding to the same user identifier set) and the products are determined. When each organization performs its first round of feature processing, it may initialize its own user portrait Ui (corresponding to the first sub-feature) and product portrait Vi (corresponding to the second sub-feature). The data processing device then aligns (averages) the user portraits Ui produced by the organizations in each round to obtain a synchronized user portrait U (corresponding to the first synchronization sub-feature), and each organization performs the next round of feature processing based on U to obtain a new user portrait Ui and product portrait Vi. The iteration may stop once the user portrait Ui of each organization is determined to be satisfactory.
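The synchronous mode of fig. 5 can be simulated in a single process as below. Several assumptions are not from the text: each local round applies one Lee-Seung multiplicative update to Vi (with the synchronized U fixed) and then one to the user portrait starting from U; alignment is plain averaging; and a fixed number of rounds replaces the "satisfactory" stopping test. Function names are hypothetical.

```python
import numpy as np


def local_round(M, U_sync, V, eps=1e-10):
    """One round at an organization (assumed Lee-Seung updates).

    V (product portrait) is refined with the synchronized user portrait U_sync
    held fixed; a local user portrait is then refined starting from U_sync.
    """
    V = V * (U_sync @ M) / (U_sync @ U_sync.T @ V + eps)
    U_local = U_sync * (V @ M.T) / (V @ V.T @ U_sync + eps)
    return U_local, V


def synchronous_nmf(Ms, k, rounds=300, seed=0):
    """Ms: one non-negative (common users x products_j) matrix per organization."""
    rng = np.random.default_rng(seed)
    m = Ms[0].shape[0]
    # Each organization initializes its own Ui and Vi in the first round.
    U_locals = [rng.random((k, m)) for _ in Ms]
    Vs = [rng.random((k, M.shape[1])) for M in Ms]
    for _ in range(rounds):
        # Data processing device: align (average) the received user portraits.
        U_sync = np.mean(U_locals, axis=0)
        # Organizations: one local round each; only U_local is sent back.
        results = [local_round(M, U_sync, V) for M, V in zip(Ms, Vs)]
        U_locals = [U for U, _ in results]
        Vs = [V for _, V in results]
    return np.mean(U_locals, axis=0), Vs
```

Only the user portraits cross the organization boundary in this simulation; each Mi and Vi is used solely inside its own organization's `local_round` call.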
Refer to fig. 6, which shows an overall architecture diagram of an asynchronous technique for non-negative matrix factorization across different organizations provided in the embodiment of the present application. The asynchronous technique corresponds to the second feature processing mode.
As shown in fig. 6, the data processing device is in the second feature processing mode. After it receives the user identifiers corresponding to the original features sent by different organizations (organization 1, organization 2, …, organization n), the data M1, data M2, …, and data Mn (corresponding to the features to be processed Mi) over the common users (corresponding to the same user identifier set) and the products are determined. When each organization performs its first round of feature processing, it may initialize its own user portrait Ui (corresponding to the first sub-feature) and product portrait Vi (corresponding to the second sub-feature). Whenever the data processing device receives a user portrait Ui from an organization, it aligns Ui with its first reference sub-feature to generate the synchronized user portrait U for the next round (corresponding to the first synchronization sub-feature), and that organization then performs its next round of feature processing based on U to obtain a new user portrait Ui and product portrait Vi. The iteration may stop once the user portrait Ui of each organization is determined to be satisfactory.
Existing research on secure non-negative matrix factorization is mainly based on secure multi-party matrix computation, that is, on how to perform matrix addition and matrix multiplication securely among multiple organizations. Such research can be applied to the secure distributed non-negative matrix factorization problem, but with very low efficiency. In particular, when the data volume is large, the iterative nature of non-negative matrix factorization greatly increases the computation time, so practical problems cannot be solved.
Therefore, the embodiment of the present application provides a feature processing method that improves the efficiency of feature processing at the organizations. Refer to fig. 7, which shows a flowchart of a feature processing method provided in the embodiment of the present application. After an organization determines the first random matrix S1 and the second random matrix S2, it performs feature processing with them. Given the synchronized user portrait U and the product portrait Vi from the previous round of decomposition, the product portrait Vi of the current round is determined from S1, the data Mi and U, and the user portrait Ui of the current round is then determined from S2, Mi and the newly determined Vi. Reducing the dimension of Mi with the random matrices reduces the computation required for feature processing.
Fig. 8 shows the relative error versus the feature processing round for the synchronization technique provided in the embodiment of the present application. As shown in fig. 8, the four panels correspond to four data sets to be processed (uniform samples 1 to 4), where the relative error refers to the difference between Mi and $U_i^{T} V_i$ relative to Mi. It can be seen that, for uniformly distributed data, feature processing with the synchronization technique keeps the relative error small, i.e., the synchronization technique performs well on such data.
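The relative error is described only loosely above. One common definition, assumed here rather than taken from the text, normalizes the Frobenius norm of the residual by that of the data:

$$\mathrm{err}_i = \frac{\lVert M_i - U_i^{T} V_i \rVert_F}{\lVert M_i \rVert_F}$$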
Fig. 9 shows the relative error versus the feature processing round for the asynchronous technique provided in the embodiment of the present application. As shown in fig. 9, the four panels correspond to four data sets to be processed (non-uniform samples 1 to 4). It can be seen that, for non-uniformly distributed data, feature processing with the asynchronous technique keeps the relative error small, i.e., the asynchronous technique performs well on such data.
Based on the feature processing method provided above, an embodiment of the present application further provides a feature processing apparatus. Fig. 10 shows a structure diagram of the feature processing apparatus provided in the embodiment of the present application; the apparatus includes:
An obtaining unit 1001, configured to obtain user identifiers corresponding to original features sent by different nodes;
a first determining unit 1002, configured to determine a same user identifier set according to the user identifier, and send the same user identifier set to the different nodes;
a first generating unit 1003, configured to: if the data processing device is in the first feature processing mode and receives the first sub-feature obtained by each node after the i-th round of feature processing of the feature to be processed, align the first sub-features and generate the first synchronization sub-feature for the (i+1)-th round of feature processing;
a first sending unit 1004, configured to send the first synchronization sub-feature to different nodes, where the feature to be processed is a feature that is determined by each node from original features and corresponds to all users in the same user identifier set.
In a possible implementation manner, the first generating unit 1003 is further configured to:
if the data processing device is in the second feature processing mode and receives the first sub-feature obtained by a node after the i-th round of feature processing of the feature to be processed, generate the first synchronization sub-feature for the (i+1)-th round of feature processing according to the first reference sub-feature at the i-th round and the received first sub-feature, and send the first synchronization sub-feature to that node.
In a possible implementation manner, the first generating unit 1003 is specifically configured to:
setting a reference weight corresponding to the first reference sub-feature and a first weight corresponding to the first sub-feature during the ith round of feature processing, wherein the reference weight set in the ith round is higher than the reference weight set in the (i-1) th round;
generating the first synchronization sub-feature according to the first reference sub-feature and the corresponding reference weight, and the first sub-feature and the corresponding first weight;
and updating the first reference sub-feature according to the first synchronization sub-feature, and taking the updated first reference sub-feature as the first reference sub-feature of the (i + 1) th round.
In a possible implementation manner, the first sending unit 1004 is further configured to:
if the feature to be processed is a matrix and the first sub-feature is a sub-matrix obtained by performing non-negative matrix factorization on the feature to be processed,
send the first parameter and/or the second parameter for the (i+1)-th round of feature processing to each node, where the first parameter is used to generate a first random matrix and the second parameter is used to generate a second random matrix; the number of rows of the first and second random matrices is smaller than the number of rows of the feature to be processed, or their number of columns is smaller than the number of columns of the feature to be processed.
In a possible implementation manner, the first sending unit 1004 is further configured to:
if the feature to be processed is a matrix and the first sub-feature is a sub-matrix obtained by performing non-negative matrix factorization on the feature to be processed,
send the second parameter for the (i+1)-th round of feature processing to the node, where the second parameter is used to generate a second random matrix;
the number of rows of the second random matrix is smaller than the number of rows of the feature to be processed, or its number of columns is smaller than the number of columns of the feature to be processed.
An embodiment of the present application further provides a feature processing apparatus, as shown in fig. 11, which shows a structure diagram of the feature processing apparatus provided in the embodiment of the present application, where the apparatus includes:
a receiving unit 1101, configured to receive a same user identifier set sent by a data processing device, and determine a feature to be processed from the original feature according to the same user identifier set; the features to be processed correspond to users in the same user identification set;
a second sending unit 1102, configured to send the first sub-feature obtained after the ith round of feature processing to the data processing device;
the receiving unit 1101 is further configured to receive a first synchronization sub-feature for the i +1 th round of feature processing sent by the data processing device, and perform a feature processing process for the i +1 th round;
a second determining unit 1103, configured to determine a second sub-feature according to the to-be-processed feature and the first synchronization sub-feature in the feature processing process of the (i + 1) th round; and determining a first sub-feature according to the feature to be processed and the second sub-feature.
In a possible implementation manner, the second determining unit 1103 is further configured to:
if the feature to be processed is a matrix and the first sub-feature and the second sub-feature are sub-matrices obtained by performing non-negative matrix factorization on the feature to be processed,
and a first parameter for the (i+1)-th round of feature processing sent by the data processing device is received, generate a first random matrix according to the first parameter, where the number of rows of the first random matrix is smaller than the number of rows of the feature to be processed, or its number of columns is smaller than the number of columns of the feature to be processed;
and determining a second sub-feature according to the feature to be processed, the first synchronous sub-feature and the first random matrix.
In a possible implementation manner, the second determining unit 1103 is further configured to:
if the feature to be processed is a matrix and the first sub-feature and the second sub-feature are sub-matrices obtained by performing non-negative matrix factorization on the feature to be processed,
and a second parameter for the (i+1)-th round of feature processing sent by the data processing device is received, generate a second random matrix according to the second parameter, where the number of rows of the second random matrix is smaller than the number of rows of the feature to be processed, or its number of columns is smaller than the number of columns of the feature to be processed;
and determining a first sub-feature according to the feature to be processed, the second sub-feature and the second random matrix.
The embodiments of the present application further provide a feature processing device, which is described below with reference to the accompanying drawings. Referring to fig. 12, an embodiment of the present application provides a feature processing device 1400. The device 1400 may be a terminal device, which may be any intelligent terminal such as a mobile phone, a tablet computer, a Personal Digital Assistant (PDA), a Point of Sales (POS) terminal, or a vehicle-mounted computer. The following description takes a mobile phone as the terminal device:
Fig. 12 is a block diagram of a partial structure of a mobile phone serving as the terminal device provided in an embodiment of the present application. Referring to fig. 12, the mobile phone includes: a Radio Frequency (RF) circuit 1410, a memory 1420, an input unit 1430, a display unit 1440, a sensor 1450, an audio circuit 1460, a wireless fidelity (WiFi) module 1470, a processor 1480, and a power supply 1490. Those skilled in the art will appreciate that the structure shown in fig. 12 does not limit the mobile phone, which may include more or fewer components than shown, combine some components, or arrange the components differently.
The following describes each component of the mobile phone in detail with reference to fig. 12:
The RF circuit 1410 may be used to receive and send signals while messages are being sent or received or during a call. In particular, it delivers downlink information received from a base station to the processor 1480 for processing, and sends uplink data to the base station. Generally, the RF circuit 1410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a Low Noise Amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 1410 may communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.
The memory 1420 may be used to store software programs and modules, and the processor 1480 performs the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1420. The memory 1420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playback function or an image playback function), and the like. The data storage area may store data created through use of the mobile phone (such as audio data and a phone book), and the like. In addition, the memory 1420 may include a high-speed random access memory, and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The input unit 1430 may be used to receive input digit or character information and to generate key signal inputs related to user settings and function control of the mobile phone. Specifically, the input unit 1430 may include a touch panel 1431 and other input devices 1432. The touch panel 1431, also referred to as a touch screen, may collect touch operations performed by the user on or near it (for example, operations performed on or near the touch panel 1431 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connected apparatus according to a preset program. Optionally, the touch panel 1431 may include two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal produced by the touch operation, and transmits the signal to the touch controller. The touch controller receives the touch information from the touch detection apparatus, converts it into touch point coordinates, sends the coordinates to the processor 1480, and can receive and execute commands sent by the processor 1480. In addition, the touch panel 1431 may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch panel 1431, the input unit 1430 may further include the other input devices 1432, which may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, a joystick, and the like.
The display unit 1440 may be used to display information input by the user or provided to the user, as well as the various menus of the mobile phone. The display unit 1440 may include a display panel 1441. Optionally, the display panel 1441 may be configured as a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED) display, or the like. Further, the touch panel 1431 may cover the display panel 1441. When the touch panel 1431 detects a touch operation on or near it, the operation is transmitted to the processor 1480 to determine the type of touch event, and the processor 1480 then provides a corresponding visual output on the display panel 1441 according to the type of touch event. Although in fig. 12 the touch panel 1431 and the display panel 1441 are two independent components that implement the input and output functions of the mobile phone, in some embodiments the touch panel 1431 and the display panel 1441 may be integrated to implement both functions.
The mobile phone may further include at least one sensor 1450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which adjusts the brightness of the display panel 1441 according to the ambient light, and a proximity sensor, which turns off the display panel 1441 and/or the backlight when the mobile phone is moved to the ear. As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the posture of the mobile phone (such as landscape/portrait switching, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer and tap detection). Other sensors that may be configured on the mobile phone, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described further here.
The audio circuit 1460, a speaker 1461, and a microphone 1462 may provide an audio interface between the user and the mobile phone. The audio circuit 1460 may convert received audio data into an electrical signal and transmit it to the speaker 1461, which converts it into a sound signal for output. Conversely, the microphone 1462 converts collected sound signals into electrical signals, which the audio circuit 1460 receives and converts into audio data. The audio data is then output to the processor 1480 for processing, after which it is either sent through the RF circuit 1410 to, for example, another mobile phone, or output to the memory 1420 for further processing.
WiFi is a short-range wireless transmission technology. Through the WiFi module 1470, the mobile phone can help the user send and receive e-mails, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although fig. 12 shows the WiFi module 1470, it is not an essential component of the mobile phone and may be omitted as needed without changing the essence of the invention.
The processor 1480 is the control center of the mobile phone. It connects all parts of the mobile phone through various interfaces and lines, and performs the functions of the mobile phone and processes data by running or executing the software programs and/or modules stored in the memory 1420 and invoking the data stored in the memory 1420, thereby monitoring the mobile phone as a whole. Optionally, the processor 1480 may include one or more processing units. Preferably, the processor 1480 may integrate an application processor, which mainly handles the operating system, user interface, and applications, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor may alternatively not be integrated into the processor 1480.
The handset also includes a power supply 1490 (e.g., a battery) for powering the various components, which may preferably be logically coupled to the processor 1480 via a power management system to provide management of charging, discharging, and power consumption via the power management system.
Although not shown, the mobile phone may further include a camera, a bluetooth module, etc., which are not described herein.
In this embodiment, the processor 1480 included in the terminal device also has the following functions:
acquiring user identifications corresponding to original features sent by different nodes;
determining the same user identification set according to the user identifications, and sending the same user identification set to the different nodes;
if the nodes are in the first feature processing mode, receiving the first sub-feature obtained by each node after the ith round of feature processing on the feature to be processed, aligning the first sub-features, generating a first synchronization sub-feature for the (i + 1)th round of feature processing, and sending the first synchronization sub-feature to the different nodes, wherein the feature to be processed is the feature that each node determines from its original features and that corresponds to all users in the same user identification set.
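The processor functions above say that the first sub-features are "aligned" but not how. As a hedged sketch, assuming that alignment means an element-wise average of the nodes' user-side factors (an assumption in the style of federated averaging, not something the source specifies), one synchronization step on the data processing device could look like the following. The function name `first_mode_sync` is hypothetical. Note that the device only ever handles these first sub-features.

```python
from typing import Dict

import numpy as np


def first_mode_sync(first_sub_features: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    """Align the nodes' first sub-features into one synchronization sub-feature.

    Hypothetical alignment: element-wise mean. Every matrix has shape
    (users, r) because all nodes use the same user identifier set.
    """
    mats = list(first_sub_features.values())
    if not mats:
        raise ValueError("no first sub-features received")
    if any(m.shape != mats[0].shape for m in mats):
        raise ValueError("first sub-features must share one shape")
    sync = np.mean(np.stack(mats), axis=0)
    # The same first synchronization sub-feature is returned for every node.
    return {node: sync.copy() for node in first_sub_features}


if __name__ == "__main__":
    out = first_mode_sync({"node_a": np.array([[1.0, 2.0]]), "node_b": np.array([[3.0, 4.0]])})
    print(out["node_a"])  # [[2. 3.]]
```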
The feature processing device provided in this embodiment of the present application may be a server. Referring to fig. 13, fig. 13 is a structural diagram of the server 1500 provided in this embodiment of the present application. The server 1500 may vary considerably depending on configuration or performance, and may include one or more Central Processing Units (CPUs) 1522 (e.g., one or more processors), a memory 1532, and one or more storage media 1530 (e.g., one or more mass storage devices) storing an application program 1542 or data 1544. The memory 1532 and the storage medium 1530 may provide transient or persistent storage. The program stored on the storage medium 1530 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 1522 may be configured to communicate with the storage medium 1530 and to execute, on the server 1500, the series of instruction operations in the storage medium 1530.
The server 1500 may also include one or more power supplies 1526, one or more wired or wireless network interfaces 1550, one or more input/output interfaces 1558, and/or one or more operating systems 1541, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 13.
The CPU 1522 is configured to execute the following steps:
acquiring user identifications corresponding to original features sent by different nodes;
determining the same user identification set according to the user identifications, and sending the same user identification set to the different nodes;
if the nodes are in the first feature processing mode, receiving the first sub-feature obtained by each node after the ith round of feature processing on the feature to be processed, aligning the first sub-features, generating a first synchronization sub-feature for the (i + 1)th round of feature processing, and sending the first synchronization sub-feature to the different nodes, wherein the feature to be processed is the feature that each node determines from its original features and that corresponds to all users in the same user identification set.
The terms "first," "second," "third," "fourth," and the like in the description of the application and the above-described figures, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that only A exists, only B exists, or both A and B exist, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it. "At least one of the following items" or a similar expression refers to any combination of these items, including a single item or any combination of plural items. For example, at least one of a, b, or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium may be at least one of the following media: various media that can store program codes, such as read-only memory (ROM), RAM, magnetic disk, or optical disk.
It should be noted that, in the present specification, all the embodiments are described in a progressive manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus and system embodiments, since they are substantially similar to the method embodiments, they are described in a relatively simple manner, and reference may be made to some of the descriptions of the method embodiments for related points. The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only one specific embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A feature processing method applied to a data processing apparatus, the method comprising:
acquiring user identifications corresponding to original features sent by different nodes;
determining the same user identification set according to the user identifications, and sending the same user identification set to the different nodes;
if the nodes are in the first feature processing mode, receiving the first sub-feature obtained by each node after the ith round of feature processing on the feature to be processed, aligning the first sub-features, generating a first synchronization sub-feature for the (i + 1)th round of feature processing, and sending the first synchronization sub-feature to the different nodes, wherein the feature to be processed is the feature that each node determines from its original features and that corresponds to all users in the same user identification set.
2. The method of claim 1, further comprising:
if the node is in the second feature processing mode, receiving a first sub-feature obtained by the node after the ith round of feature processing aiming at the feature to be processed, generating a first synchronization sub-feature aiming at the (i + 1) th round of feature processing according to the first reference sub-feature and the first sub-feature during the ith round of feature processing, and sending the first synchronization sub-feature to the node.
3. The method according to claim 2, wherein the generating a first synchronization sub-feature for the i +1 th round of feature processing according to the first reference sub-feature and the first sub-feature in the i-th round of feature processing comprises:
setting a reference weight corresponding to the first reference sub-feature and a first weight corresponding to the first sub-feature during the ith round of feature processing, wherein the reference weight set in the ith round is higher than the reference weight set in the (i-1) th round;
generating the first synchronization sub-feature according to the first reference sub-feature and the corresponding reference weight, and the first sub-feature and the corresponding first weight;
and updating the first reference sub-feature according to the first synchronization sub-feature, and taking the updated first reference sub-feature as the first reference sub-feature of the (i + 1) th round.
4. The method of claim 1, wherein if the feature to be processed is a matrix, the first sub-feature is a sub-matrix obtained by performing a non-negative matrix decomposition on the feature to be processed, and the method further comprises:
sending first parameters and/or second parameters aiming at the i +1 th round of feature processing to each node, wherein the first parameters are used for generating a first random matrix, and the second parameters are used for generating a second random matrix; the number of rows of the first random matrix and the second random matrix is smaller than the number of rows of the features to be processed, and the number of columns of the first random matrix and the second random matrix is smaller than the number of columns of the features to be processed.
5. The method according to claim 2 or 3, wherein if the feature to be processed is a matrix, the first sub-feature is a sub-matrix obtained by performing non-negative matrix decomposition on the feature to be processed, and the method further comprises:
sending a second parameter aiming at the i +1 th round of feature processing to the node, wherein the second parameter is used for generating a second random matrix;
and the row number of the second random matrix is less than the row number of the features to be processed, or the column number of the second random matrix is less than the column number of the features to be processed.
6. A feature processing method applied to a node, the method comprising:
receiving the same user identification set sent by the data processing equipment, and determining the features to be processed from the original features according to the same user identification set; the features to be processed correspond to users in the same user identification set;
sending the first sub-feature obtained after the ith round of feature processing to the data processing equipment;
receiving a first synchronization sub-feature for the (i + 1)th round of feature processing sent by the data processing equipment, and performing the (i + 1)th round of feature processing;
in the (i + 1)th round of feature processing, determining a second sub-feature according to the feature to be processed and the first synchronization sub-feature; and determining a first sub-feature according to the feature to be processed and the second sub-feature.
7. The method of claim 6, wherein if the feature to be processed is a matrix, the first sub-feature and the second sub-feature are sub-matrices obtained by performing non-negative matrix decomposition on the feature to be processed, and the method further comprises:
if a first parameter aiming at the (i + 1) th round of feature processing sent by the data processing equipment is received, generating a first random matrix according to the first parameter, wherein the row number of the first random matrix is less than the row number of the features to be processed, or the column number of the first random matrix is less than the column number of the features to be processed;
then, the determining a second sub-feature according to the feature to be processed and the first synchronization sub-feature includes: and determining a second sub-feature according to the feature to be processed, the first synchronous sub-feature and the first random matrix.
8. The method according to claim 6 or 7, wherein if the feature to be processed is a matrix, the first sub-feature and the second sub-feature are sub-matrices obtained by performing non-negative matrix decomposition on the feature to be processed, and the method further comprises:
if a second parameter aiming at the (i + 1) th round of feature processing sent by the data processing equipment is received, generating a second random matrix according to the second parameter, wherein the row number of the second random matrix is smaller than the row number of the features to be processed, or the column number of the second random matrix is smaller than the column number of the features to be processed;
then, the determining a first sub-feature according to the feature to be processed and the second sub-feature includes: and determining a first sub-feature according to the feature to be processed, the second sub-feature and the second random matrix.
9. An apparatus for feature processing, the apparatus comprising:
the acquiring unit is used for acquiring user identifications corresponding to original features sent by different nodes;
the first determining unit is used for determining the same user identifier set according to the user identifiers and sending the same user identifier set to the different nodes;
a first generating unit, configured to: if the nodes are in a first feature processing mode, receive the first sub-feature obtained by each node after the ith round of feature processing on the feature to be processed, align the first sub-features, and generate a first synchronization sub-feature for the (i + 1)th round of feature processing;
and the first sending unit is used for sending the first synchronization sub-feature to different nodes, and the feature to be processed is the feature which is determined by each node from the original feature and corresponds to all users in the same user identification set.
10. An apparatus for feature processing, the apparatus comprising:
the receiving unit is used for receiving the same user identification set sent by the data processing equipment and determining the features to be processed from the original features according to the same user identification set; the features to be processed correspond to users in the same user identification set;
the second sending unit is used for sending the first sub-feature obtained after the ith round of feature processing to the data processing equipment;
the receiving unit is further configured to receive a first synchronization sub-feature, which is sent by the data processing device and is used for the (i + 1) th round of feature processing, and perform the feature processing process of the (i + 1) th round;
a second determining unit, configured to determine a second sub-feature according to the feature to be processed and the first synchronization sub-feature in the feature processing process of the (i + 1) th round; and determining a first sub-feature according to the feature to be processed and the second sub-feature.
11. An apparatus for feature processing, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the feature processing method of any one of claims 1 to 8 according to instructions in the program code.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium stores program code for executing the feature processing method of any one of claims 1 to 8.
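Claim 3 specifies only that the reference weight set in round i is higher than that in round i - 1, and that the first reference sub-feature is replaced by the first synchronization sub-feature for the next round. The sketch below is one way to realize this. It assumes a weighted sum whose two weights add up to 1 and the hypothetical schedule w_ref(i) = 1 - 1/(i + 2); neither the combination rule nor the schedule comes from the source, and both function names are illustrative.

```python
import numpy as np


def reference_weight(i: int) -> float:
    """Hypothetical schedule: strictly increasing in the round index i, always < 1."""
    return 1.0 - 1.0 / (i + 2)


def second_mode_sync(reference: np.ndarray, first_sub: np.ndarray, i: int):
    """One round i of the second feature processing mode (single node).

    Returns (sync, new_reference): the first synchronization sub-feature for
    round i + 1 and the updated first reference sub-feature.
    """
    w_ref = reference_weight(i)
    w_first = 1.0 - w_ref  # assumed: the two weights sum to 1
    sync = w_ref * reference + w_first * first_sub
    new_reference = sync.copy()  # updated reference used in round i + 1
    return sync, new_reference


if __name__ == "__main__":
    ref = np.zeros((1, 2))
    sub = np.ones((1, 2))
    for i in range(3):
        sync, ref = second_mode_sync(ref, sub, i)
        print(i, round(reference_weight(i), 3), sync[0, 0])
```

Because the reference weight rises each round, later synchronization sub-features would depend more on the accumulated reference and less on the node's latest first sub-feature, which tends to damp round-to-round fluctuation.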
CN201911025094.7A 2019-10-25 2019-10-25 Feature processing method and related equipment Active CN110782985B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911025094.7A CN110782985B (en) 2019-10-25 2019-10-25 Feature processing method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911025094.7A CN110782985B (en) 2019-10-25 2019-10-25 Feature processing method and related equipment

Publications (2)

Publication Number Publication Date
CN110782985A true CN110782985A (en) 2020-02-11
CN110782985B CN110782985B (en) 2021-08-17

Family

ID=69386663

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911025094.7A Active CN110782985B (en) 2019-10-25 2019-10-25 Feature processing method and related equipment

Country Status (1)

Country Link
CN (1) CN110782985B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103389966A (en) * 2012-05-09 2013-11-13 阿里巴巴集团控股有限公司 Massive data processing, searching and recommendation methods and devices
CN107040610A (en) * 2017-05-27 2017-08-11 广东欧珀移动通信有限公司 Method of data synchronization, device, storage medium, terminal and server
EP3223170A1 (en) * 2014-12-23 2017-09-27 Huawei Technologies Co. Ltd. Data processing method and device in data modeling
CN108681426A (en) * 2018-05-25 2018-10-19 第四范式(北京)技术有限公司 Method and system for executing characteristic processing for data
US20180341691A1 (en) * 2015-11-24 2018-11-29 T2 Data Ab Data synchronization in a distributed data storage system
CN109785034A (en) * 2018-11-13 2019-05-21 北京码牛科技有限公司 User's portrait generation method, device, electronic equipment and computer-readable medium
CN110008017A (en) * 2018-12-06 2019-07-12 阿里巴巴集团控股有限公司 A kind of distributed processing system(DPS) and method, a kind of calculating equipment and storage medium

Also Published As

Publication number Publication date
CN110782985B (en) 2021-08-17

Similar Documents

Publication Publication Date Title
EP3396516B1 (en) Mobile terminal, method and device for displaying fingerprint recognition region
CN106919918B (en) Face tracking method and device
CN106446841B (en) A kind of fingerprint template matching order update method and terminal
EP3291618B1 (en) Method for recognizing location and electronic device implementing the same
CN108052820B (en) Unlocking control method, terminal equipment and related product
WO2016078504A1 (en) Identity authentication method and device
JP6553747B2 (en) Method and apparatus for training human face model matrix, and storage medium
US20160036810A1 (en) Electronic device and method of transceiving data
CN107103074B (en) Processing method of shared information and mobile terminal
CN112615852A (en) Data processing method, related device and computer program product
CN111090877B (en) Data generation and acquisition methods, corresponding devices and storage medium
CN105245432B (en) Unread message counting method and device and terminal
CN109766705B (en) Circuit-based data verification method and device and electronic equipment
CN114547082A (en) Data aggregation method, related device, equipment and storage medium
CN104360800A (en) Adjustment method of unlocking mode
CN106294087B (en) Statistical method and device for operation frequency of business execution operation
CN110782985B (en) Feature processing method and related equipment
US20150181430A1 (en) Systems and methods for communication using a body area network
CN106411681B (en) Information processing method, initiating device, server and participating device
CN115270163B (en) Data processing method, related device and storage medium
CN112235082A (en) Communication information transmission method, device, equipment and storage medium
CN107506129A (en) Method for processing user input and electronic device therefor
CN115549889A (en) Decryption method, related device and storage medium
CN115589281A (en) Decryption method, related device and storage medium
WO2019140567A1 (en) Big data analysis method and system

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code
    Ref country code: HK
    Ref legal event code: DE
    Ref document number: 40022346
    Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant