CN116362101A - Data processing method based on joint learning, data model generation method and device - Google Patents

Data processing method based on joint learning, data model generation method and device

Info

Publication number
CN116362101A
CN116362101A
Authority
CN
China
Prior art keywords
data
comparison
target
index
original
Prior art date
Legal status
Pending
Application number
CN202111552665.XA
Other languages
Chinese (zh)
Inventor
丁启杰
Current Assignee
Xinzhi I Lai Network Technology Co ltd
Original Assignee
Xinzhi I Lai Network Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Xinzhi I Lai Network Technology Co ltd
Priority to CN202111552665.XA
Publication of CN116362101A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Geometry (AREA)
  • Computer Hardware Design (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to the technical field of data processing, and provides a data processing method based on joint learning, a data model generation method and a data model generation device. The method comprises the following steps: acquiring an original comparison set based on a joint learning architecture; normalizing each piece of original comparison data in the original comparison set to generate at least one piece of normalized data, thereby obtaining a target comparison set; in response to receiving a reference data model, generating a prediction result set based on the target comparison set and the reference data model; generating a target accuracy based on a preset comparison strategy, the prediction result set and a real result set; and generating a target approximation result based on the target accuracy and a target recommendation strategy. Through these steps, embodiments of the disclosure can obtain the similarity between data sets from different sources, which greatly facilitates data selection for joint learning.

Description

Data processing method based on joint learning, data model generation method and device
Technical Field
The disclosure relates to the technical field of data processing, in particular to a data processing method based on joint learning, a data model generation method and a data model generation device.
Background
With the rapid development of machine learning, joint learning is used more and more often, because data are frequently restricted or sensitive and the amount of data actually available for training is very small. Since data sets from different sources suffer from severe heterogeneity, selecting a suitable data set has become a major difficulty in joint learning.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a data processing method based on joint learning, a data model generation method and a device, so as to solve the problem in the prior art that a suitable data set cannot be selected for joint learning.
In a first aspect of the embodiments of the present disclosure, a data processing method based on joint learning is provided. The method includes: acquiring an original comparison set based on a joint learning architecture, wherein the original comparison set comprises at least one piece of original comparison data; normalizing each piece of original comparison data in the original comparison set to generate at least one piece of normalized data, thereby obtaining a target comparison set; in response to receiving a reference data model, generating a prediction result set based on the target comparison set and the reference data model; generating a target accuracy based on a preset comparison strategy, the prediction result set and a real result set; and generating a target approximation result based on the target accuracy and a target recommendation strategy.
In a second aspect of the embodiments of the present disclosure, a method for generating a data model based on joint learning is provided. The method includes: acquiring a test data set based on a joint learning architecture; training an initial data model on the test data set to obtain a reference data model; and sending the reference data model to at least one participant in the joint learning architecture.
In a third aspect of the embodiments of the present disclosure, a data processing apparatus based on joint learning is provided. The apparatus includes: an original comparison set acquisition module configured to acquire an original comparison set based on a joint learning architecture, wherein the original comparison set comprises at least one piece of original comparison data; a normalization processing module configured to normalize each piece of original comparison data in the original comparison set to generate at least one piece of normalized data, thereby obtaining a target comparison set; a prediction result set generation module configured to, in response to receiving a reference data model, generate a prediction result set based on the target comparison set and the reference data model; a target accuracy generation module configured to generate a target accuracy based on a preset comparison strategy, the prediction result set and a real result set; and an approximation result generation module configured to generate a target approximation result based on the target accuracy and a target recommendation strategy.
In a fourth aspect of the embodiments of the present disclosure, a data model generation apparatus based on joint learning is provided. The apparatus includes: a test data set acquisition module configured to acquire a test data set based on a joint learning architecture; a training module configured to train an initial data model on the test data set to obtain a reference data model; and a sending module configured to send the reference data model to at least one participant in the joint learning architecture.
In a fifth aspect of the embodiments of the present disclosure, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above data processing method based on joint learning.
In a sixth aspect of the embodiments of the present disclosure, a computer readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the steps of the above data processing method based on joint learning.
Compared with the prior art, the beneficial effects of the embodiments of the present disclosure include at least the following. An original comparison set is first acquired based on a joint learning architecture; each piece of original comparison data in the original comparison set is normalized to generate at least one piece of normalized data, thereby obtaining a target comparison set; in response to receiving a reference data model, a prediction result set is generated based on the target comparison set and the reference data model; a target accuracy is generated based on a preset comparison strategy, the prediction result set and a real result set; and a target approximation result is generated based on the target accuracy and a target recommendation strategy. In this way, the similarity between data sets from different sources can be obtained, which greatly facilitates data selection for joint learning.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present disclosure; a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a data processing method based on joint learning provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of another data processing method based on joint learning provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of a method for generating a data model based on joint learning provided by an embodiment of the present disclosure;
FIG. 5 is a block diagram of a joint learning based data processing apparatus provided by an embodiment of the present disclosure;
FIG. 6 is a block diagram of a joint learning-based data model generation apparatus provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Joint learning refers to comprehensively utilizing multiple AI (Artificial Intelligence) technologies, on the premise of ensuring data security and user privacy, to jointly mine data value through multiparty cooperation and to promote new intelligent business states and modes based on joint modeling. Joint learning has at least the following characteristics:
(1) Participating nodes control their own data in a weakly centralized joint training mode, which ensures data privacy and security in the process of co-creating intelligence.
(2) Under different application scenarios, multiple model aggregation optimization strategies are established by screening and/or combining AI algorithms and privacy-preserving computation, so as to obtain high-level, high-quality models.
(3) On the premise of ensuring data security and user privacy, methods for improving the efficiency of the joint learning engine are derived from the various model aggregation optimization strategies; these methods raise the overall efficiency of the joint learning engine by addressing information interaction, intelligent perception, exception handling mechanisms and the like under a parallel computing architecture in a large-scale cross-domain network.
(4) The requirements of multiparty users in various scenarios are acquired, and the real contribution of each joint participant is determined and reasonably evaluated through a mutual trust mechanism, on the basis of which incentives are distributed.
Based on this mode, an AI technology ecosystem grounded in joint learning can be established, the value of industry data can be fully exploited, and the implementation of application scenarios in vertical fields can be promoted.
Fig. 1 is a schematic diagram of a joint learning architecture according to an embodiment of the present disclosure. As shown in fig. 1, the architecture of joint learning may include a server (central node) 101, as well as participants 102, 103, and 104.
In the joint learning process, a basic model may be established by the server 101, and the server 101 sends the model to the participants 102, 103 and 104 with which it has established communication connections. Alternatively, any participant may establish the basic model and upload it to the server 101, and the server 101 then sends the model to the other participants with which it has established communication connections. The participants 102, 103 and 104 construct models according to the downloaded basic structure and model parameters, perform model training using local data to obtain updated model parameters, and upload the updated model parameters, in encrypted form, to the server 101. The server 101 aggregates the model parameters sent by the participants 102, 103 and 104 to obtain global model parameters, and transmits the global model parameters back to the participants 102, 103 and 104. The participants 102, 103 and 104 then iterate their respective models according to the received global model parameters until the models finally converge, thereby completing the training. Throughout this process, the data uploaded by the participants 102, 103 and 104 are model parameters; local data is never uploaded to the server 101, and all participants share the final model parameters, so common modeling is achieved while data privacy is guaranteed. It should be noted that the number of participants is not limited to three; it may be set as needed, and the embodiments of the present disclosure are not limited in this respect.
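Purely as an illustration of the round just described, the following Python sketch shows one training round. The plain parameter averaging, the no-op encrypt() stub and the lambda-based local update are assumptions; the patent does not specify the aggregation rule, the encryption scheme or the local training procedure.

import numpy as np

def training_round(global_params, participants, local_update, encrypt):
    """One joint learning round: the server distributes the current
    global parameters, each participant trains on its own local data,
    encrypts and uploads the updated parameters, and the server
    aggregates them into new global parameters."""
    uploads = [encrypt(local_update(global_params, p["local_data"]))
               for p in participants]
    # Plain averaging is assumed as the aggregation rule; the patent
    # only states that the uploaded parameters are aggregated.
    return np.mean(uploads, axis=0)

# Toy usage with three participants, a no-op cipher and a dummy update.
participants = [{"local_data": None} for _ in range(3)]
new_global = training_round(
    np.zeros(4),
    participants,
    local_update=lambda params, data: params + np.random.normal(size=params.shape),
    encrypt=lambda params: params,  # stub: a real system would encrypt here
)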
Fig. 2 is a flowchart of a data processing method based on joint learning according to an embodiment of the present disclosure. The joint learning based data processing method of fig. 2 may be performed by the participants of fig. 1. As shown in fig. 2, the data processing method based on joint learning includes:
s201, acquiring an original comparison set based on a joint learning architecture, wherein the original comparison set comprises at least one original comparison data.
An original contrast set may refer to a data set consisting of at least one original contrast data. The raw contrast data may refer to basic data acquired by the party 1, such as "temperature: 34 "or" gas usage: 153.4", etc., the data obtained from different application scenarios are different, and are not particularly limited herein.
S202, normalizing each piece of original comparison data in the original comparison set to generate at least one normalization data, and obtaining a target comparison set.
Normalization may refer to transforming a dimensionalized expression into a dimensionless expression, which becomes a scalar. Dimension may refer to a fundamental property of a physical quantity, such as length, time, mass, velocity, acceleration, force, kinetic energy, angle, ratio of two lengths, ratio of two times, ratio of two forces, ratio of two energies, etc.
By way of example, the data before and after normalization may be compared with reference to the following table:

Data before normalization    Data after normalization
1306332                      -0.255625448
328536                       -0.269744679
656880                       -0.265003439
66960                        -0.273521798
82516                        -0.273297171
145200                       -0.272392024
70943                        -0.273464284
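As an illustration of the normalization step, a z-score transform is sketched below. This transform is an assumption: the patent does not specify the exact normalization formula, and the table above does not pin one down.

import numpy as np

def normalize(original_comparison_set: np.ndarray) -> np.ndarray:
    """Map each piece of original comparison data to dimensionless,
    normalized data, yielding the target comparison set."""
    mean = original_comparison_set.mean(axis=0)
    std = original_comparison_set.std(axis=0)
    return (original_comparison_set - mean) / std

raw = np.array([1306332, 328536, 656880, 66960, 82516, 145200, 70943],
               dtype=float)
target_comparison_set = normalize(raw)  # dimensionless values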
S203, in response to receiving a reference data model, generating a prediction result set based on the target comparison set and the reference data model.
The reference data model may refer to a mathematical model, composed of mathematical formulas, whose model parameters have already been determined through training. It should be noted that the reference data model is a classification model.
In some embodiments, the reference data model may be a logistic regression (Logistic Regression, LR) classification model. The LR algorithm is simple and efficient, so using an LR model can greatly improve operating efficiency.
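For illustration, such a reference model and the prediction step of S203 might look as follows. The scikit-learn API is an assumed concrete choice, and the feature and label arrays are placeholders for whatever data the scenario provides.

from sklearn.linear_model import LogisticRegression

def build_reference_model(features, labels) -> LogisticRegression:
    """Train the LR classification model that serves as the reference
    data model (trained server-side; see the method of fig. 4)."""
    model = LogisticRegression()
    model.fit(features, labels)
    return model

def make_prediction_result_set(model, target_comparison_set):
    """S203: apply the received reference model to the normalized
    target comparison set to produce the prediction result set."""
    return model.predict(target_comparison_set)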
S204, generating a target accuracy based on a preset comparison strategy, the prediction result set and a real result set.
The real result set may refer to a data set composed of real result data. Real result data may refer to the true classification of each piece of original comparison data in the original comparison set. The comparison strategy may refer to a step or method for comparing the data in the prediction result set with the data in the real result set. The target accuracy may refer to the ratio obtained by dividing the number of correctly identified samples by the total number of samples.
S205, generating a target approximation result based on the target accuracy and a target recommendation strategy.
The target recommendation strategy may refer to a method or steps for making different recommendations according to different target accuracies. The target approximation result may refer to a result generated based on the target accuracy.
According to the technical solution provided by this embodiment of the present disclosure, an original comparison set is acquired; each piece of original comparison data in the original comparison set is normalized to generate at least one piece of normalized data, thereby obtaining a target comparison set; in response to receiving a reference data model, a prediction result set is generated based on the target comparison set and the reference data model; a target accuracy is generated based on a preset comparison strategy, the prediction result set and a real result set; and a target approximation result is generated based on the target accuracy and a target recommendation strategy. In this way, the similarity between data sets from different sources can be obtained, which greatly facilitates data selection for joint learning.
In some embodiments, the comparison strategy includes: obtaining a true positive index, a true negative index, a false positive index and a false negative index, wherein the four indexes are integers with initial values of zero; step one: obtaining, from the prediction result set, one piece of prediction result data that has not been marked as compared, to obtain intermediate prediction result data; step two: obtaining the real result data corresponding to the intermediate prediction result data in the real result set, to obtain intermediate real result data; step three: when the intermediate prediction result data is the same as the intermediate real result data and both are positive, adding one to the true positive index; step four: when the intermediate prediction result data is the same as the intermediate real result data and both are negative, adding one to the true negative index; step five: when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is positive, adding one to the false positive index; step six: when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is negative, adding one to the false negative index; step seven: marking the prediction result data corresponding to the intermediate prediction result data in the prediction result set as compared; repeating steps one to seven until every piece of prediction result data in the prediction result set is marked as compared; and generating the target accuracy based on a preset calculation strategy and the true positive, true negative, false positive and false negative indexes.
The true positive index may refer to the count of cases in which the prediction result data is the same as the real result data and both are positive. The true negative index may refer to the count of cases in which the prediction result data is the same as the real result data and both are negative. The false positive index may refer to the count of cases in which the prediction result data differs from the real result data and the prediction result data is positive. The false negative index may refer to the count of cases in which the prediction result data differs from the real result data and the prediction result data is negative.
In some embodiments, the calculation strategy includes: substituting the true positive index, the true negative index, the false positive index and the false negative index into an accuracy formula to generate the target accuracy. The accuracy formula is:
F=(TP+TN)/(TP+TN+FP+FN)
where F is the accuracy, TP the true positive index, TN the true negative index, FP the false positive index and FN the false negative index.
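A minimal sketch of the comparison strategy together with this formula is given below. It assumes binary classification results in which a positive value is encoded as 1 and a negative value as 0; the patent does not fix a concrete encoding.

def target_accuracy(prediction_result_set, real_result_set) -> float:
    """Count TP/TN/FP/FN over the two result sets, then apply
    F = (TP + TN) / (TP + TN + FP + FN)."""
    tp = tn = fp = fn = 0  # the four indexes, initial values zero
    for predicted, real in zip(prediction_result_set, real_result_set):
        if predicted == real:
            if predicted == 1:
                tp += 1  # true positive index
            else:
                tn += 1  # true negative index
        elif predicted == 1:
            fp += 1      # false positive index
        else:
            fn += 1      # false negative index
    return (tp + tn) / (tp + tn + fp + fn)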
In some embodiments, the target recommendation strategy includes: obtaining a similar bottom threshold, a generalization bottom threshold, a generalization top threshold, a deviation threshold, and the similarity identifier of the comparison model identifier; when the target accuracy is greater than the similar bottom threshold, marking the similarity identifier as similar data; when the target accuracy is greater than the generalization bottom threshold and less than the generalization top threshold, marking the similarity identifier as generalization data; when the target accuracy is less than the deviation threshold, marking the similarity identifier as deviation data; and generating the approximation result based on the similarity identifier of the comparison model identifier.
The similar bottom threshold may refer to a limit value of the target accuracy: when the target accuracy is greater than the similar bottom threshold, the data set corresponding to that accuracy may be considered similar data, i.e., it may be trained jointly with the data set of the reference server, thereby improving the accuracy of the model. The generalization bottom and top thresholds are two further limit values of the target accuracy: when the target accuracy is greater than the generalization bottom threshold and less than the generalization top threshold, the data set corresponding to that accuracy may be considered generalization data, i.e., it may be trained jointly with the data set of the reference server, thereby improving the generalization ability of the model. Generalization ability may refer to the ability of a machine learning algorithm to adapt to fresh samples. When the target accuracy is less than the deviation threshold, the data set corresponding to that accuracy may be considered junk data or irrelevant data.
As an example, the similar bottom threshold may be 90%, the generalization bottom threshold 60%, the generalization top threshold 80% and the deviation threshold 20%. If the target accuracy is greater than 90%, the data is similar data. If the target accuracy is between 60% and 80%, the data is generalization data. If the target accuracy is less than 20%, the data is junk data or irrelevant data.
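Using those example thresholds, the recommendation strategy might be sketched as follows. The label strings and the handling of accuracies that fall between the stated bands (20% to 60%, and 80% to 90%, which the text leaves unclassified) are assumptions.

SIMILAR_BOTTOM = 0.90         # similar bottom threshold
GENERALIZATION_BOTTOM = 0.60  # generalization bottom threshold
GENERALIZATION_TOP = 0.80     # generalization top threshold
DEVIATION = 0.20              # deviation threshold

def approximation_result(target_accuracy: float) -> str:
    """Map a target accuracy to a similarity label for the compared data set."""
    if target_accuracy > SIMILAR_BOTTOM:
        return "similar data"         # joint training may improve accuracy
    if GENERALIZATION_BOTTOM < target_accuracy < GENERALIZATION_TOP:
        return "generalization data"  # joint training may improve generalization
    if target_accuracy < DEVIATION:
        return "deviation data"       # junk or irrelevant data
    return "unclassified"             # bands the text does not cover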
Fig. 3 is a flowchart of a data processing method for daily gas consumption based on joint learning according to an embodiment of the present disclosure. The method of fig. 3 may be performed by the participants of fig. 1. As shown in fig. 3, the joint-learning-based data processing method for daily gas consumption includes the following steps, performed within the joint learning architecture:
s301, acquiring an original daily gas amount data set, wherein the original daily gas amount data set comprises at least one original daily gas amount data.
The raw daily gas volume data set may be a data set formed by daily gas volume data of one enterprise acquired by the participant 1 in one time period. As an example, the raw daily air consumption data set may be a set of daily air consumption data of 1 month to 12 months in 2020. The raw daily air amount data can be 0 or a positive number. As an example, one raw daily gas amount data of a certain enterprise may be "0.5", "198.571" or "154", etc., and is set according to the needs of the enterprise, which is not particularly limited herein.
S302, normalizing each piece of original daily gas consumption data in the original daily gas consumption data set to generate at least one piece of normalized data, thereby obtaining a target comparison set.
As an example, the data before and after normalization are shown in the following table:

Data before normalization    Data after normalization
35.154                       -0.255625448
16.454                       -0.169744679
48.14                        -0.365003439
0                            0
38.411                       -0.313297171
32.144                       -0.232392024
28.77                        -0.203464284
S303, in response to receiving a reference daily gas consumption model, generating a prediction result set based on the target comparison set and the reference daily gas consumption model.
The reference daily gas consumption model may be a logistic regression (Logistic Regression, LR) classification model.
S304, obtaining a true positive index, a true negative index, a false positive index, a false negative index and a real result set, wherein the four indexes are integers with initial values of zero.
S305, obtaining, from the prediction result set, one piece of prediction result data that has not been marked as compared, to obtain intermediate prediction result data.
S306, obtaining the real result data corresponding to the intermediate prediction result data in the real result set, to obtain intermediate real result data.
S307, when the intermediate prediction result data is the same as the intermediate real result data and both are positive, adding one to the true positive index.
S308, when the intermediate prediction result data is the same as the intermediate real result data and both are negative, adding one to the true negative index.
S309, when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is positive, adding one to the false positive index.
S310, when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is negative, adding one to the false negative index.
S311, marking the prediction result data corresponding to the intermediate prediction result data in the prediction result set as compared.
S312, repeating S305 to S311 until every piece of prediction result data in the prediction result set is marked as compared.
S313, generating the target accuracy based on a preset calculation strategy and the true positive, true negative, false positive and false negative indexes.
The calculation strategy may be a calculation method or calculation steps based on the following accuracy formula:
F=(TP+TN)/(TP+TN+FP+FN)
where F is the accuracy, TP the true positive index, TN the true negative index, FP the false positive index and FN the false negative index.
As an example, the true positive index may be 211, the true negative index 198, the false positive index 24 and the false negative index 18. Then the accuracy F = (211+198)/(211+198+24+18) ≈ 90.69%.
S314, generating an approximation result between the original daily gas consumption data set and the data set corresponding to the reference daily gas consumption model, based on the target accuracy and the target recommendation strategy.
As an example, if the target accuracy is 90.69%, the approximation result is that the original daily gas consumption data set and the data set corresponding to the reference daily gas consumption model are similar data.
Fig. 4 is a flowchart of a data model generation method provided in an embodiment of the present disclosure. As shown in fig. 4, the data model generation method based on joint learning includes:
s401, acquiring a test data set based on a joint learning architecture.
A test data set may refer to a data set consisting of test data. Test data may refer to data obtained for training a model.
S402, training an initial data model on the test data set to obtain a reference data model.
The initial data model may be an existing or self-set mathematical formula, and parameters of the mathematical formula are initial default values. The parameters of the mathematical formula may be constants, arrays, vectors, etc. Training a model may refer to the process of determining model parameters through a series of steps or methods based on existing data. The reference data model may refer to a mathematical formula in which model parameters have been determined after training.
S403, sending the reference data model to at least one participant in the joint learning architecture.
Sending the reference data model, rather than the data set behind it, to at least one participant avoids transmitting that data set directly; this protects the data set acquired by the local server and increases the security of the data and the system.
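Purely as an illustration of S401 to S403, the server-side flow might be sketched as below. The scikit-learn model, the pickle serialization and the participant.send() transport stub are all assumptions; the patent specifies none of these details.

import pickle
from sklearn.linear_model import LogisticRegression

def generate_reference_model(test_features, test_labels) -> bytes:
    """S401 to S402: train the initial data model on the test data set,
    producing the reference data model, then serialize it."""
    initial_model = LogisticRegression()  # initial data model
    initial_model.fit(test_features, test_labels)
    return pickle.dumps(initial_model)    # serialized reference data model

def send_to_participants(payload: bytes, participants) -> None:
    """S403: distribute the model itself, never the underlying data set."""
    for participant in participants:
        participant.send(payload)  # transport stub (assumed interface)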
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
The following are device embodiments of the present disclosure that may be used to perform method embodiments of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the method of the present disclosure.
Fig. 5 is a schematic diagram of a data processing apparatus based on joint learning provided in an embodiment of the present disclosure. As shown in fig. 5, the data processing apparatus based on joint learning includes:
the raw contrast set acquisition module 501 is configured to acquire a raw contrast set based on a joint learning architecture, wherein the raw contrast set includes at least one raw contrast data.
The normalization processing module 502 is configured to perform normalization processing on each piece of original comparison data in the original comparison set, and generate at least one piece of normalization data to obtain a target comparison set.
The prediction result set generation module 503 is configured to generate a prediction result set based on the target comparison set and the reference data model in response to receiving the reference data model.
The target accuracy rate generation module 504 is configured to generate a target accuracy rate based on a preset comparison policy, a predicted result set, and a real result set.
The approximation result generation module 505 is configured to generate a target approximation result based on the target accuracy and the target recommendation policy.
According to the technical solution provided by this embodiment of the present disclosure, an original comparison set is acquired; each piece of original comparison data in the original comparison set is normalized to generate at least one piece of normalized data, thereby obtaining a target comparison set; in response to receiving a reference data model, a prediction result set is generated based on the target comparison set and the reference data model; a target accuracy is generated based on a preset comparison strategy, the prediction result set and a real result set; and a target approximation result is generated based on the target accuracy and a target recommendation strategy. In this way, the similarity between data sets from different sources can be obtained, which greatly facilitates data selection for joint learning.
In some embodiments, the comparison strategy includes: obtaining a true positive index, a true negative index, a false positive index and a false negative index, wherein the four indexes are integers with initial values of zero; step one: obtaining, from the prediction result set, one piece of prediction result data that has not been marked as compared, to obtain intermediate prediction result data; step two: obtaining the real result data corresponding to the intermediate prediction result data in the real result set, to obtain intermediate real result data; step three: when the intermediate prediction result data is the same as the intermediate real result data and both are positive, adding one to the true positive index; step four: when the intermediate prediction result data is the same as the intermediate real result data and both are negative, adding one to the true negative index; step five: when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is positive, adding one to the false positive index; step six: when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is negative, adding one to the false negative index; step seven: marking the prediction result data corresponding to the intermediate prediction result data in the prediction result set as compared; repeating steps one to seven until every piece of prediction result data in the prediction result set is marked as compared; and generating the target accuracy based on a preset calculation strategy and the true positive, true negative, false positive and false negative indexes.
In some embodiments, the calculation strategy includes: substituting the true positive index, the true negative index, the false positive index and the false negative index into an accuracy formula to generate the target accuracy. The accuracy formula is:
F=(TP+TN)/(TP+TN+FP+FN)
where F is the accuracy, TP the true positive index, TN the true negative index, FP the false positive index and FN the false negative index.
In some embodiments, the target recommendation strategy includes: obtaining a similar bottom threshold, a generalization bottom threshold, a generalization top threshold, a deviation threshold, and the similarity identifier of the comparison model identifier; when the target accuracy is greater than the similar bottom threshold, marking the similarity identifier as similar data; when the target accuracy is greater than the generalization bottom threshold and less than the generalization top threshold, marking the similarity identifier as generalization data; when the target accuracy is less than the deviation threshold, marking the similarity identifier as deviation data; and generating the approximation result based on the similarity identifier of the comparison model identifier.
Fig. 6 is a schematic diagram of a data model generating apparatus provided in an embodiment of the present disclosure. As shown in fig. 6, the data model generating apparatus based on joint learning includes:
the test data set acquisition module 601 is configured to acquire a test data set based on a joint learning architecture.
The training module 602 is configured to train an initial data model on the test data set to obtain a reference data model.
The sending module 603 is configured to send the reference data model to at least one participant in the joint learning architecture.
Sending the reference data model to at least one participant avoids directly sending out the data set corresponding to the reference data model; this protects the data set acquired by the local server and increases the security of the data and the system.
In some embodiments, the test data set acquisition module 601 of the data model generating apparatus is further configured to: acquire an original data set, wherein the original data set comprises at least one piece of original data; normalize each piece of original data in the original data set to generate at least one piece of normalized data, thereby obtaining a normalized data set; and determine the normalized data set as the test data set.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present disclosure.
Fig. 7 is a schematic diagram of a computer device 700 provided by an embodiment of the present disclosure. As shown in fig. 7, the computer device 700 of this embodiment includes: a processor 701, a memory 702 and a computer program 703 stored in the memory 702 and executable on the processor 701. The steps of the various method embodiments described above are implemented by the processor 701 when executing the computer program 703. Alternatively, the processor 701, when executing the computer program 703, performs the functions of the modules/units of the apparatus embodiments described above.
Illustratively, the computer program 703 may be partitioned into one or more modules/units, which are stored in the memory 702 and executed by the processor 701 to complete the present disclosure. One or more of the modules/units may be a series of computer program instruction segments capable of performing particular functions to describe the execution of the computer program 703 in the computer device 700.
The computer device 700 may be a desktop computer, a notebook computer, a palm top computer, a cloud server, or the like. The computer device 700 may include, but is not limited to, a processor 701 and a memory 702. It will be appreciated by those skilled in the art that fig. 7 is merely an example of a computer device 700 and is not intended to limit the computer device 700, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., a computer device may also include an input-output device, a network access device, a bus, etc.
The processor 701 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 702 may be an internal storage unit of the computer device 700, for example, a hard disk or a memory of the computer device 700. The memory 702 may also be an external storage device of the computer device 700, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the computer device 700. Further, the memory 702 may also include both internal storage units and external storage devices of the computer device 700. The memory 702 is used to store computer programs and other programs and data required by the computer device. The memory 702 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts that are not detailed or described in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/computer device and method may be implemented in other manners. For example, the apparatus/computer device embodiments described above are merely illustrative, e.g., the division of modules or elements is merely a logical functional division, and there may be additional divisions of actual implementations, multiple elements or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present disclosure may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program, which may be stored in a computer readable storage medium; when executed by a processor, the computer program may implement the steps of the method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content of the computer readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are merely for illustrating the technical solution of the present disclosure, and are not limiting thereof; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the disclosure, and are intended to be included in the scope of the present disclosure.

Claims (10)

1. A data processing method based on joint learning, the method comprising:
acquiring an original comparison set based on a joint learning architecture, wherein the original comparison set comprises at least one piece of original comparison data;
normalizing each piece of original comparison data in the original comparison set to generate at least one piece of normalized data, thereby obtaining a target comparison set;
in response to receiving a reference data model, generating a prediction result set based on the target comparison set and the reference data model;
generating a target accuracy based on a preset comparison strategy, the prediction result set and a real result set;
and generating a target approximation result based on the target accuracy and a target recommendation strategy.
2. The method of claim 1, wherein the comparison strategy comprises:
obtaining a true positive index, a true negative index, a false positive index and a false negative index, wherein the true positive index, the true negative index, the false positive index and the false negative index are integers with initial values of zero;
obtaining, from the prediction result set, one piece of prediction result data that has not been marked as compared, to obtain intermediate prediction result data;
obtaining the real result data corresponding to the intermediate prediction result data in the real result set, to obtain intermediate real result data;
when the intermediate prediction result data is the same as the intermediate real result data and both are positive, adding one to the true positive index;
when the intermediate prediction result data is the same as the intermediate real result data and both are negative, adding one to the true negative index;
when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is positive, adding one to the false positive index;
when the intermediate prediction result data differs from the intermediate real result data and the intermediate prediction result data is negative, adding one to the false negative index;
marking the prediction result data corresponding to the intermediate prediction result data in the prediction result set as compared;
repeating the above steps until every piece of prediction result data in the prediction result set is marked as compared; and
generating the target accuracy based on a preset calculation strategy and the four indexes.
3. The method of claim 2, wherein the calculation strategy comprises:
substituting the true positive index, the true negative index, the false positive index and the false negative index into an accuracy formula to generate the target accuracy;
the accuracy mathematical formula is:
F=(TP+TN)/(TP+TN+FP+FN)
wherein F is the accuracy, TP the true positive index, TN the true negative index, FP the false positive index, and FN the false negative index.
4. A method according to any one of claims 1 to 3, wherein the target recommendation strategy comprises:
obtaining a similar bottom threshold, a generalization bottom threshold, a generalization top threshold, a deviation threshold, and an approximation identifier of the comparison model identifier;
marking the approximation identifier as similar data when the target accuracy is greater than the similar bottom threshold;
marking the approximation identifier as generalization data when the target accuracy is greater than the generalization bottom threshold and less than the generalization top threshold;
marking the approximation identifier as deviation data when the target accuracy is less than the deviation threshold;
and generating the target approximation result based on the approximation identifier of the comparison model identifier.
5. A data model generation method based on joint learning, the method comprising:
acquiring a test data set based on a joint learning architecture;
training an initial data model on the test data set to obtain a reference data model;
and sending the reference data model to at least one participant in the joint learning architecture.
6. The method of claim 5, wherein the acquiring a test dataset comprises:
obtaining an original data set, wherein the original data set comprises at least one piece of original data;
normalizing each piece of original data in the original data set to generate at least one piece of normalized data, thereby obtaining a normalized data set;
and determining the normalized data set as the test data set.
7. A data processing apparatus based on joint learning, the apparatus comprising:
an original comparison set acquisition module configured to acquire an original comparison set based on a joint learning architecture, wherein the original comparison set comprises at least one piece of original comparison data;
a normalization processing module configured to normalize each piece of original comparison data in the original comparison set to generate at least one piece of normalized data, thereby obtaining a target comparison set;
a prediction result set generation module configured to, in response to receiving a reference data model, generate a prediction result set based on the target comparison set and the reference data model;
a target accuracy generation module configured to generate a target accuracy based on a preset comparison strategy, the prediction result set and a real result set;
and an approximation result generation module configured to generate a target approximation result based on the target accuracy and a target recommendation strategy.
8. A data model generation apparatus based on joint learning, the apparatus comprising:
a test data set acquisition module configured to acquire a test data set based on a joint learning architecture;
a training module configured to train an initial data model on the test data set to obtain a reference data model;
and a sending module configured to send the reference data model to at least one participant in the joint learning architecture.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when the computer program is executed.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 6.
CN202111552665.XA 2021-12-17 2021-12-17 Data processing method based on joint learning, data model generation method and device Pending CN116362101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111552665.XA CN116362101A (en) 2021-12-17 2021-12-17 Data processing method based on joint learning, data model generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111552665.XA CN116362101A (en) 2021-12-17 2021-12-17 Data processing method based on joint learning, data model generation method and device

Publications (1)

Publication Number Publication Date
CN116362101A true CN116362101A (en) 2023-06-30

Family

ID=86932024

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111552665.XA Pending CN116362101A (en) 2021-12-17 2021-12-17 Data processing method based on joint learning, data model generation method and device

Country Status (1)

Country Link
CN (1) CN116362101A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117877748A (en) * 2024-03-11 2024-04-12 深圳市艾利特医疗科技有限公司 Multi-parameter heart-lung function test equipment and data processing method thereof
CN117877748B (en) * 2024-03-11 2024-05-14 深圳市艾利特医疗科技有限公司 Multi-parameter heart-lung function test equipment and data processing method thereof

Similar Documents

Publication Publication Date Title
Zhang et al. Positional context aggregation network for remote sensing scene classification
CN113553377B (en) Data sharing method and device based on block chain and federal learning
CN114330125A (en) Knowledge distillation-based joint learning training method, device, equipment and medium
CN111652732A (en) Bit currency abnormal transaction entity identification method based on transaction graph matching
CN112307331A (en) Block chain-based college graduate intelligent recruitment information pushing method and system and terminal equipment
CN112738080A (en) Administrative data transmission encryption method and terminal equipment
CN116362101A (en) Data processing method based on joint learning, data model generation method and device
CN113886817A (en) Host intrusion detection method and device, electronic equipment and storage medium
CN109472149B (en) Data operation method based on block chain
CN114329127B (en) Feature binning method, device and storage medium
CN116340959A (en) Breakpoint privacy protection-oriented method, device, equipment and medium
CN114154714A (en) Time series data prediction method, time series data prediction device, computer equipment and medium
CN116384461A (en) Model optimization training method and device based on joint learning
CN116050557A (en) Power load prediction method, device, computer equipment and medium
CN116402366A (en) Data contribution evaluation method and device based on joint learning
CN114863430A (en) Automatic population information error correction method, device and storage medium thereof
CN113779116A (en) Object sorting method, related equipment and medium
TWI704469B (en) Data statistics method and device
CN116069767A (en) Equipment data cleaning method and device, computer equipment and medium
CN111882415A (en) Training method and related device of quality detection model
CN116307006A (en) Multi-source time sequence data sum prediction method, device, computer equipment and medium
CN116304652A (en) Data heterogeneous-based joint learning model acquisition method and device
CN116541831B (en) Dual defense method based on blockchain and federal learning
CN116070708A (en) Model training method and device based on joint learning
CN114298320A (en) Method and device for calculating contribution value of joint learning, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination