CN113487040A

CN113487040A - Attention mechanism-based joint learning method and device, computer equipment and computer readable storage medium

Info

Publication number: CN113487040A
Application number: CN202110778808.2A
Authority: CN
Inventors: 谢龙飞; 马国良
Original assignee: Ennew Digital Technology Co Ltd
Current assignee: Ennew Digital Technology Co Ltd
Priority date: 2021-07-09
Filing date: 2021-07-09
Publication date: 2021-10-08

Abstract

The invention provides a joint learning method and device based on an attention mechanism, computer equipment and a computer readable storage medium. The method comprises the following steps: receiving a plurality of participant models uploaded by participants; respectively determining a query parameter matrix and a key value matrix according to the participant model; determining the difference degree between each query parameter matrix and a plurality of key value matrixes; and locally updating the participant model by using the difference degree to obtain a new participant model. The problem that data are distributed unevenly due to the fact that data of all parties are split and the specific difference between the data is difficult to measure directly in the joint learning process is solved.

Description

Attention mechanism-based joint learning method and device, computer equipment and computer readable storage medium

Technical Field

The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a joint learning method and apparatus based on an attention mechanism, a computer device, and a computer-readable storage medium.

Background

In a joint learning scenario, a situation that data distributions of different data sources (between participants) are inconsistent is often encountered, and the prior art performs weighted summation based on differences of data volumes of the participants, so that the differences between the data distributions are ignored. In joint learning, the data of each party are split from each other, and it is difficult to directly measure the specific difference between the data.

Disclosure of Invention

In view of this, the embodiments of the present disclosure provide a joint learning method and apparatus based on an attention mechanism, a computer device, and a computer readable storage medium, so as to solve the problem in the prior art that data distribution is inconsistent.

In a first aspect of the embodiments of the present disclosure, a joint learning method based on an attention mechanism is provided, including:

receiving a plurality of participant models uploaded by participants;

respectively determining a query parameter matrix and a key value matrix according to the participant model;

determining the difference degree between each query parameter matrix and a plurality of key value matrixes;

and locally updating the participant model by using the difference degree to obtain a new participant model.

In a second aspect of the disclosed embodiments, there is provided an attention-based joint learning apparatus, including:

the receiving module is used for receiving a plurality of participant models uploaded by participants;

the determining module is used for respectively determining a query parameter matrix and a key value matrix according to the participant model;

the calculation module is used for determining the difference degree between each inquiry parameter matrix and the plurality of key value matrixes;

and the updating module is used for locally updating the participant model by utilizing the difference degree so as to obtain a new participant model.

In a third aspect of the embodiments of the present disclosure, a computer device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the above method when executing the computer program.

In a fourth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, which stores a computer program, which when executed by a processor, implements the steps of the above-mentioned method.

Compared with the prior art, the embodiment of the disclosure has the following beneficial effects: receiving a plurality of participant models uploaded by a participant; respectively determining a query parameter matrix and a key value matrix according to the participant model; determining the difference degree between each query parameter matrix and a plurality of key value matrixes; and locally updating the participant model by using the difference degree to obtain a new participant model. The problem that data are distributed unevenly due to the fact that data of all parties are split and the specific difference between the data is difficult to measure directly in the joint learning process is solved.

Drawings

To more clearly illustrate the technical solutions in the embodiments of the present disclosure, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present disclosure, and other drawings can be obtained by those skilled in the art without inventive efforts.

FIG. 1 is a scenario diagram of an application scenario of an embodiment of the present disclosure;

FIG. 2 is a flowchart of a joint learning method based on an attention mechanism according to an embodiment of the present disclosure;

FIG. 3 is a block diagram of a joint learning apparatus based on an attention mechanism provided by an embodiment of the present disclosure;

fig. 4 is a schematic diagram of a computer device provided by an embodiment of the present disclosure.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the disclosed embodiments. However, it will be apparent to one skilled in the art that the present disclosure may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present disclosure with unnecessary detail.

An attention mechanism-based joint learning method and apparatus according to an embodiment of the present disclosure will be described in detail below with reference to the accompanying drawings.

Fig. 1 is a scene schematic diagram of an application scenario of an embodiment of the present disclosure. The application scenario may include

terminal devices

1, 2, and 3, server 4, and network 5.

The

terminal devices

1, 2, and 3 may be hardware or software. When the

terminal devices

1, 2 and 3 are hardware, they may be various electronic devices having a display screen and supporting communication with the server 4, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like; when the

terminal devices

1, 2, and 3 are software, they may be installed in the electronic device as described above. The

terminal devices

1, 2 and 3 may be implemented as a plurality of software or software modules, or may be implemented as a single software or software module, which is not limited by the embodiments of the present disclosure. Further, the

terminal devices

1, 2, and 3 may have various applications installed thereon, such as a data processing application, an instant messaging tool, social platform software, a search-type application, a shopping-type application, and the like.

The server 4 may be a server providing various services, for example, a backend server receiving a request sent by a terminal device establishing a communication connection with the server, and the backend server may receive and analyze the request sent by the terminal device and generate a processing result. The server 4 may be one server, may also be a server cluster composed of a plurality of servers, or may also be a cloud computing service center, which is not limited in this disclosure.

The server 4 may be hardware or software. When the server 4 is hardware, it may be various electronic devices that provide various services to the

terminal devices

1, 2, and 3. When the server 4 is software, it may be implemented as a plurality of software or software modules that provide various services for the

terminal devices

1, 2, and 3, or may be implemented as a single software or software module that provides various services for the

terminal devices

1, 2, and 3, which is not limited in this embodiment of the disclosure.

The network 5 may be a wired network connected by a coaxial cable, a twisted pair and an optical fiber, or may be a wireless network that can interconnect various Communication devices without wiring, for example, Bluetooth (Bluetooth), Near Field Communication (NFC), Infrared (Infrared), and the like, which is not limited in the embodiment of the present disclosure.

A user can establish a communication connection with the server 4 via the network 5 through the

terminal devices

1, 2, and 3 to receive or transmit information or the like. Specifically, after the user imports the collected data of the interest points into the server 4, the server 4 acquires first data of the interest points to be processed, the first data includes a first longitude latitude and a first classification of the interest points to be processed, and performs conflict check on the interest points to be processed according to the first longitude latitude and the first classification; further, in the case of determining a conflict, the server 4 performs conflict processing on the interest points to be processed, so as to avoid a large amount of repeated data and unavailable data existing in the database.

It should be noted that the specific types, numbers and combinations of the

terminal devices

1, 2 and 3, the server 4 and the network 5 may be adjusted according to the actual requirements of the application scenarios, and the embodiment of the present disclosure does not limit this.

Fig. 2 is a flowchart of a joint learning method based on an attention mechanism according to an embodiment of the present disclosure. The execution subject in fig. 2 of fig. 2 may be a terminal or a server in the figure. The participant in the invention can be terminal equipment of a user or a server; the server referred to in the present invention may comprise a central server of a plurality of participants, as shown in fig. 2, and the method of joint learning based on attention mechanism includes:

s201, receiving a plurality of participant models uploaded by participants;

specifically, the received participant model may be identified as a jointly learned participant model; initializing the participator model; the participator model is a model which is obtained and trained by the joint learning participator.

S202, respectively determining a query parameter matrix and a key value matrix according to a participant model;

specifically, the participant model may be obtained; screening out a target reference participant and a comparison participant in the participant model; extracting the model parameters of the target reference participants as a query parameter matrix; and extracting the model parameters of the comparison participants as a key value matrix.

S203, determining the difference degree between each query parameter matrix and a plurality of key value matrixes;

specifically, in the server, taking a certain participant as an example, the Query parameter matrix is marked as Query, the other participants are marked as Key value matrices as keys, and the difference between the Query and the Key of each participant is calculated.

S204, locally updating the participant model by using the difference degree to obtain a new participant model;

specifically, the difference degree between each group of query parameter matrix and a plurality of key value matrixes can be calculated; establishing a normalization weighting function according to the difference degree; obtaining an attention coefficient of the query parameter matrix by utilizing a normalization weighting function; and adjusting whether the participant model is updated locally or not according to the attention coefficient.

Adjusting whether the participant model is updated locally according to the attention coefficient may include: acquiring an attention coefficient of the query parameter matrix; based on the attention coefficient, carrying out weighted summation on the normalization weighting function value; and adjusting whether the participant model is updated locally or not according to the value of the weighted sum.

According to the technical scheme provided by the embodiment of the disclosure, a plurality of participant models uploaded by participants are received; respectively determining a query parameter matrix and a key value matrix according to the participant model; determining the difference degree between each query parameter matrix and a plurality of key value matrixes; and locally updating the participant model by using the difference degree to obtain a new participant model. The problem that data are distributed unevenly due to the fact that data of all parties are split and the specific difference between the data is difficult to measure directly in the joint learning process is solved.

All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present application, and are not described herein again.

The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.

Fig. 3 is a schematic diagram of a joint learning apparatus based on an attention mechanism according to an embodiment of the present disclosure. As shown in fig. 3, the attention-based joint learning apparatus includes:

a receiving module 301, configured to receive a plurality of participant models uploaded by participants;

a determining module 302, configured to determine a query parameter matrix and a key value matrix according to the participant model;

a calculating module 303, configured to determine a difference between each query parameter matrix and the plurality of key value matrices;

and the updating module 304 is configured to locally update the participant model by using the difference degree to obtain a new participant model.

Preferably, the determining module 302 may include: the method comprises the following steps of obtaining a submodule, a screening submodule, a first extraction submodule and a second extraction submodule; the method comprises the following specific steps:

the acquisition submodule is used for acquiring a participant model;

the screening submodule is used for screening out a target reference participant and a comparison participant in the participant model;

the first extraction submodule is used for extracting the model parameters of the target reference participant into a query parameter matrix;

and the second extraction submodule is used for extracting the model parameters of the contrast participants into a key value matrix.

Preferably, the updating module 304 may include: the method comprises the following steps of calculating a submodule, establishing a submodule, determining a submodule and adjusting a submodule; the method comprises the following specific steps:

the calculation submodule is used for calculating the difference degree between each group of inquiry parameter matrixes and the plurality of key value matrixes;

the establishing submodule is used for establishing a normalization weighting function according to the difference degree;

the determining submodule is used for obtaining an attention coefficient of the query parameter matrix by utilizing a normalization weighting function;

and the adjusting submodule is used for adjusting whether the participant model is locally updated or not according to the attention coefficient.

According to the device provided by the embodiment of the disclosure, the problem that data distribution is uneven due to the fact that data of all parties are split and specific differences among the data are difficult to measure directly in the joint learning process can be solved.

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present disclosure.

Fig. 4 is a schematic diagram of a computer device 4 provided by the disclosed embodiment. As shown in fig. 4, the computer device 4 of this embodiment includes: a processor 401, a memory 402 and a computer program 403 stored in the memory 402 and executable on the processor 401. The steps in the various method embodiments described above are implemented when the processor 401 executes the computer program 403. Alternatively, the processor 401 implements the functions of the respective modules/units in the above-described respective apparatus embodiments when executing the computer program 403.

Illustratively, the computer program 403 may be partitioned into one or more modules/units, which are stored in the memory 402 and executed by the processor 401 to accomplish the present disclosure. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 403 in the computer device 4.

The computer device 4 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computer devices. Computer device 4 may include, but is not limited to, a processor 401 and a memory 402. Those skilled in the art will appreciate that fig. 4 is merely an example of a computer device 4 and is not intended to limit computer device 4 and may include more or fewer components than those shown, or some of the components may be combined, or different components, e.g., the computer device may also include input output devices, network access devices, buses, etc.

The Processor 401 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The storage 402 may be an internal storage unit of the computer device 4, for example, a hard disk or a memory of the computer device 4. The memory 402 may also be an external storage device of the computer device 4, such as a plug-in hard disk provided on the computer device 4, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, memory 402 may also include both internal storage units of computer device 4 and external storage devices. The memory 402 is used for storing computer programs and other programs and data required by the computer device. The memory 402 may also be used to temporarily store data that has been output or is to be output.

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the above-described apparatus/computer device embodiments are merely illustrative, and for example, a division of modules or units, a division of logical functions only, an additional division may be made in actual implementation, multiple units or components may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, the present disclosure may implement all or part of the flow of the method in the above embodiments, and may also be implemented by a computer program to instruct related hardware, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, the computer program may implement the steps of the above methods and embodiments. The computer program may comprise computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like. It should be noted that the computer readable medium may contain suitable additions or additions that may be required in accordance with legislative and patent practices within the jurisdiction, for example, in some jurisdictions, computer readable media may not include electrical carrier signals or telecommunications signals in accordance with legislative and patent practices.

The above examples are only intended to illustrate the technical solutions of the present disclosure, not to limit them; although the present disclosure has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present disclosure, and are intended to be included within the scope of the present disclosure.

Claims

1. A joint learning method based on an attention mechanism is characterized by comprising the following steps:

receiving a plurality of participant models uploaded by participants;

determining the difference degree between each inquiry parameter matrix and a plurality of key value matrixes;

2. The method of claim 1, wherein receiving a plurality of participant models uploaded by a participant comprises:

confirming that the received participant model is a joint learning participant model;

initializing the participant model;

the participant model is a model which is obtained and trained by the joint learning participant after the joint learning participant acquires the server model.

3. The method of claim 1, wherein determining a matrix of query parameters and a matrix of key values, respectively, according to the participant model comprises:

acquiring the participant model;

screening out a target reference participant and a comparison participant in the participant model;

extracting the model parameters of the target reference participant as a query parameter matrix;

and extracting the model parameters of the comparison participants as a key value matrix.

4. The method of claim 1, wherein locally updating the participant model with the degree of difference to obtain a new participant model comprises:

calculating the difference degree between each group of inquiry parameter matrix and a plurality of key value matrixes;

establishing a normalization weighting function according to the difference degree;

obtaining an attention coefficient of the query parameter matrix by using the normalization weighting function;

and adjusting whether the participant model is locally updated or not according to the attention coefficient.

5. The method of claim 4, wherein adjusting whether the participant model is updated locally according to the attention coefficient comprises:

acquiring an attention coefficient of the query parameter matrix;

performing a weighted summation of the normalized weighting function values based on the attention coefficient;

and adjusting whether the participant model is locally updated or not according to the value of the weighted sum.

6. An attention-based joint learning apparatus, comprising:

the calculation module is used for determining the difference degree between each inquiry parameter matrix and a plurality of key value matrixes;

7. The apparatus of claim 6, wherein the determining module comprises:

an obtaining submodule for obtaining the participant model;

and the second extraction submodule is used for extracting the model parameters of the comparison participants into a key value matrix.

8. The apparatus of claim 6, wherein the update module comprises:

the calculation sub-module is used for calculating the difference degree between each group of inquiry parameter matrix and a plurality of key value matrixes;

the determining submodule is used for obtaining the attention coefficient of the query parameter matrix by utilizing the normalization weighting function;

9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 5 when executing the computer program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 5.