CN111737466B - Method for quantifying interaction information of a deep neural network - Google Patents

Method for quantifying interaction information of a deep neural network

Info

Publication number
CN111737466B
CN111737466B
Authority
CN
China
Prior art keywords
unit
units
neural network
sample
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010558767.1A
Other languages
Chinese (zh)
Other versions
CN111737466A (en)
Inventor
李超 (Li Chao)
徐勇军 (Xu Yongjun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202010558767.1A
Publication of CN111737466A
Application granted
Publication of CN111737466B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Abstract

The invention provides a method for quantifying the interaction information of a deep neural network, comprising the following steps: S1, obtaining a sample from a data set in the field of natural language processing, the sample comprising a plurality of units, each unit corresponding to a word, and applying multiple rounds of aggregation to the units in the sample until they are aggregated into a single unit; and S2, constructing, from the way units were aggregated across the rounds of step S1, a dendrogram reflecting the inter-word interaction information modeled inside the deep neural network. The method can objectively quantify the interaction information among the words of an input sample as modeled by the deep neural network, and clusters adjacent units with significant interaction according to the magnitude of the interaction-gain ratio, finally obtaining a tree-shaped hierarchy that reflects the inter-word interaction information modeled by the network, thereby providing a general method for further understanding deep neural networks.

Description

Method for quantifying interaction information of a deep neural network
Technical Field
The invention relates to the technical field of deep learning, in particular to the application of deep neural networks in the field of natural language processing, and more particularly to a method for quantifying the interaction information of a deep neural network.
Background
Deep neural networks currently show excellent modeling ability on a variety of natural language processing tasks, but they are generally regarded as black-box models: their internal modeling logic is invisible, which makes it difficult to evaluate the accuracy and reliability of their final decisions, so interpreting the internal modeling logic of neural networks has become an important research direction. In the field of natural language processing in particular, it remains opaque which interactions among input words a network actually models, so decoupling and quantifying the interaction information among the words of an input sentence, as modeled by the deep neural network, plays an important role in understanding the network's internal logic and decision mechanism.
Disclosure of Invention
Therefore, the present invention is directed to overcoming the above-mentioned drawbacks of the prior art and providing a new method for quantifying deep neural network interaction information, for understanding the internal logic of the deep neural network.
The invention discloses a method for quantifying the interaction information of a deep neural network, used to construct a dendrogram that quantifies the inter-word interaction information modeled by a deep neural network in a natural language processing task, the method comprising the following steps:
S1, obtaining a sample from a data set in the field of natural language processing, the sample comprising a plurality of units, each unit corresponding to a word, and applying multiple rounds of aggregation to the units in the sample until they are aggregated into a single unit; wherein each round of aggregation comprises: inputting the current sample into a deep neural network, and computing the Shapley value of each unit in the current sample from the output of the deep neural network, the deep neural network being used for a natural language processing task in the field of natural language processing; and computing the interaction-gain ratio between every pair of adjacent units based on the Shapley values, aggregating the two adjacent units with the largest interaction-gain ratio into a new unit, and forming, together with the other units in the current sample, a new current sample for the next round of aggregation;
and S2, constructing, from the way units were aggregated across the rounds of aggregation of the given sample in step S1, a dendrogram reflecting the inter-word interaction information modeled inside the deep neural network. Preferably, the dendrogram is a binary tree constructed as follows: S31, all units in the sample form the bottom layer of leaf nodes of the binary tree; and S32, following the aggregation order, the new unit formed by each aggregation serves as the parent node of the two adjacent units it merged, until the root node of the tree is formed.
The Shapley value of each unit in the current sample is a weighted average of the marginal contributions of that unit to the sets that can be formed from all other units in the current sample. The Shapley value of each unit in the current sample is determined as follows:
$$\phi_v(a_i) = \sum_{S \subseteq N\setminus\{a_i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\left[v(S\cup\{a_i\}) - v(S)\right]$$

where $v$ denotes the neural network, $\phi_v(a_i)$ denotes the Shapley value of the $i$-th unit $a_i$ in the current sample, $N$ denotes the set of all units in the current sample, $|N|$ the size of $N$, $S$ any set that can be formed from units of the current sample other than $a_i$, $|S|$ the size of $S$, $!$ the factorial, and $v(\cdot)$ the output of the deep neural network; $v(S\cup\{a_i\}) - v(S)$ denotes the marginal contribution of the $i$-th unit $a_i$ to the set $S$, where $v(S\cup\{a_i\})$ is the output obtained when the set formed by adding $a_i$ to $S$ is input into the neural network, and $v(S)$ is the output obtained when the set $S$ is input into the neural network.
The interaction-gain ratio between two adjacent units is the ratio of the interaction gain of the two adjacent units to all of the interaction information involving those two units.
Preferably, the interaction-gain ratio between two adjacent units is determined by:
$$r = \frac{\left|B_{between}([S_1],[S_2])\right|}{\left|B_{between}([S_1],[S_2])\right| + \left|B_{between}([S_1]',[S_1])\right| + \left|B_{between}([S_2],[S_2]')\right| + \left|\phi([S_1])\right| + \left|\phi([S_2])\right|}$$

where $[S_1]$ denotes the unit formed by aggregating all units in the set $S_1$, $[S_2]$ the unit formed by aggregating all units in the set $S_2$, $[S_1]$ and $[S_2]$ are two adjacent units, $B_{between}([S_1],[S_2])$ is the interaction gain between the adjacent units $[S_1]$ and $[S_2]$, $[S_1]'$ is the unit adjacent to $[S_1]$ on its left before aggregation, $[S_2]'$ is the unit adjacent to $[S_2]$ on its right before aggregation, $B_{between}([S_1]',[S_1])$ is the interaction gain between $[S_1]'$ and $[S_1]$, $B_{between}([S_2],[S_2]')$ is the interaction gain between $[S_2]$ and $[S_2]'$, and the total interaction information involving the units $[S_1]$ and $[S_2]$ consists of $B_{between}([S_1],[S_2])$, $B_{between}([S_1]',[S_1])$, $B_{between}([S_2],[S_2]')$, $\phi([S_1])$ and $\phi([S_2])$, where $\phi([S_1])$ and $\phi([S_2])$ are the Shapley values of the units $[S_1]$ and $[S_2]$, respectively.
The interaction gain between two adjacent units is the difference between the interaction gain within the new unit formed by aggregating the two adjacent units and the interaction gains within the two adjacent units before aggregation.
Preferably, the interaction gain between two adjacent units is determined by:
$$B_{between}([S_1],[S_2]) = B([S]) - B([S_1]) - B([S_2])$$

where $[S]$ denotes the unit formed by aggregating all units in the set $S = S_1 \cup S_2$, $[S_1]$ denotes the unit formed by aggregating all units in the set $S_1$, $[S_2]$ the unit formed by aggregating all units in the set $S_2$, $B(\cdot)$ denotes the interaction gain within a unit, and $B_{between}(\cdot,\cdot)$ denotes the interaction gain between units.
In some embodiments of the invention, the interaction gain within each unit is determined by:
$$B([S]) = \phi^{(N\setminus S)\cup\{[S]\}}([S]) - \sum_{b\in S}\phi^{(N\setminus S)\cup\{b\}}(b)$$

where $[S]$ denotes the unit formed by aggregating all units in the set $S$, $b$ is a unit in the set $S$, $N\setminus S$ denotes the set formed by the units of $N$ other than those in $S$, and $\phi^{C}(\cdot)$ denotes the Shapley value computed in the game whose player set is $C$.
Compared with the prior art, the invention has the following advantages: it innovatively provides a method for quantitatively evaluating and understanding the internal logic of a deep neural network. Drawing on the ideas of game theory, the method objectively quantifies the interaction information among the words of an input sample as modeled by the deep neural network, proposes a dedicated index, the interaction-gain ratio, to evaluate interactions, and builds a tree structure according to it: adjacent units with significant interaction are clustered according to the magnitude of the interaction-gain ratio, finally yielding a tree-shaped hierarchy that reflects the inter-word interaction information modeled by the network, thereby providing a general method for further understanding deep neural networks.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for quantifying deep neural network interaction information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a binary tree built from an example sample in a method for quantifying deep neural network interaction information according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The invention aims, in view of the black-box nature of current deep neural networks, to provide a method for quantifying the interaction information among the input words modeled by a deep neural network, so that the interactions among words modeled by the network are objectively explained, which facilitates understanding of the network's internal logic and decision mechanism.
According to an embodiment of the present invention, as shown in FIG. 1, a method for quantifying deep neural network interaction information is provided, which comprises steps T1, T2, T3, T4 and T5, each described in detail below.
In step T1, a deep neural network for a natural language processing task in the natural language processing domain is obtained.
In step T2, for a given input sample, the Shapley value of each unit in the sample is calculated from the output of the deep neural network feature layer; the given input sample is a sentence comprising a plurality of units, drawn from a data set related to the field of natural language processing, and each unit of the initial given input sample consists of one word. The output of the deep neural network feature layer may be the output of any feature layer of the deep neural network or the final output of the network.
The Shapley value, from cooperative game theory, is a scheme for fairly distributing the total benefit obtained in a cooperation among the members according to each member's contribution. In the present invention, given a trained neural network $v$ (the game), the input sample is $N = \{a_1, a_2, \ldots, a_n\}$, where $n$ is the number of units contained in the input sample (each unit initially consisting of one word) and $a_i$ ($1 \le i \le n$) denotes a unit of the input sample (a player of this game). To obtain more benefit, some players cooperate to form a coalition $S$ (a set of some of the units); the total benefit obtained by coalition $S$ in the game is $v(S)$, the output value of the neural network when the input unit set is $S$. If a player $a_i$ outside the coalition $S$ also joins it, the coalition's total benefit becomes $v(S \cup \{a_i\})$; thus $v(S \cup \{a_i\}) - v(S)$ denotes the marginal contribution of player $a_i$ to coalition $S$. The Shapley value of each player is a weighted average of its marginal contributions over the various possible coalitions $S$ in the game $v$. The Shapley value of the $i$-th unit $a_i$ in the input sample is denoted $\phi_v(a_i)$ and is calculated by equation (1):
$$\phi_v(a_i) = \sum_{S \subseteq N\setminus\{a_i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\left[v(S\cup\{a_i\}) - v(S)\right] \qquad (1)$$

where $v$ denotes the neural network, $N$ denotes the set formed by all units of the current input sample, $|N|$ the size of $N$, $S$ any set that can be formed from units of the current sample other than the $i$-th unit $a_i$, $|S|$ the size of $S$, $!$ the factorial, and $v(\cdot)$ the output of the deep neural network. Finally, distributing the benefit of the game to each member in proportion yields the degree of influence of each unit of the input sample on the network's final decision: the larger the Shapley value, the larger that unit's influence on the final decision, and vice versa.
Preferably, $\phi_v(a_i)$ can be computed approximately by sampling. For example, for a given input sample containing $n$ units, expressed as the set $N = \{a_1, a_2, \ldots, a_n\}$, suppose the Shapley value $\phi_v(a_i)$ of unit $a_i$ is currently needed. The set $N \setminus \{a_i\}$ formed by the remaining units is sampled once to obtain a set $S$; the word vectors of the units of the input sample not included in $S$ are masked to zero vectors, giving a masked sample that is fed into the deep neural network to obtain $v(S)$. Similarly, adding unit $a_i$ to the sampled set $S$ and feeding the masked sample $S \cup \{a_i\}$ into the network yields $v(S \cup \{a_i\})$; the difference $v(S \cup \{a_i\}) - v(S)$ is then the marginal contribution of unit $a_i$ under this sample. According to one embodiment of the invention, after sampling $M$ times ($M \le 2^{n-1}$), the average of the $M$ marginal contributions is taken as the Shapley value $\phi_v(a_i)$ of unit $a_i$. The Shapley value of every unit of the input sample can be obtained in the same way.
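As an illustration only, the following minimal Python sketch implements this sampling procedure; the function names, the unit representation, and the callable `v` (which must mask every unit outside the chosen coalition to a zero vector and return the network's scalar output) are assumptions of this sketch, not the patent's reference implementation.

```python
import random

def shapley_value_sampled(v, units, i, num_samples=200):
    """Monte Carlo estimate of the Shapley value of unit `i` (equation (1)).

    `units` lists the players of the current sample; `v(subset)` must return
    the network output with every unit outside `subset` masked to a zero
    vector. Sampling a coalition size uniformly and then a coalition of that
    size uniformly reproduces the Shapley weights in expectation.
    """
    others = [u for u in units if u != i]
    total = 0.0
    for _ in range(num_samples):
        k = random.randint(0, len(others))      # coalition size, 0..n-1
        S = random.sample(others, k)            # coalition S from N \ {a_i}
        total += v(S + [i]) - v(S)              # marginal contribution of a_i
    return total / num_samples

# Toy check with an additive stand-in "network" (the sum of the chosen unit
# ids): the Shapley value of unit 2 is then exactly 2.
units = [0, 1, 2, 3]
print(shapley_value_sampled(lambda S: float(sum(S)), units, 2))  # ~2.0
```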
In step T3, the interaction-gain ratio between every pair of adjacent units in the given input sample is calculated. For a given game (in the present invention, the trained deep neural network $v$), several players may form an indivisible whole (in the present invention, a whole $[S]$ formed from some of the units of the given input sample); that is, the whole $[S]$ is regarded as a single player, and $S$ denotes the union of those players (in the present invention the players are units, and $S$ is the set of those units). The interaction gain $B([S])$ contained inside the player $[S]$ is then:

$$B([S]) = \phi^{(N\setminus S)\cup\{[S]\}}([S]) - \sum_{b\in S}\phi^{(N\setminus S)\cup\{b\}}(b) \qquad (2)$$

where $b$ is an element of the set $S$ and $[S]$ is the unit aggregated from all units in $S$. The player set $(N\setminus S)\cup\{[S]\}$ means that the players actually taking part in game $v$ are the members of $N$ minus the members of $S$, plus the player $[S]$ (in the present invention, the units of the given input sample set minus the units of $S$ plus the unit $[S]$ serve as the input to the deep neural network $v$); similarly, $(N\setminus S)\cup\{b\}$ means the members of $N$ minus the members of $S$ plus the player $b$. Accordingly, $\phi^{(N\setminus S)\cup\{[S]\}}([S])$ denotes the Shapley value of the unit $[S]$ in the game with player set $(N\setminus S)\cup\{[S]\}$, and $\phi^{(N\setminus S)\cup\{b\}}(b)$ denotes the Shapley value of the unit $b$ in the game with player set $(N\setminus S)\cup\{b\}$; the Shapley values are computed as in equation (1) above.
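To make equation (2) concrete, here is a hedged continuation of the sketch above. A unit is represented as a tuple of word indices, and `shapley(v, players, p)` is assumed to return the Shapley value of player `p` in the game whose players are `players` (the sampled estimator above can serve, with `v` masking every word not covered by the selected players); none of these names come from the patent.

```python
def intra_gain(v, sample, u, shapley):
    """Interaction gain B([u]) inside unit `u`, per equation (2).

    `sample` is the current list of units (each a tuple of word indices).
    The Shapley value of `u` played as one indivisible player, minus each
    member word's Shapley value computed with the rest of the sample
    unchanged; zero for a single-word unit.
    """
    rest = [p for p in sample if p != u]             # players of N \ S
    gain = shapley(v, rest + [u], u)                 # phi of [S]
    for w in u:                                      # minus phi of each b in S
        gain -= shapley(v, rest + [(w,)], (w,))
    return gain
```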
According to one example of the invention, assume the set of all units in the input sample is $N = \{a_1, a_2, \ldots, a_n\}$, and consider the unit $[\{a_i, a_{i+1}\}]$ formed by aggregating two adjacent units. This unit (containing several words) and the remaining units of the sample form the sample after one round of clustering, $N' = \{a_1, \ldots, a_{i-1}, [\{a_i, a_{i+1}\}], a_{i+2}, \ldots, a_n\}$. The Shapley value of the unit $[\{a_i, a_{i+1}\}]$ can be calculated, and its interaction gain then follows from equation (2):

$$B([\{a_i,a_{i+1}\}]) = \phi^{(N\setminus\{a_i,a_{i+1}\})\cup\{[\{a_i,a_{i+1}\}]\}}([\{a_i,a_{i+1}\}]) - \sum_{b\in\{a_i,a_{i+1}\}}\phi^{(N\setminus\{a_i,a_{i+1}\})\cup\{b\}}(b)$$
Because the units of the input sample are aggregated round after round, the number of units in the aggregated sample keeps decreasing until it is finally aggregated into a single unit (that is, the whole input sample is treated as one unit). In this aggregation process, if two adjacent units $[S_1]$ and $[S_2]$ are combined into a unit $[S]$, the interaction-gain indices of the three satisfy the following relation:

$$B([S]) = B([S_1]) + B([S_2]) + B_{between}([S_1],[S_2]) \qquad (3)$$

where $B([S_1])$ and $B([S_2])$ are the interaction gains within the two units $[S_1]$ and $[S_2]$ respectively, and $B_{between}([S_1],[S_2])$ is the interaction gain between the two units; $B_{between}([S_1],[S_2])$ can therefore be derived as:

$$B_{between}([S_1],[S_2]) = B([S]) - B([S_1]) - B([S_2])$$

Finally, the ratio $r$ of the interaction gain $B_{between}$ between two adjacent units to all of the interaction information involving the two units can be calculated. When two units $[S_1]$ and $[S_2]$ are aggregated into a unit $[S]$, denote by $[S_1]'$ the unit adjacent to $[S_1]$ on its left before aggregation and by $[S_2]'$ the unit adjacent to $[S_2]$ on its right before aggregation; the interaction gain between $[S_1]$ and $[S_2]$ is $B_{between}([S_1],[S_2])$, that between $[S_1]'$ and $[S_1]$ is $B_{between}([S_1]',[S_1])$, and that between $[S_2]$ and $[S_2]'$ is $B_{between}([S_2],[S_2]')$. The total interaction information involving the units $[S_1]$ and $[S_2]$ consists of $B_{between}([S_1],[S_2])$, $B_{between}([S_1]',[S_1])$, $B_{between}([S_2],[S_2]')$, $\phi([S_1])$ and $\phi([S_2])$, where $\phi([S_1])$ and $\phi([S_2])$ are the Shapley values of the units $[S_1]$ and $[S_2]$ respectively. The interaction-gain ratio between $[S_1]$ and $[S_2]$ is then:

$$r = \frac{\left|B_{between}([S_1],[S_2])\right|}{\left|B_{between}([S_1],[S_2])\right| + \left|B_{between}([S_1]',[S_1])\right| + \left|B_{between}([S_2],[S_2]')\right| + \left|\phi([S_1])\right| + \left|\phi([S_2])\right|}$$
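Continuing the sketch, the between-unit gain and the ratio $r$ then follow directly; the absolute values mirror the reconstruction of $r$ above, and the handling of a missing neighbour at a sentence boundary (the term is simply omitted) is an assumption of this illustration.

```python
def between_gain(v, sample, u1, u2, shapley):
    """B_between(u1, u2) = B([u1 + u2]) - B(u1) - B(u2)  (equation (3))."""
    merged = u1 + u2
    merged_sample = [p for p in sample if p not in (u1, u2)] + [merged]
    return (intra_gain(v, merged_sample, merged, shapley)
            - intra_gain(v, sample, u1, shapley)
            - intra_gain(v, sample, u2, shapley))

def interaction_ratio(v, sample, i, shapley):
    """Interaction-gain ratio r between adjacent units sample[i], sample[i+1]."""
    s1, s2 = sample[i], sample[i + 1]
    num = abs(between_gain(v, sample, s1, s2, shapley))
    denom = num + abs(shapley(v, sample, s1)) + abs(shapley(v, sample, s2))
    if i > 0:                                    # left neighbour [S1]'
        denom += abs(between_gain(v, sample, sample[i - 1], s1, shapley))
    if i + 2 < len(sample):                      # right neighbour [S2]'
        denom += abs(between_gain(v, sample, s2, sample[i + 2], shapley))
    return num / denom if denom else 0.0
```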
In step T4, the two adjacent units with the largest interaction-gain ratio are aggregated to form a new unit, which together with the remaining units of the sample forms the sample after one round of aggregation; the aggregated sample is taken as the new given input sample, and steps T2 to T4 are repeated, iterating until the aggregated sample contains only one unit; obviously, that unit contains all the words of the initial sample.
For example, according to one example of the invention, the set formed by all units of the input sample is $N = \{a_1, a_2, \ldots, a_n\}$, whose units form $n-1$ pairs of adjacent units. The two adjacent units with the largest interaction-gain ratio, say $a_i$ and $a_j$, are aggregated into a unit $[\{a_i, a_j\}]$, which together with the remaining $n-2$ units forms the clustered sample $N' = \{a_1, \ldots, a_{i-1}, [\{a_i, a_j\}], \ldots, a_n\}$. Clearly, $N'$ contains $n-1$ units. The iteration continues and finally forms a sample $N_{root} = \{[\{a_1, a_2, \ldots, a_n\}]\}$ containing only one unit, and the set $N_{root}$ contains all the words $a_1, a_2, \ldots, a_n$ of the given input sample.
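Steps T2 to T4 then amount to the following greedy loop, a hedged sketch built on the hypothetical helpers above that records every merge so the tree of step T5 can be built afterwards:

```python
def build_merge_history(v, n_words, shapley):
    """Merge the adjacent pair with the largest interaction-gain ratio,
    round after round, until one unit remains; returns the merges bottom-up."""
    sample = [(w,) for w in range(n_words)]      # one word per initial unit
    merges = []
    while len(sample) > 1:
        ratios = [interaction_ratio(v, sample, i, shapley)
                  for i in range(len(sample) - 1)]
        i = max(range(len(ratios)), key=ratios.__getitem__)
        merged = sample[i] + sample[i + 1]
        merges.append((sample[i], sample[i + 1], merged))
        sample = sample[:i] + [merged] + sample[i + 2:]
    return merges
```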
In step T5, a binary tree containing the tree hierarchy is created from the process by which the units of the input sample were successively aggregated. Clearly, the leaf nodes of the tree are the words of the input sample; each time two units are aggregated into one, an intermediate node of the tree is formed, and as the aggregation proceeds, the root node is finally formed; the binary tree can thus be built bottom-up.
According to one example of the invention, a binary tree is constructed for the given input sample {the, sun, is, coming, out}. The sample composed of the units the, sun, is, coming, out is input into a deep neural network trained on a natural-language-processing data set, and the units of the sample are aggregated round after round according to the network's output. In the first aggregation, the units the and sun are aggregated into the new unit the sun, which with the remaining units is, coming, out forms the new sample {the sun, is, coming, out}; in the second aggregation, the units the sun and is are aggregated into the new unit the sun is, which with the remaining units coming and out forms the aggregated sample {the sun is, coming, out}; in the third aggregation, the units coming and out are aggregated into the new unit coming out, which with the remaining unit the sun is forms the aggregated sample {the sun is, coming out}; and in the fourth aggregation, the sun is and coming out are aggregated into the unit the sun is coming out, which is the root node of the tree. The binary tree built for the sample {the, sun, is, coming, out} according to this aggregation process is shown in FIG. 2; displaying the inter-word interaction information modeled by the deep neural network through this binary tree helps in understanding the network's internal logic.
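Finally, step T5 is a bottom-up fold over the recorded merges; a minimal sketch in which the `Node` class and all names are assumptions of this illustration. For the FIG. 2 example, `build_tree(["the", "sun", "is", "coming", "out"], merges)` would return a root node spanning the whole sentence.

```python
class Node:
    def __init__(self, text, left=None, right=None):
        self.text, self.left, self.right = text, left, right

def build_tree(words, merges):
    """Build the dendrogram from the merge history: leaves are the words of
    the input sample; each merge creates the parent of the two merged units."""
    nodes = {(i,): Node(words[i]) for i in range(len(words))}
    for u1, u2, merged in merges:                # in aggregation order
        nodes[merged] = Node(" ".join(words[i] for i in merged),
                             left=nodes.pop(u1), right=nodes.pop(u2))
    (root,) = nodes.values()                     # the final merge is the root
    return root
```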
The method provided by the invention uses a hierarchical structure to explain the internal logic of a neural network: it objectively quantifies the interaction information among the words of an input sample as modeled by the deep neural network, clusters adjacent units with significant interaction according to the interaction-information ratio, and finally obtains a tree-shaped hierarchy reflecting the inter-word interaction information modeled by the network, providing a general method for further understanding deep neural networks. The method can construct such a dendrogram for any deep neural network used for a natural language processing task in the field of natural language processing, so as to understand the network's internal logic.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily executed in the specific order, and in fact, some of the steps may be executed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
While embodiments of the present invention have been described above, the description is illustrative, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, their practical application, or technical improvements over products on the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method for quantifying deep neural network interaction information, for constructing a dendrogram that quantifies the inter-word interaction information modeled by a deep neural network in a natural language processing task, the method comprising:
s1, obtaining a sample from a data set in the field of natural language processing, wherein the sample comprises a plurality of units, each unit corresponds to a word, and the units in the sample are subjected to multiple aggregation processing until the units in the sample are aggregated into a unit;
wherein each polymerization treatment comprises:
inputting a current sample into a deep neural network, and calculating a Shapril value of each unit in the current sample according to the output of the deep neural network, wherein the deep neural network is a deep neural network used for a natural language processing task in the field of natural language processing;
calculating the interactive gain rate between every two adjacent units based on the sand-pril value of each unit, aggregating the two adjacent units with the maximum interactive gain rate into a new unit, and forming a new current sample with other units in the current sample for the next aggregation treatment;
wherein the interaction-gain ratio between two adjacent units is determined by:

$$r = \frac{\left|B_{between}([S_1],[S_2])\right|}{\left|B_{between}([S_1],[S_2])\right| + \left|B_{between}([S_1]',[S_1])\right| + \left|B_{between}([S_2],[S_2]')\right| + \left|\phi([S_1])\right| + \left|\phi([S_2])\right|}$$

where $[S_1]$ denotes the unit formed by aggregating all units in the set $S_1$, $[S_2]$ the unit formed by aggregating all units in the set $S_2$, $[S_1]$ and $[S_2]$ are two adjacent units, $B_{between}([S_1],[S_2])$ is the interaction gain between the adjacent units $[S_1]$ and $[S_2]$, $[S_1]'$ is the unit adjacent to $[S_1]$ on its left before aggregation, $[S_2]'$ is the unit adjacent to $[S_2]$ on its right before aggregation, $B_{between}([S_1]',[S_1])$ is the interaction gain between $[S_1]'$ and $[S_1]$, $B_{between}([S_2],[S_2]')$ is the interaction gain between $[S_2]$ and $[S_2]'$, and the total interaction information involving the units $[S_1]$ and $[S_2]$ consists of $B_{between}([S_1],[S_2])$, $B_{between}([S_1]',[S_1])$, $B_{between}([S_2],[S_2]')$, $\phi([S_1])$ and $\phi([S_2])$, where $\phi([S_1])$ and $\phi([S_2])$ are the Shapley values of the units $[S_1]$ and $[S_2]$, respectively;
and S2, constructing, from the way units were aggregated across the rounds of aggregation of the given sample in step S1, a dendrogram reflecting the inter-word interaction information modeled inside the deep neural network.
2. The method of claim 1, wherein the Shapley value of each unit in the current sample is a weighted average of the marginal contributions of that unit to the sets that can be formed from all other units in the current sample.
3. The method of claim 2, wherein the Shapley value of each unit in the current sample is determined by:
$$\phi_v(a_i) = \sum_{S \subseteq N\setminus\{a_i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\left[v(S\cup\{a_i\}) - v(S)\right]$$

where $v$ denotes the neural network, $\phi_v(a_i)$ denotes the Shapley value of the $i$-th unit $a_i$ in the current sample, $N$ denotes the set of all units in the current sample, $|N|$ the size of $N$, $S$ any set that can be formed from units of the current sample other than $a_i$, $|S|$ the size of $S$, $!$ the factorial, $v(\cdot)$ the output of the deep neural network, and $v(S\cup\{a_i\}) - v(S)$ the marginal contribution of the $i$-th unit $a_i$ to the set $S$, where $v(S\cup\{a_i\})$ denotes the output obtained when the set formed by adding $a_i$ to $S$ is input into the neural network, and $v(S)$ denotes the output obtained when the set $S$ is input into the neural network.
4. The method of claim 3, wherein the interaction-gain ratio between two adjacent units is the ratio of the interaction gain of the two adjacent units to the total interaction information involving those two units.
5. The method of claim 4, wherein the interaction gain between two adjacent units is the difference between the interaction gain within the new unit formed by aggregating the two units and the interaction gains within the two units before aggregation.
6. The method of claim 5, wherein the interaction gain between two adjacent units is determined by:
$$B_{between}([S_1],[S_2]) = B([S]) - B([S_1]) - B([S_2])$$

where $[S]$ denotes the unit formed by aggregating all units in the set $S = S_1 \cup S_2$, $[S_1]$ the unit formed by aggregating all units in the set $S_1$, $[S_2]$ the unit formed by aggregating all units in the set $S_2$, $B(\cdot)$ denotes the interaction gain within a unit, and $B_{between}(\cdot,\cdot)$ denotes the interaction gain between units.
7. The method of claim 6, wherein the interaction gain within each unit is determined by:
$$B([S]) = \phi^{(N\setminus S)\cup\{[S]\}}([S]) - \sum_{b\in S}\phi^{(N\setminus S)\cup\{b\}}(b)$$

where $[S]$ denotes the unit formed by aggregating all units in the set $S$, $b$ is a unit in the set $S$, $N\setminus S$ denotes the set formed by the units of $N$ other than those in $S$, and $\phi^{C}(\cdot)$ denotes the Shapley value computed in the game whose player set is $C$.
8. The method for quantifying deep neural network interaction information according to any one of claims 1 to 6, wherein step S2 constructs a binary tree by:
s31, forming a first layer of leaf nodes from bottom to top of the binary tree by using all units in the sample;
and S32, according to the aggregation sequence, taking a new unit formed after each aggregation as a father node of two adjacent units before the aggregation until a root node of the tree is formed.
9. A computer-readable storage medium, having embodied thereon a computer program, the computer program being executable by a processor to perform the steps of the method of any one of claims 1 to 8.
10. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to carry out the steps of the method according to any one of claims 1 to 8.
CN202010558767.1A 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network Active CN111737466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010558767.1A CN111737466B (en) 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010558767.1A CN111737466B (en) 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network

Publications (2)

Publication Number Publication Date
CN111737466A CN111737466A (en) 2020-10-02
CN111737466B (en) 2022-11-29

Family

ID=72649650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010558767.1A Active CN111737466B (en) 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network

Country Status (1)

Country Link
CN (1) CN111737466B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200320B (en) * 2020-12-02 2021-03-02 成都数联铭品科技有限公司 Model interpretation method, system, equipment and storage medium based on cooperative game method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875024A (en) * 2018-06-20 2018-11-23 清华大学深圳研究生院 File classification method, system, readable storage medium storing program for executing and electronic equipment
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN109858032A (en) * 2019-02-14 2019-06-07 程淑玉 Merge more granularity sentences interaction natural language inference model of Attention mechanism
CN110866113A (en) * 2019-09-30 2020-03-06 浙江大学 Text classification method based on sparse self-attention mechanism fine-tuning Bert model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10783437B2 (en) * 2017-03-05 2020-09-22 International Business Machines Corporation Hybrid aggregation for deep learning neural networks

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875024A (en) * 2018-06-20 2018-11-23 清华大学深圳研究生院 File classification method, system, readable storage medium storing program for executing and electronic equipment
CN109299262A (en) * 2018-10-09 2019-02-01 中山大学 A kind of text implication relation recognition methods for merging more granular informations
CN109858032A (en) * 2019-02-14 2019-06-07 程淑玉 Merge more granularity sentences interaction natural language inference model of Attention mechanism
CN110866113A (en) * 2019-09-30 2020-03-06 浙江大学 Text classification method based on sparse self-attention mechanism fine-tuning Bert model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Utility allocation strategy of virtualized resources based on cooperative game; Zhang Xiaoqing et al.; 《计算机科学》 (Computer Science); 2012-06-15 (No. 06); full text *
Research on natural language inference with Attention-fused multi-granularity sentence interaction; Cheng Shuyu et al.; 《小型微型计算机系统》 (Journal of Chinese Computer Systems); 2019-06-14 (No. 06); full text *

Also Published As

Publication number Publication date
CN111737466A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN110807154B (en) Recommendation method and system based on hybrid deep learning model
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
CN109783817B (en) Text semantic similarity calculation model based on deep reinforcement learning
CN110674850A (en) Image description generation method based on attention mechanism
CN111309927B (en) Personalized learning path recommendation method and system based on knowledge graph mining
CN111125520B (en) Event line extraction method based on deep clustering model for news text
CN112487193B (en) Zero sample picture classification method based on self-encoder
Tang et al. Modelling student behavior using granular large scale action data from a MOOC
WO2023045725A1 (en) Method for dataset creation, electronic device, and computer program product
Clarke Logical constraints: The limitations of QCA in social science research
Lin et al. A new multilevel cart algorithm for multilevel data with binary outcomes
CN116227180A (en) Data-driven-based intelligent decision-making method for unit combination
CN111737466B (en) Method for quantizing interactive information of deep neural network
Schwier et al. Zero knowledge hidden markov model inference
CN110659363A (en) Web service mixed evolution clustering method based on membrane computing
CN112529071B (en) Text classification method, system, computer equipment and storage medium
CN115330142B (en) Training method of joint capacity model, capacity demand matching method and device
Christensen et al. Factor or network model? Predictions from neural networks
CN110348577B (en) Knowledge tracking method based on fusion cognitive computation
Grabisch et al. A model of influence with a continuum of actions
CN112463964A (en) Text classification and model training method, device, equipment and storage medium
Vuong et al. The bayesvl R package. User guide v0. 8.1
Qurtubi Algorithm Modeling To Predict Students Learning Achievement Based On Behavioral Parameters As The Implementation Of Learning Management
Wang et al. Dim: adaptively combining user interests mined at different stages based on deformable interest model
Park et al. Pac neural prediction set learning to quantify the uncertainty of generative language models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant