CN111737466A - Method for quantizing interactive information of deep neural network - Google Patents

Method for quantizing interactive information of deep neural network

Info

Publication number
CN111737466A
CN111737466A
Authority
CN
China
Prior art keywords
unit
units
neural network
deep neural
sample
Prior art date
Legal status
Granted
Application number
CN202010558767.1A
Other languages
Chinese (zh)
Other versions
CN111737466B (en)
Inventor
李超
徐勇军
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202010558767.1A priority Critical patent/CN111737466B/en
Publication of CN111737466A publication Critical patent/CN111737466A/en
Application granted granted Critical
Publication of CN111737466B publication Critical patent/CN111737466B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for quantifying the interaction information of a deep neural network, comprising the following steps: S1, obtaining a sample from a natural language processing dataset, the sample comprising a plurality of units, each unit corresponding to a word, and subjecting the units in the sample to repeated aggregation processing until they are aggregated into a single unit; S2, constructing, from the way units were aggregated during the repeated aggregation of the given sample in step S1, a tree diagram reflecting the inter-word interaction information modeled inside the deep neural network. The method objectively quantifies the interaction information among input-sample words modeled in the deep neural network and clusters adjacent units with significant interaction according to the interaction gain ratio, finally obtaining a tree-shaped hierarchical structure reflecting the inter-word interaction information modeled in the deep neural network, thereby providing a general method for further understanding deep neural networks.

Description

Method for quantizing interactive information of deep neural network
Technical Field
The invention relates to the technical field of deep learning, in particular to the application of deep neural networks in the field of natural language processing, and more particularly to a method for quantifying the interaction information of a deep neural network.
Background
At present, deep neural networks show excellent modeling capability on various natural language processing tasks, but a deep neural network is generally regarded as a black-box model whose internal modeling logic is invisible. This property makes it difficult to effectively evaluate the accuracy and reliability of its final decisions, so interpreting the internal modeling logic of neural networks has become an important research direction. In the field of natural language processing in particular, which interactions among input words the network models is still opaque, so decoupling and quantifying the interaction information among the words of an input sentence modeled by a deep neural network plays an important role in understanding the internal logic and decision-making mechanism of the network.
Disclosure of Invention
Therefore, the present invention is directed to overcoming the above-mentioned drawbacks of the prior art and providing a new method for quantifying deep neural network interaction information in order to understand the logic inherent in the deep neural network.
The invention discloses a method for quantifying deep neural network interaction information, used for constructing a tree diagram that quantifies the interaction information among words modeled by a deep neural network in a natural language processing task. The method comprises the following steps:
s1, obtaining a sample from the natural language processing field data set, wherein the sample comprises a plurality of units, each unit corresponds to a word, and the units in the sample are subjected to multiple aggregation processing until the units in the sample are aggregated into a unit; wherein each polymerization treatment comprises: inputting a current sample into a deep neural network, and calculating a Shapril value of each unit in the current sample according to the output of the deep neural network, wherein the deep neural network is used for a natural language processing task in the field of natural language processing; calculating the interactive gain rate between every two adjacent units based on the sand-pril value of each unit, aggregating the two adjacent units with the maximum interactive gain rate into a new unit, and forming a new current sample with other units in the current sample for the next aggregation treatment;
s2, constructing a tree diagram reflecting the inter-word interaction information of the deep neural network internal modeling according to the unit aggregation mode in the multiple aggregation processing process of the given sample in the step S1. Preferably, the binary tree is constructed as follows: s31, forming a first layer leaf node of the binary tree from bottom to top by using all units in the sample; and S32, according to the aggregation sequence, taking a new unit formed after each aggregation as a father node of two adjacent units before the aggregation until a root node of the tree is formed.
Wherein the Shapley value of each unit in the current sample is the weighted average of the marginal contributions of that unit to the sets that can be formed from all other units in the current sample. The Shapley value of each unit in the current sample is determined as follows:
$$\phi_v(a_i) = \sum_{S \subseteq N \setminus \{a_i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{a_i\}) - v(S)\bigr)$$

wherein $v$ denotes the neural network, $\phi_v(a_i)$ denotes the Shapley value of the $i$-th unit $a_i$ in the current sample, $N$ denotes the set of all units in the current sample, $|N|$ the size of the set $N$, $S$ any set that can be formed from units other than the $i$-th unit $a_i$, $|S|$ the size of the set $S$, $!$ the factorial, and $v(\cdot)$ the output of the deep neural network; $v(S \cup \{a_i\}) - v(S)$ denotes the marginal contribution of the $i$-th unit $a_i$ to the set $S$, where $v(S \cup \{a_i\})$ denotes the output obtained by adding the $i$-th unit $a_i$ to the set $S$ and feeding the result to the neural network, and $v(S)$ denotes the output obtained by feeding the set $S$ to the neural network.
The interaction gain ratio between two adjacent units is the ratio of the interaction gain of the two adjacent units to all the interaction information involving the two units.
Preferably, the interaction gain ratio between two adjacent units is determined by:
$$r([S_1],[S_2]) = \frac{\left|B_{between}([S_1],[S_2])\right|}{\left|B_{between}([S_1]',[S_1])\right| + \left|B_{between}([S_1],[S_2])\right| + \left|B_{between}([S_2],[S_2]')\right| + \left|\phi([S_1])\right| + \left|\phi([S_2])\right|}$$

wherein $[S_1]$ denotes the unit formed by aggregating all units in the set $S_1$, $[S_2]$ denotes the unit formed by aggregating all units in the set $S_2$, $[S_1]$ and $[S_2]$ being two adjacent units; $B_{between}([S_1],[S_2])$ is the interaction gain between the two adjacent units $[S_1]$ and $[S_2]$; $[S_1]'$ is the unit adjacent to $[S_1]$ on its left before the aggregation, and $[S_2]'$ is the unit adjacent to $[S_2]$ on its right before the aggregation; $B_{between}([S_1]',[S_1])$ is the interaction gain between units $[S_1]'$ and $[S_1]$, and $B_{between}([S_2],[S_2]')$ is the interaction gain between units $[S_2]$ and $[S_2]'$; the total interaction information related to the units $[S_1]$ and $[S_2]$ consists of $B_{between}([S_1],[S_2])$, $B_{between}([S_1]',[S_1])$, $B_{between}([S_2],[S_2]')$, $\phi([S_1])$ and $\phi([S_2])$, where $\phi([S_1])$ and $\phi([S_2])$ are the Shapley values of units $[S_1]$ and $[S_2]$ respectively.
And the interaction gain between two adjacent units is the difference between the interaction gain within the new unit formed by aggregating the two adjacent units and the interaction gains within the two adjacent units before aggregation.
Preferably, the interaction gain between two adjacent units is determined by:
$$B_{between}([S_1],[S_2]) = B([S]) - B([S_1]) - B([S_2])$$

wherein $[S]$ denotes the unit formed by aggregating all units in the set $S$ (with $S = S_1 \cup S_2$), $[S_1]$ denotes the unit formed by aggregating all units in the set $S_1$, $[S_2]$ denotes the unit formed by aggregating all units in the set $S_2$, $B(\cdot)$ denotes the interaction gain within a unit, and $B_{between}(\cdot)$ denotes the interaction gain between units.
In some embodiments of the invention, the interaction gain within each unit is determined by:
$$B([S]) = \phi_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{b\in S}\phi_{v_{(N\setminus S)\cup\{b\}}}(b)$$

wherein $[S]$ denotes the unit formed by aggregating all units in the set $S$, $b$ is a unit in the set $S$, and $N \setminus S$ denotes the set formed by the units in the set $N$ excluding those in the set $S$.
Compared with the prior art, the invention has the following advantages: it innovatively provides a method for quantitatively evaluating and understanding the internal logic of a deep neural network. Drawing on ideas from game theory, it objectively quantifies the interaction information among input-sample words modeled in the deep neural network, proposes a dedicated index, the interaction gain ratio, to evaluate that interaction, and constructs a tree structure accordingly: adjacent units with significant interaction are clustered according to the magnitude of the interaction gain ratio, finally yielding a tree-shaped hierarchical structure that reflects the inter-word interaction information modeled in the deep neural network, thereby providing a general method for further understanding deep neural networks.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of a method for quantifying deep neural network interaction information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a binary tree built from an example sample in a method for quantifying deep neural network interaction information according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail by embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention aims, in view of the black-box nature of current deep neural networks, to provide a method for quantifying the interaction information among input words modeled by a deep neural network, so as to objectively explain the interactions among words that the network models and to facilitate understanding of its internal logic and decision mechanism.
According to an embodiment of the present invention, as shown in fig. 1, there is provided a method for deep neural network interaction information quantization, including steps T1, T2, T3, T4, and T5, each of which is described in detail below.
In step T1, a deep neural network for a natural language processing task in the natural language processing domain is obtained.
In step T2, based on a given input sample, the Shapley value of each unit of the sample is calculated from the output of the deep neural network feature layer; wherein the given input sample is a sentence from a dataset in the natural language processing domain comprising a plurality of units, and each unit of the initial given input sample consists of one word. The output of the feature layer can be the output of any feature layer of the deep neural network or the final output of the network.
The Shapley value method, from cooperative game theory, is a scheme for fairly distributing the payoff obtained by a cooperation among its members according to each member's contribution. In the present invention, given a trained neural network $v$ (i.e., the game), the input sample is $N = \{a_1, a_2, \ldots, a_n\}$, where $n$ is the number of units contained in the input sample (each unit initially consisting of one word) and $a_i$ ($1 \le i \le n$) denotes an individual unit of the input sample (i.e., a player of the game). To obtain greater payoff, some of the players cooperate to form a coalition $S$ (i.e., a set of some of the units); the total payoff obtained by the coalition $S$ in the game is $v(S)$, i.e., the output value of the neural network when the input set of units is $S$. If a player $a_i$ outside the coalition $S$ also joins it, the total payoff finally obtained by the coalition is $v(S \cup \{a_i\})$; then $v(S \cup \{a_i\}) - v(S)$ denotes the marginal contribution of player $a_i$ to the coalition $S$. The Shapley value is the weighted average of each player's marginal contributions to the various possible coalitions $S$ in the game $v$. The Shapley value of the $i$-th unit $a_i$ in the input sample is denoted $\phi_v(a_i)$ and calculated by equation (1):

$$\phi_v(a_i) = \sum_{S \subseteq N \setminus \{a_i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{a_i\}) - v(S)\bigr) \qquad (1)$$

wherein $v$ denotes the neural network, $N$ denotes the set of all units in the current input sample, $|N|$ the size of the set $N$, $S$ any set that can be formed from units other than the $i$-th unit $a_i$, $|S|$ the size of the set $S$, $!$ the factorial, and $v(\cdot)$ the output of the deep neural network. The Shapley value distributes the payoff of the game among its members in proportion to their contributions and thus measures the influence of each unit of the input sample on the final decision of the neural network: the larger the Shapley value, the larger the influence of that unit on the final decision, and vice versa.
Preferably, $\phi_v(a_i)$ is computed by sampling. For example, for a given input sample containing $n$ units, denoted $N = \{a_1, a_2, \ldots, a_n\}$, to compute the Shapley value $\phi_v(a_i)$ of unit $a_i$: sample once from the set $N \setminus \{a_i\}$ formed by the remaining units to obtain a set $S$; mask the word vectors of the units of the input sample not included in $S$ to zero vectors, obtaining a masked sample $S$, and feed it to the deep neural network to obtain $v(S)$; similarly, add unit $a_i$ to the sampled set $S$, mask, and feed the processed sample $S \cup \{a_i\}$ to the neural network to obtain $v(S \cup \{a_i\})$; then $v(S \cup \{a_i\}) - v(S)$ is the marginal contribution of the current unit $a_i$. According to one embodiment of the invention, sampling is performed $M$ times ($M \le 2^{n-1}$), and the average of the $M$ marginal contributions is taken as the Shapley value $\phi_v(a_i)$ of unit $a_i$. The Shapley value of each unit of the input sample is obtained in the same way.
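For illustration, the sampling procedure can be sketched as follows. This is a minimal sketch, not the patented implementation: it assumes a hypothetical wrapper `v` mapping a (masked) list of word vectors to the scalar network output, and the names `shapley_estimate` and `embeddings` are illustrative; masking by zero vectors follows the description above. Drawing a coalition size uniformly and then a uniform subset of that size reproduces the Shapley weights of equation (1), so the plain average of marginal contributions is an unbiased estimate.

```python
import random

def shapley_estimate(v, embeddings, i, num_samples=100):
    """Monte Carlo estimate of the Shapley value of unit i (equation (1)).

    v           -- callable mapping a masked list of word vectors to the
                   scalar output of the deep neural network
    embeddings  -- word vectors of the units of the current sample
    i           -- index of the unit a_i whose Shapley value is estimated
    num_samples -- number of sampled coalitions M
    """
    n = len(embeddings)
    others = [j for j in range(n) if j != i]

    def masked(keep):
        # Units outside the coalition are masked to zero vectors.
        return [e if j in keep else [0.0] * len(e)
                for j, e in enumerate(embeddings)]

    total = 0.0
    for _ in range(num_samples):
        # Draw a coalition S from N \ {a_i}: uniform size, then a uniform
        # subset of that size, which matches the Shapley weighting.
        k = random.randint(0, len(others))
        S = set(random.sample(others, k))
        # Marginal contribution v(S ∪ {a_i}) - v(S).
        total += v(masked(S | {i})) - v(masked(S))
    return total / num_samples
```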
In step T3, the interaction gain ratio between any two adjacent units of the given input sample is calculated. For a given game (in the present invention, the trained deep neural network $v$), several players may form an indivisible whole (in the present invention, a whole $[S]$ formed by some units of the given input sample); that is, the whole $[S]$ is treated as a single player, where $S$ denotes the union of those players (in the present invention the players are units, and $S$ is the set of those units). The interaction gain $B([S])$ contained in the player $[S]$ is then:

$$B([S]) = \phi_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{b\in S}\phi_{v_{(N\setminus S)\cup\{b\}}}(b) \qquad (2)$$

wherein $b$ is an element of the set $S$ and $[S]$ is the unit formed by aggregating all units in the set $S$; $v_{(N\setminus S)\cup\{[S]\}}$ denotes the game whose actual players are the members of the set $N$ minus the members of the set $S$ plus the player $[S]$ (in the present invention, the output of the deep neural network $v$ when the input is the units of the given input sample set $N$ minus the units of the set $S$ plus the unit $[S]$); similarly, $v_{(N\setminus S)\cup\{b\}}$ denotes the game whose actual players are the members of $N$ minus the members of $S$ plus the player $b$ (in the present invention, the output of the deep neural network $v$ when the input is the units of $N$ minus the units of $S$ plus the unit $b$). $\phi_{v_{(N\setminus S)\cup\{[S]\}}}([S])$ denotes the Shapley value of the unit $[S]$ under $v_{(N\setminus S)\cup\{[S]\}}$, and likewise $\phi_{v_{(N\setminus S)\cup\{b\}}}(b)$ denotes the Shapley value of unit $b$ under $v_{(N\setminus S)\cup\{b\}}$; for the calculation of Shapley values, refer to equation (1) above.
According to one example of the present invention, assume that the set of all units in an input sample is $N = \{a_1, a_2, \ldots, a_n\}$, and consider a unit $[S] = [\{a_i, a_{i+1}\}]$ formed by aggregating two adjacent units. This unit (containing more than one word) forms, together with the other units remaining in the sample, an aggregated sample $N' = \{a_1, \ldots, a_{i-1}, [S], a_{i+2}, \ldots, a_n\}$. The Shapley value of the unit $[S]$ under the game $v_{N'}$ can then be calculated, and the interaction gain of the unit is obtained according to equation (2):

$$B([S]) = \phi_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \phi_{v_{(N\setminus S)\cup\{a_i\}}}(a_i) - \phi_{v_{(N\setminus S)\cup\{a_{i+1}\}}}(a_{i+1})$$

noting that here $(N\setminus S)\cup\{[S]\} = N'$.
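A minimal sketch of equation (2), under the assumption of a hypothetical helper `shapley_in_game(v, units, target)` that estimates the Shapley value of `target` when the players of the game are `units` (for instance via the sampling estimator sketched above); units are represented as tuples of word indices:

```python
def interaction_gain_within(shapley_in_game, v, N, S):
    """B([S]) per equation (2): the Shapley value of the merged unit [S]
    in the game (N \\ S) ∪ {[S]}, minus the Shapley value of each b in S
    in the game (N \\ S) ∪ {b}."""
    rest = [u for u in N if u not in S]        # units of N \ S
    merged = tuple(sorted(sum(S, ())))         # [S]: all words of S as one unit
    gain = shapley_in_game(v, rest + [merged], merged)
    for b in S:
        gain -= shapley_in_game(v, rest + [b], b)
    return gain
```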
because the units in the input samples are continuously aggregated, the number of the units contained in the aggregated samples is continuously reduced until the units are finally aggregated into a unit (namely, the whole input sample is taken as a unit). In this polymerization process, if two adjacent units [ S ]1],[S2]Are polymerized into a unit [ S ]]Then, the interactive gain index between the three is the following equation:
B([S])=B([S1])+B([S2])+Bbetween([S1],[S2]) (3)
B([S1]),B([S2]) Are respectively two [ S1]、[S1]Inter-gain within a cell, Bbetween([S1],[S2]) Is the interaction gain between two units, then Bbetween([S1],[S2]) This can be derived as follows:
Bbetween([S1],[S2])=B([S])-B([S1])-B([S2])
finally, the interaction gain B between two adjacent cells can be calculatedbetweenThe ratio r of all the interaction information interacting with these two units. When two units [ S ]1],[S2]Are aggregated into a unit [ S ]]Time, note unit [ S1]The left adjacent unit before being polymerized is [ S ]1]', unit [ S2]The unit adjacent to the right before being polymerized is [ S ]2]', unit [ S1],[S2]The gain of the interaction between is Bbetween([S1],[S2]) Unit [ S ]1]',[S1]The gain of the interaction between is Bbetween([S1]',[S1]) Unit [ S ]2],[S2]' the gain of interaction between Bbetween([S2],[S2]'), and the unit [ S ]1],[S2]The related total interaction information is Bbetween([S1],[S2])、Bbetween([S1]',[S1])、Bbetween([S2],[S2]')、φ([S1])、φ([S2]) Wherein phi ([ S ]1]),φ([S2]) Are respectively a unit [ S1]、[S2]A value of salpril of [ S ]1]、[S2]The interactive gain ratio between the two is:
Figure BDA0002545316300000071
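Equations (3) and (4) then reduce to simple bookkeeping. A minimal sketch, reusing the hypothetical helpers above: `B` is a single-argument callable giving the within-unit interaction gain $B([S])$ for a list of units (e.g. `interaction_gain_within` with the game fixed), `shapley` gives $\phi([S])$, and `left`/`right` are the neighbouring units $[S_1]'$ and $[S_2]'$, or `None` at the sentence border:

```python
def interaction_gain_between(B, S1, S2):
    """B_between([S1],[S2]) = B([S]) - B([S1]) - B([S2]), with S = S1 ∪ S2."""
    return B(S1 + S2) - B(S1) - B(S2)

def interaction_gain_ratio(B, shapley, S1, S2, left, right):
    """Interaction gain between [S1] and [S2] relative to all interaction
    information involving the two units (equation (4))."""
    between = interaction_gain_between(B, S1, S2)
    total = abs(between) + abs(shapley(S1)) + abs(shapley(S2))
    if left is not None:                  # B_between([S1]', [S1])
        total += abs(interaction_gain_between(B, left, S1))
    if right is not None:                 # B_between([S2], [S2]')
        total += abs(interaction_gain_between(B, S2, right))
    return abs(between) / total
```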
in step T4, two adjacent cells with the maximum inter-gain ratio are aggregated to form a new cell, and the new cell and the remaining other cells in the sample together form a once aggregated sample; the steps T2 to T4 are repeated with the aggregated sample as the new given input sample, and the iteration is continued until the aggregated sample contains only one unit, which obviously contains all the words in the initial sample.
For example, according to an example of the present invention, the set of all units in the input sample is $N = \{a_1, a_2, \ldots, a_n\}$; its units can form $(n-1)$ pairs of adjacent units. The two adjacent units with the largest interaction gain ratio, say $a_i$ and $a_{i+1}$, are aggregated into a unit $[\{a_i, a_{i+1}\}]$, forming with the remaining $(n-2)$ units the aggregated sample $N' = \{a_1, \ldots, a_{i-1}, [\{a_i, a_{i+1}\}], a_{i+2}, \ldots, a_n\}$. Obviously, $N'$ contains $n-1$ units. The iteration is repeated, finally yielding a sample $N_{root} = \{[\{a_1, a_2, \ldots, a_n\}]\}$ containing only one unit, whose single unit contains all the words $a_1, a_2, \ldots, a_n$ of the given input sample.
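The iteration of steps T2 to T4 amounts to the following loop (a minimal sketch under the same assumptions as above; `gain_ratio(units, k)` is a hypothetical callable, for instance built from `interaction_gain_ratio`, returning the interaction gain ratio of the adjacent pair `units[k]`, `units[k+1]`):

```python
def aggregate(n, gain_ratio):
    """Merge the adjacent pair with the largest interaction gain ratio
    until a single unit remains; returns the list of merges performed."""
    units = [(i,) for i in range(n)]      # initially one unit per word
    merges = []
    while len(units) > 1:
        # Pick the adjacent pair with the largest interaction gain ratio.
        k = max(range(len(units) - 1),
                key=lambda k: gain_ratio(units, k))
        merged = units[k] + units[k + 1]
        merges.append((units[k], units[k + 1], merged))
        units = units[:k] + [merged] + units[k + 2:]
    return merges
```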
In step T5, a binary tree with a tree-like hierarchy is built according to the process of repeatedly aggregating the units of the input sample. Obviously, the leaf nodes of the tree are the words of the input sample; each time two units are aggregated into one, an internal node of the tree is formed; and as the aggregation proceeds, the root node of the tree is finally formed. A binary tree can thus be built in this bottom-up manner.
According to one example of the invention, a binary tree is constructed for the given input sample {the, sun, is, coming, out}. The sample composed of the units the, sun, is, coming, out is input into a deep neural network trained on a natural language processing dataset, and the units of the sample are aggregated step by step according to the output of the network. In the first aggregation, the units the and sun are aggregated into a new unit the sun, which forms with the remaining units is, coming, out the aggregated sample {the sun, is, coming, out}. In the second aggregation, the units the sun and is are aggregated into a new unit the sun is, which forms with the remaining units coming and out the aggregated sample {the sun is, coming, out}. In the third aggregation, the units coming and out are aggregated into a new unit coming out, which forms with the remaining unit the sun is the aggregated sample {the sun is, coming out}. In the fourth aggregation, the sun is and coming out are aggregated into the unit the sun is coming out, which is the root node of the tree. The binary tree corresponding to the sample {the, sun, is, coming, out} constructed from this aggregation process is shown in fig. 2; displaying the inter-word interaction information modeled by the deep neural network through this binary tree can help in understanding the internal logic of the network.
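Building the binary tree from the recorded merges is then straightforward. A minimal sketch continuing the hypothetical `aggregate` helper above; the `merges` list below is hand-written to reproduce the aggregation order of this example (fig. 2), not computed:

```python
def build_tree(words, merges):
    """Bottom-up construction: leaves are the words; every merge creates
    the parent node of the two units it joined; the last merge is the root."""
    nodes = {(i,): w for i, w in enumerate(words)}   # leaf nodes
    for left, right, merged in merges:
        nodes[merged] = (nodes[left], nodes[right])  # internal node
    return nodes[merged]

words = ["the", "sun", "is", "coming", "out"]
merges = [((0,), (1,), (0, 1)),                   # the + sun
          ((0, 1), (2,), (0, 1, 2)),              # the sun + is
          ((3,), (4,), (3, 4)),                   # coming + out
          ((0, 1, 2), (3, 4), (0, 1, 2, 3, 4))]   # root
print(build_tree(words, merges))
# ((('the', 'sun'), 'is'), ('coming', 'out'))
```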
The method provided by the invention explains the internal logic of a neural network through a hierarchical structure: it objectively quantifies the interaction information among input-sample words modeled in the deep neural network, clusters adjacent units with significant interaction according to the interaction gain ratio, and finally obtains a tree-shaped hierarchical structure reflecting the inter-word interaction information modeled in the deep neural network, providing a general method for further understanding deep neural networks. The method can construct such a tree diagram for any deep neural network used for a natural language processing task, so as to understand the logic inherent in the network.
It should be noted that, although the steps are described in a specific order, the steps are not necessarily performed in the specific order, and in fact, some of the steps may be performed concurrently or even in a changed order as long as the required functions are achieved.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that retains and stores instructions for use by an instruction execution device. The computer readable storage medium may include, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (11)

1. A method for quantifying deep neural network interaction information, used for constructing a tree diagram that quantifies the inter-word interaction information modeled by a deep neural network in a natural language processing task, characterized by comprising the following steps:
s1, obtaining a sample from the natural language processing field data set, wherein the sample comprises a plurality of units, each unit corresponds to a word, and the units in the sample are subjected to multiple aggregation processing until the units in the sample are aggregated into a unit;
wherein each aggregation step comprises:
inputting the current sample into a deep neural network and calculating the Shapley value of each unit in the current sample from the output of the network, wherein the deep neural network is used for a natural language processing task in the field of natural language processing;
calculating the interaction gain ratio between every two adjacent units based on the Shapley value of each unit, aggregating the two adjacent units with the largest interaction gain ratio into a new unit, and forming, together with the other units of the current sample, a new current sample for the next aggregation step;
s2, constructing a tree diagram reflecting the inter-word interaction information of the deep neural network internal modeling according to the unit aggregation mode in the multiple aggregation processing process of the given sample in the step S1.
2. The method for quantifying deep neural network interaction information of claim 1, wherein the Shapley value of each unit in the current sample is the weighted average of the marginal contributions of that unit to the sets that can be formed from all other units in the current sample.
3. The method for quantifying deep neural network interaction information of claim 2, wherein the Shapley value of each unit in the current sample is determined by:

$$\phi_v(a_i) = \sum_{S \subseteq N \setminus \{a_i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl(v(S \cup \{a_i\}) - v(S)\bigr)$$

wherein $v$ denotes the neural network, $\phi_v(a_i)$ denotes the Shapley value of the $i$-th unit $a_i$ in the current sample, $N$ denotes the set of all units in the current sample, $|N|$ the size of the set $N$, $S$ any set that can be formed from units other than the $i$-th unit $a_i$, $|S|$ the size of the set $S$, $!$ the factorial, and $v(\cdot)$ the output of the deep neural network; $v(S \cup \{a_i\}) - v(S)$ denotes the marginal contribution of the $i$-th unit $a_i$ to the set $S$, where $v(S \cup \{a_i\})$ denotes the output obtained by adding the $i$-th unit $a_i$ to the set $S$ and feeding the result to the neural network, and $v(S)$ denotes the output obtained by feeding the set $S$ to the neural network.
4. The method of claim 3, wherein the interaction gain ratio between two adjacent units is the ratio of the interaction gain of the two adjacent units to all the interaction information involving the two units.
5. The method of claim 4, wherein the interaction gain ratio between two adjacent units is determined by:
$$r([S_1],[S_2]) = \frac{\left|B_{between}([S_1],[S_2])\right|}{\left|B_{between}([S_1]',[S_1])\right| + \left|B_{between}([S_1],[S_2])\right| + \left|B_{between}([S_2],[S_2]')\right| + \left|\phi([S_1])\right| + \left|\phi([S_2])\right|}$$

wherein $[S_1]$ denotes the unit formed by aggregating all units in the set $S_1$, $[S_2]$ denotes the unit formed by aggregating all units in the set $S_2$, $[S_1]$ and $[S_2]$ being two adjacent units; $B_{between}([S_1],[S_2])$ is the interaction gain between the two adjacent units $[S_1]$ and $[S_2]$; $[S_1]'$ is the unit adjacent to $[S_1]$ on its left before the aggregation, and $[S_2]'$ is the unit adjacent to $[S_2]$ on its right before the aggregation; $B_{between}([S_1]',[S_1])$ is the interaction gain between units $[S_1]'$ and $[S_1]$, and $B_{between}([S_2],[S_2]')$ is the interaction gain between units $[S_2]$ and $[S_2]'$; the total interaction information related to the units $[S_1]$ and $[S_2]$ consists of $B_{between}([S_1],[S_2])$, $B_{between}([S_1]',[S_1])$, $B_{between}([S_2],[S_2]')$, $\phi([S_1])$ and $\phi([S_2])$, where $\phi([S_1])$ and $\phi([S_2])$ are the Shapley values of units $[S_1]$ and $[S_2]$ respectively.
6. The method of claim 5, wherein the interaction gain between two adjacent units is the difference between the interaction gain within the new unit formed by aggregating the two adjacent units and the interaction gains within the two adjacent units before aggregation.
7. The method of claim 6, wherein the interaction gain between two adjacent units is determined by:
$$B_{between}([S_1],[S_2]) = B([S]) - B([S_1]) - B([S_2])$$

wherein $[S]$ denotes the unit formed by aggregating all units in the set $S$, $[S_1]$ denotes the unit formed by aggregating all units in the set $S_1$, $[S_2]$ denotes the unit formed by aggregating all units in the set $S_2$, $B(\cdot)$ denotes the interaction gain within a unit, and $B_{between}(\cdot)$ denotes the interaction gain between units.
8. The method of claim 7, wherein the interaction gain within each unit is determined by:
$$B([S]) = \phi_{v_{(N\setminus S)\cup\{[S]\}}}([S]) - \sum_{b\in S}\phi_{v_{(N\setminus S)\cup\{b\}}}(b)$$

wherein $[S]$ denotes the unit formed by aggregating all units in the set $S$, $b$ is a unit in the set $S$, and $N \setminus S$ denotes the set formed by the units in the set $N$ excluding those in the set $S$.
9. The method for quantifying deep neural network interaction information according to any one of claims 1 to 8, wherein in step S2 a binary tree is constructed as follows:
s31, forming a first layer leaf node of the binary tree from bottom to top by using all units in the sample;
and S32, following the aggregation order, taking the new unit formed by each aggregation as the parent node of the two adjacent units that were merged, until the root node of the tree is formed.
10. A computer-readable storage medium, having embodied thereon a computer program, the computer program being executable by a processor to perform the steps of the method of any one of claims 1 to 9.
11. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the electronic device to carry out the steps of the method according to any one of claims 1 to 9.
CN202010558767.1A 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network Active CN111737466B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010558767.1A CN111737466B (en) 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010558767.1A CN111737466B (en) 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network

Publications (2)

Publication Number Publication Date
CN111737466A 2020-10-02
CN111737466B CN111737466B (en) 2022-11-29

Family

ID=72649650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010558767.1A Active CN111737466B (en) 2020-06-18 2020-06-18 Method for quantizing interactive information of deep neural network

Country Status (1)

Country Link
CN (1) CN111737466B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180253646A1 (en) * 2017-03-05 2018-09-06 International Business Machines Corporation Hybrid aggregation for deep learning neural networks
CN108875024A (en) * 2018-06-20 2018-11-23 Graduate School at Shenzhen, Tsinghua University Text classification method, system, readable storage medium and electronic device
CN109299262A (en) * 2018-10-09 2019-02-01 Sun Yat-sen University Text entailment relation recognition method fusing multi-granularity information
CN109858032A (en) * 2019-02-14 2019-06-07 Cheng Shuyu Multi-granularity sentence-interaction natural language inference model incorporating the attention mechanism
CN110866113A (en) * 2019-09-30 2020-03-06 Zhejiang University Text classification method based on fine-tuning a BERT model with a sparse self-attention mechanism

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHANG XIAOQING et al.: "Utility Allocation Strategy for Virtualized Resources Based on Cooperative Game", Computer Science *
CHENG SHUYU et al.: "Research on Natural Language Inference with Attention-Fused Multi-Granularity Sentence Interaction", Journal of Chinese Computer Systems *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200320A (en) * 2020-12-02 2021-01-08 成都数联铭品科技有限公司 Model interpretation method, system, equipment and storage medium based on cooperative game method
CN118378667A (en) * 2024-04-01 2024-07-23 佛山科学技术学院 NAS neural network design method and system based on saprolil values

Also Published As

Publication number Publication date
CN111737466B (en) 2022-11-29

Similar Documents

Publication Publication Date Title
CN110807154B (en) Recommendation method and system based on hybrid deep learning model
CN109783817B (en) Text semantic similarity calculation model based on deep reinforcement learning
Lall et al. The MIDAS touch: accurate and scalable missing-data imputation with deep learning
CN111078836B (en) Machine reading understanding method, system and device based on external knowledge enhancement
CN109992779B (en) Emotion analysis method, device, equipment and storage medium based on CNN
CN110674850A (en) Image description generation method based on attention mechanism
CN111309927B (en) Personalized learning path recommendation method and system based on knowledge graph mining
CN110046228B (en) Short text topic identification method and system
CN111125520B (en) Event line extraction method based on deep clustering model for news text
Evans Uncertainty and error
CN111737466B (en) Method for quantizing interactive information of deep neural network
CN112417289A (en) Information intelligent recommendation method based on deep clustering
WO2023045725A1 (en) Method for dataset creation, electronic device, and computer program product
Clarke Logical constraints: The limitations of QCA in social science research
Christensen et al. Factor or network model? Predictions from neural networks
CN117313160B (en) Privacy-enhanced structured data simulation generation method and system
Roussos Normative formal Epistemology as modelling
Shin et al. End-to-end task dependent recurrent entity network for goal-oriented dialog learning
CN109977194B (en) Text similarity calculation method, system, device and medium based on unsupervised learning
Yang Machine learning methods on COVID-19 situation prediction
Zhu et al. A hybrid model for nonlinear regression with missing data using quasilinear kernel
Wang et al. [Retracted] Application of Improved Machine Learning and Fuzzy Algorithm in Educational Information Technology
CN112507185B (en) User portrait determination method and device
CN110348577B (en) Knowledge tracking method based on fusion cognitive computation
CN112463964A (en) Text classification and model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant