WO2024077651A1 - 一种神经网络转化方法、电子设备和存储介质 - Google Patents

一种神经网络转化方法、电子设备和存储介质

Info

Publication number
WO2024077651A1
WO2024077651A1 PCT/CN2022/126361 CN2022126361W WO2024077651A1 WO 2024077651 A1 WO2024077651 A1 WO 2024077651A1 CN 2022126361 W CN2022126361 W CN 2022126361W WO 2024077651 A1 WO2024077651 A1 WO 2024077651A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
decision tree
present application
layer
electronic device
Prior art date
Application number
PCT/CN2022/126361
Other languages
English (en)
French (fr)
Inventor
卡格拉•艾泰金
Original Assignee
瑞声科技(新加坡)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 瑞声科技(新加坡)有限公司 filed Critical 瑞声科技(新加坡)有限公司
Publication of WO2024077651A1 publication Critical patent/WO2024077651A1/zh


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/082Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/042Knowledge-based neural networks; Logical representations of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • the present application relates to the field of computer technology, and in particular to a neural network conversion method, electronic device and storage medium.
  • Neural networks are increasingly being used in computer technology. However, the black-box nature of neural network predictions has hampered their wider and more reliable use in many industries, such as health and safety. To this end, methods need to be built to explain neural network decisions, which is called explainable artificial intelligence (XAI). The work of explaining neural network decisions can be divided into feature maps and connecting neural networks with explainable methods.
  • Feature maps are a way to highlight regions of the input that the neural network uses when making predictions.
  • Early work [8] used the gradient of the neural network output with respect to the input to visualize the linearization of the entire network for a particular input.
  • the feature maps obtained by this method are usually noisy and do not provide a clear understanding of the decisions made.
  • Another tracking method uses the derivative of the neural network output with respect to the activation, usually before the fully connected layer.
  • the feature maps obtained by this tracking method are clearer in the sense that they highlight the areas related to the predicted class. Although useful for purposes such as checking whether the support region of a decision is reasonable, these methods still lack detailed logical reasoning for why such a decision was made.
  • the present application provides a neural network conversion method and an electronic device.
  • the present application also provides a computer-readable storage medium.
  • the present application provides a neural network conversion method, which is applied to a terminal device and includes:
  • initializing a decision tree and setting the root of the decision tree; then, using the effective filters of the neural network as decision rules, leaf branching is performed starting from the root of the decision tree until the decision tree covers all effective filters of the neural network, wherein the neural network is a neural network with piecewise linear activation.
  • performing leaf branching from the root of the decision tree includes:
  • each leaf branch corresponds to an effective filter, and the corresponding order of the effective filters is based on the order of the effective filters in the same layer of the neural network and the order of different layers in the neural network.
  • for a fully connected layer, an effective matrix is used as the decision rule.
  • for a skip-connection (residual) layer, a residual effective matrix is used as the decision rule.
  • for a convolutional layer, an effective convolution is used as the decision rule.
  • for a normalization layer, the normalization layer is embedded into the linear layer that follows a pre-activation normalization or precedes a post-activation normalization, respectively.
  • the method further includes:
  • the decision tree is losslessly pruned based on violating and/or redundant rules of the decision tree, and/or based on the categories realized during training of the neural network.
  • the present application provides a neural network calculation method, which is applied to a terminal device and includes: obtaining a neural network to be used for calculation; converting the neural network into a decision tree based on the method described in the first aspect; and performing the calculation using the decision tree.
  • the present application provides an electronic device, comprising a memory for storing computer program instructions and a processor for executing computer program instructions, wherein when the computer program instructions are executed by the processor, the electronic device is triggered to execute the method steps described in the first aspect.
  • the present application provides an electronic device, comprising a memory for storing computer program instructions and a processor for executing computer program instructions, wherein when the computer program instructions are executed by the processor, the electronic device is triggered to execute the method steps described in the second aspect.
  • the present application provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, which, when executed on a computer, enables the computer to execute the method described in the first aspect or the second aspect.
  • the neural network is converted into a decision tree, and the neural network is explained based on the decision tree, thereby solving the black box problem of the neural network.
  • the decision tree equivalent to the neural network can effectively reduce the computational cost of the neural network, at the cost of increased memory requirements.
  • FIG1 is a schematic diagram of a method flow according to an embodiment of the present application.
  • FIG2 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • FIG3 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • FIG4 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • FIG5 is a schematic diagram showing a model response according to an embodiment of the present application.
  • FIG6 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • FIG7 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • FIG8 is a flowchart of a neural network calculation according to an embodiment of the present application.
  • FIG9 is a schematic diagram showing a hardware structure of an electronic device according to an embodiment of the present application.
  • this application provides a neural network conversion method.
  • a neural network with piecewise linear activation is converted into an equivalent decision tree representation.
  • the induced tree output is exactly the same as that of the neural network; the conversion neither restricts the neural architecture nor requires changing it in any way.
  • Each decision of the neural network is explained by a decision tree.
  • an effective filter of each layer of the neural network is used as the decision rule of the corresponding layer of the decision tree.
  • Fig. 1 is a schematic diagram of a method flow according to an embodiment of the present application.
  • the electronic device executes the flow shown in Fig. 1 to convert a piecewise linear activated neural network into a decision tree.
  • leaf branching is performed starting from the root of the decision tree, and the decision rule is an effective filter of the neural network.
  • each leaf branching is performed on the nodes produced by the previous branching, and each leaf branching corresponds to one effective filter.
  • the effective filters are taken in order, following the ordering of the effective filters within the same layer of the neural network and the ordering of the different layers of the neural network.
  • the decision rule is the first effective filter of the first layer of the neural network.
  • k corresponds to the number of piecewise linear regions in the activation function used.
  • a second leaf branching is performed on each node resulting from the first leaf branching (e.g., k nodes), and the decision rule is the second effective filter of the first layer of the neural network.
  • Leaf branching continues until all valid filters in the first layer of the neural network are covered.
  • the decision rule is the first effective filter of the second layer of the neural network. Continue with leaf branching until all effective filters of all layers of the neural network are covered.
  • S110 includes the following process.
  • S111: leaf branching is performed on every decision tree node at layer s of the decision tree to obtain the decision tree nodes at layer s+1, and the decision rule of this branching is the m-th effective filter of the i-th layer of the neural network.
  • the initial values of s, i, and m are 1, and the decision tree node at the first layer of the decision tree is the root of the decision tree.
  • W_i is the weight matrix of the i-th layer of the neural network.
  • let σ be any piecewise linear activation function and x_0 be the input of the neural network.
  • the output and intermediate features of the feedforward neural network can then be expressed as follows: NN(x_0) = W_{n-1}^T σ(W_{n-2}^T σ( ... σ(W_0^T x_0))) (Formula 1).
  • in Formula 1, any final activation (such as softmax) is omitted, and the bias term is ignored, as it can simply be included by appending a constant 1 to each x_i.
  • the activation function σ acts as an element-wise scalar multiplication, so Formula 1 can be written as follows: NN(x_0) = W_{n-1}^T (a_{n-2} ⊙ (W_{n-2}^T ( ... a_0 ⊙ (W_0^T x_0)))) (Formula 2).
  • a_{i-1} is a vector whose entries are the slopes of the activation function in the linear regions that the pre-activation W_{i-1}^T x_{i-1} falls into, and ⊙ denotes element-wise multiplication.
  • a_{i-1} can be directly interpreted as a categorization result, because it contains the indicators (slopes) of the linear regions of the activation function.
  • Formula 2 can be reorganized as follows: W_i^T (a_{i-1} ⊙ (W_{i-1}^T x_{i-1})) = (W_i ⊙ a_{i-1})^T (W_{i-1}^T x_{i-1}) (Formula 3), where W_i ⊙ a_{i-1} denotes column-wise multiplication of W_i by the vector a_{i-1}.
  • using Formula 3, Formula 1 can be rewritten as follows: NN(x_0) = (W_{n-1} ⊙ a_{n-2})^T (W_{n-2} ⊙ a_{n-3})^T ... (W_1 ⊙ a_0)^T W_0^T x_0 (Formula 4).
  • the effective weight matrix Ŵ_i of layer i, applied directly to the input x_0, is defined as follows: Ŵ_i^T = (W_i ⊙ a_{i-1})^T (W_{i-1} ⊙ a_{i-2})^T ... (W_1 ⊙ a_0)^T W_0^T (Formula 5), where the categorization up to layer i is defined as c_{i-1} = a_0 || a_1 || ... || a_{i-1} and || is the concatenation operator.
  • it can be directly observed from Equation 5 that the effective matrix of the i-th layer depends only on the categorization vectors from the previous layers. This means that the computation of the filter that will make the next categorization depends only on the previous categorizations.
  • layer i is thus represented as a k^{m_i}-way categorization, where m_i is the number of filters in layer i and k is the total number of linear regions in the activation. Therefore, this categorization in layer i can be represented by a tree of depth m_i, where every node branches into k children.
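  • As a concrete illustration of Equations 1-5, the following sketch builds the effective matrices for a small fully connected network with leaky-ReLU activation and checks that applying the final effective matrix directly to x_0 reproduces the ordinary forward pass. This is a minimal sketch, not the reference implementation of the present application; the layer sizes, the leaky-ReLU slope and all names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected network: each layer applies W_i^T to its input (Formula 1).
# Biases are omitted; as noted above, they could be absorbed by appending a 1 to each x_i.
W = [rng.standard_normal((3, 4)), rng.standard_normal((4, 2))]  # W_0, W_1 (assumed sizes)
alpha = 0.1  # assumed leaky-ReLU slope for the negative region (k = 2 linear regions)

def forward(x0):
    """Ordinary forward pass; no activation after the last layer."""
    x = x0
    for i, Wi in enumerate(W):
        z = Wi.T @ x
        x = np.where(z > 0, z, alpha * z) if i < len(W) - 1 else z
    return x

def effective_forward(x0):
    """Forward pass via effective matrices: W_hat_i depends only on the
    categorization vectors a_0 ... a_{i-1} of the previous layers (Formula 5)."""
    W_hat = W[0]                            # the effective matrix of layer 0 is W_0 itself
    for i in range(1, len(W)):
        z = W_hat.T @ x0                    # pre-activation of layer i-1, written directly on x_0
        a = np.where(z > 0, 1.0, alpha)     # slopes of the active linear regions -> a_{i-1}
        W_hat = (W_hat * a) @ W[i]          # column-wise scaling by a_{i-1}, then the next weights
    return W_hat.T @ x0

x0 = rng.standard_normal(3)
assert np.allclose(forward(x0), effective_forward(x0))
```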
  • the effective matrix of each layer of the fully connected neural network is used as the effective filter used by S110, and the effective matrix is used as the decision rule (classification rule) of the corresponding layer of the decision tree.
  • the effective matrix of the fully connected neural network is calculated based on Formula 5 to determine the decision rule of the decision tree.
  • the following algorithm flow can be used to realize the conversion of the fully connected neural network into a decision tree.
  • Lines 4-8 in Algorithm 1 correspond to a node in the decision tree where a simple yes/no decision is made.
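  • As a companion to the discussion of Algorithm 1 (whose listing is not reproduced in this section), the sketch below shows, for a plain-ReLU fully connected network, how each effective filter of each layer corresponds to one yes/no tree node: the effective filter is applied directly to x_0, the sign of the response selects the branch, and the selected slopes determine the effective filters of the next layer. The network sizes, names and traversal structure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Assumed toy ReLU network: 3 inputs, two hidden layers of 4 filters, one output filter.
W = [rng.standard_normal((3, 4)), rng.standard_normal((4, 4)), rng.standard_normal((4, 1))]

def relu_forward(x0):
    x = x0
    for i, Wi in enumerate(W):
        z = Wi.T @ x
        x = np.maximum(z, 0.0) if i < len(W) - 1 else z
    return x

def tree_style_forward(x0):
    """One yes/no decision per effective filter, in the spirit of lines 4-8 of Algorithm 1.
    Returns the output together with the path of binary decisions taken."""
    path = []
    W_hat = W[0]                               # effective filters of the first layer
    for i in range(1, len(W)):
        a = np.zeros(W_hat.shape[1])
        for m in range(W_hat.shape[1]):        # a yes/no tree node per effective filter
            response = W_hat[:, m] @ x0        # the m-th effective filter applied directly to x_0
            a[m] = 1.0 if response > 0 else 0.0  # ReLU: k = 2 regions with slopes {0, 1}
            path.append(int(a[m]))
        W_hat = (W_hat * a) @ W[i]             # next layer's effective filters, selected by the path
    return W_hat.T @ x0, path

x0 = rng.standard_normal(3)
y_tree, path = tree_style_forward(x0)
assert np.allclose(relu_forward(x0), y_tree)
print(path)  # the sequence of yes/no decisions, one per effective filter of the hidden layers
```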
  • using the form obtained in Equation 7, the effective matrix of residual neural networks can be defined accordingly (Equation 8).
  • c is defined as the concatenated categorization results from the previous layers.
  • for layer i, the residual effective matrix is defined according to the categorization of the previous activations, and this residual effective matrix is used as the effective filter in S110.
  • the normalization layers in the neural network are linear, and after training they can be embedded into the linear layer that follows a pre-activation normalization or precedes a post-activation normalization, respectively. Therefore, in one embodiment, no separate conversion is required for the normalization layers in the process of converting the neural network to the decision tree representation.
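  • Because a trained normalization layer is an affine map, it can be folded into the adjacent linear layer, which is why the conversion needs no separate handling for it. The sketch below folds an inference-mode batch-normalization layer into the linear layer that precedes it; the parameter names and the placement are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d_in, d_out = 5, 3

# Assumed layer pair: a linear layer followed by an inference-mode batch normalization.
W, b = rng.standard_normal((d_out, d_in)), rng.standard_normal(d_out)
gamma, beta = rng.standard_normal(d_out), rng.standard_normal(d_out)
mean, var, eps = rng.standard_normal(d_out), rng.random(d_out) + 0.5, 1e-5

def linear_then_norm(x):
    z = W @ x + b
    return gamma * (z - mean) / np.sqrt(var + eps) + beta

# Fold the normalization into the linear layer: the pair becomes a single affine map.
scale = gamma / np.sqrt(var + eps)
W_fold = scale[:, None] * W
b_fold = scale * (b - mean) + beta

x = rng.standard_normal(d_in)
assert np.allclose(linear_then_norm(x), W_fold @ x + b_fold)
```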
  • let K_i: C_{i+1} × C_i × M_i × N_i be the convolution kernel of the i-th layer, applied to the input F_i: C_i × H_i × W_i.
  • a_{i-1} has the same spatial size as K_i and consists of the activation-function slopes of the corresponding regions in the previous feature F_{i-1}.
  • in Equation 11, c is defined as the concatenated categorization result of all relevant regions in the previous layers.
  • the effective convolution depends only on the classification from the activation, which makes the tree equivalence similar to the analysis of fully connected networks. The difference from the fully connected layer case is that many decisions of the convolutional layer are made on partial input regions rather than the entire X0 .
  • Fig. 2 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • the decision tree converted from the two-layer ReLU neural network is shown in Fig. 2.
  • the depth of the equivalent decision tree converted from the neural network (the total number of layers of the decision tree) is d, the total number of effective filters that contribute a decision.
  • the total number of tree branches in the last layer of the decision tree (the total number of categories in the decision tree) is 2^d.
  • as the number of effective filters of the neural network increases, the total number of tree branches in the last layer of the decision tree grows exponentially. For example, if the first layer of the neural network contains 64 filters, there will be at least 2^64 branches in the decision tree.
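  • The depth and the worst-case number of categories can be read off from the filter counts alone. The short calculation below uses the filter counts of the toy regression network discussed later (two layers of two filters plus a final single-filter layer, leaky-ReLU, so k = 2) together with the 64-filter remark above; the helper name is an assumption.

```python
# Worst-case size of the equivalent tree: d counts the effective filters that make a
# decision (the final layer has no activation, so its filters add none), and with
# k linear regions per activation the last layer of the tree has up to k**d branches.
def tree_size(filters_per_layer, k=2):
    d = sum(filters_per_layer[:-1])
    return d, k ** d

print(tree_size([2, 2, 1]))   # toy y = x**2 regression net below -> (4, 16) before pruning
print(tree_size([64, 1]))     # a 64-filter first layer alone already gives 2**64 branches
```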
  • the equivalent decision tree of the neural network is losslessly pruned according to the violation rules and redundant rules of the equivalent decision tree of the neural network.
  • as an example, a neural network is fitted to the regression equation y = x^2. The neural network has 3 dense layers; the first and second layers have two filters each, and the last layer has one filter.
  • the network uses leaky-ReLU activations after the fully connected layers, except for the last layer, which has no activation.
  • Fig. 3 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • each rectangular box in area 301 represents a decision rule; the left sub-box indicates that the rule does not hold, and the right sub-box indicates that the rule holds.
  • an example of a redundant rule is checking x < 0.32 after the rule x < -1.16 already holds. This directly creates an invalid right child for that node. Therefore, in this case, the tree can be cleaned by removing the right child box and merging the categorization rules into the stricter rule, in this specific case x < -1.16.
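  • For this one-dimensional example, lossless pruning can be viewed as propagating the interval that the path so far confines x to and cutting branches whose interval is empty or whose rule is already implied. The sketch below applies this idea to threshold rules of the form x < t; the rule values echo the x < -1.16 / x < 0.32 example, everything else is an assumption.

```python
def prune(node, lo=float("-inf"), hi=float("inf")):
    """Losslessly prunes a binary tree of scalar rules 'x < t'.
    (lo, hi) is the interval the path so far confines x to; a child whose
    interval would be empty is dropped and the implied rule is collapsed."""
    if not isinstance(node, tuple):            # leaf: keep as-is
        return node
    t, left, right = node                      # left child: x < t holds, right child: x >= t
    if hi <= t:                                # rule already implied by the path -> keep only left
        return prune(left, lo, hi)
    if lo >= t:                                # rule can never hold on this path -> keep only right
        return prune(right, lo, hi)
    return (t, prune(left, lo, min(hi, t)), prune(right, max(lo, t), hi))

# Checking x < 0.32 after x < -1.16 already holds: the inner node's right child is
# unreachable, so the subtree collapses to the stricter rule x < -1.16.
tree = (-1.16, (0.32, "leaf_a", "leaf_b"), "leaf_c")
print(prune(tree))  # -> (-1.16, 'leaf_a', 'leaf_c')
```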
  • FIG. 4 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • the decision tree shown in FIG. 4 includes five categories (401 to 405) instead of sixteen.
  • Fig. 5 is a schematic diagram of a model response according to an embodiment of the present application.
  • the model response of the decision tree shown in Fig. 4 is shown in Fig. 5 .
  • the neural network cannot grasp the symmetric nature of the regression problem, which is evident from the fact that the decision boundary is asymmetric.
  • the regions below -1.16 and above 1 are unbounded; therefore, the neural decision loses accuracy as x moves beyond these boundaries.
  • since the number of categories may exceed the amount of training data, it is likely that not all categories will be realized during training of the neural network. Therefore, these categories can also be pruned depending on the application.
  • the decision tree is losslessly pruned based on the categories that are realized during the training of the neural network.
  • data falling into these categories may be considered invalid if the application so permits.
  • for the half-moon classification problem, a 3-layer fully connected neural network with leaky-ReLU activation is trained, except for the last layer, which has sigmoid activation.
  • Each layer has 2 filters, except for the last layer which has 1 filter.
  • Fig. 6 is a schematic diagram of a decision tree according to an embodiment of the present application.
  • the decision tree shown in Fig. 6 is a classification tree corresponding to a certain half-moon classification neural network.
  • the cleaned decision tree resulting from the trained network is shown in Figure 6.
  • the decision tree finds many categories whose boundaries are determined by the rules in the tree, where each category is assigned a class.
  • FIG. 7 is a schematic diagram showing classification performed by a decision tree for a half-moon dataset according to an embodiment of the present application.
  • the neural network is converted into a decision tree, and the neural network is explained based on the decision tree, thereby solving the black box problem of the neural network.
  • an embodiment of the present application also proposes a neural network calculation method.
  • Fig. 8 is a flowchart of a neural network calculation according to an embodiment of the present application.
  • the electronic device executes the following process as shown in Fig. 8 to implement the neural network calculation.
  • S800: obtain the data to be computed. S810: obtain a first neural network for computing the data. S820: convert the first neural network into a first decision tree based on the neural network conversion method proposed in the embodiments of the present application. S830: use the first decision tree to compute the data to be computed and obtain a calculation result.
  • Table 1 shows the computational and memory analysis results for a game problem.
  • the number of parameters, floating-point comparisons, and multiplication or addition operations of a neural network and of its induced tree are compared. Since the induced tree is an unfolding of the neural network, it covers all possible paths and keeps all possible effective filters in memory. Therefore, as expected, the number of parameters in the tree representation of the neural network is larger than that of the network. In the induced tree, in each layer i, at most m_i filters are applied directly to the input, while in the neural network the m_i filters are always applied to the previous features, which are usually of much larger dimension than the input. Therefore, in terms of computation, the tree representation has an advantage over the neural network.
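  • The computational trade-off in Table 1 can be approximated from the layer widths alone: in the tree, each effective filter acts directly on the input, whereas the network applies each filter to the (usually wider) previous feature. The sketch below counts multiply-accumulate operations per inference under that simplification; the layer sizes are assumptions, not the sizes of the game problem in Table 1.

```python
def mults_network(dims):
    """Multiply-accumulates of a dense network with layer widths dims[0] -> dims[-1]."""
    return sum(dims[i] * dims[i + 1] for i in range(len(dims) - 1))

def mults_tree(dims):
    """In the induced tree every effective filter is applied directly to the input,
    so each of the dims[1:] filters costs dims[0] multiplications."""
    return sum(dims[0] * m for m in dims[1:])

dims = [16, 64, 64, 10]        # assumed widths: input followed by filters per layer
print(mults_network(dims))     # 16*64 + 64*64 + 64*10 = 5760
print(mults_tree(dims))        # 16*(64 + 64 + 10)     = 2208
```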
  • each step of the method flow can be implemented by dividing the functions into various modules.
  • the division of each module is only a division of logical functions.
  • the functions of each module can be implemented in the same or multiple software and/or hardware.
  • the device proposed in the embodiment of the present application can be fully or partially integrated into a physical entity during actual implementation, or it can be physically separated.
  • these modules can all be implemented in the form of software calling through processing elements; they can also be all implemented in the form of hardware; some modules can also be implemented in the form of software calling through processing elements, and some modules can be implemented in the form of hardware.
  • the detection module can be a separately established processing element, or it can be integrated in a chip of an electronic device.
  • the implementation of other modules is similar.
  • all or part of these modules can be integrated together, or they can be implemented independently.
  • each step of the above method or each of the above modules can be completed by an integrated logic circuit of hardware in a processor element or instructions in the form of software.
  • the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application specific integrated circuits (ASIC), or one or more digital signal processors (DSP), or one or more field programmable gate arrays (FPGA).
  • these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
  • an embodiment of the present application proposes an electronic chip.
  • the electronic chip is installed on an electronic device, and the electronic chip includes:
  • a processor is used to execute computer program instructions stored in a memory, wherein when the computer program instructions are executed by the processor, the electronic chip is triggered to execute the method steps described in the embodiments of the present application.
  • An embodiment of the present application further provides an electronic device.
  • FIG. 9 is a schematic diagram showing the structure of an electronic device according to an embodiment of the present application.
  • the electronic device 900 includes a memory 910 for storing computer program instructions and a processor 920 for executing the program instructions, wherein when the computer program instructions are executed by the processor, the electronic device is triggered to execute the method steps described in the embodiments of the present application.
  • the above-mentioned one or more computer programs are stored in the above-mentioned memory, and the above-mentioned one or more computer programs include instructions.
  • when the above-mentioned instructions are executed by the above-mentioned device, the device executes the method steps described in the embodiments of the present application.
  • the processor of the electronic device may be a SOC device on a chip, which may include a central processing unit (CPU), and may further include other types of processors.
  • the processor of the electronic device may be a PWM control chip.
  • the processor involved may include, for example, a CPU, a DSP, a microcontroller or a digital signal processor, and may also include a GPU, an embedded neural network processor (Neural-network Process Units, NPU) and an image signal processor (Image Signal Processing, ISP).
  • the processor may also include necessary hardware accelerators or logic processing hardware circuits, such as ASIC, or one or more integrated circuits for controlling the execution of the program of the technical solution of the present application.
  • the processor may have the function of operating one or more software programs, and the software programs may be stored in a storage medium.
  • the memory of the electronic device can be a read-only memory (ROM), other types of static storage devices that can store static information and instructions, a random access memory (RAM) or other types of dynamic storage devices that can store information and instructions, or it can be an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage, optical disc storage (including compressed optical disc, laser disc, optical disc, digital versatile disc, Blu-ray disc, etc.), a magnetic disk storage medium or other magnetic storage device, or it can also be any computer-readable medium that can be used to carry or store the desired program code in the form of instructions or data structures and can be accessed by a computer.
  • the processor and the memory can be combined into a processing device, more commonly, they are independent components, and the processor is used to execute the program code stored in the memory to implement the method described in the embodiment of the present application.
  • the memory can also be integrated into the processor, or independent of the processor.
  • the devices, apparatuses, and modules described in the embodiments of the present application may be implemented by computer chips or entities, or by products having certain functions.
  • the embodiments of the present application may be provided as methods, devices, or computer program products. Therefore, the present invention may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media containing computer-usable program code.
  • any function if implemented in the form of a software functional unit and sold or used as an independent product, can be stored in a computer-readable storage medium.
  • the technical solution of the present application or the part that contributes to the prior art, or the part of the technical solution, can be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present application.
  • a computer-readable storage medium is further provided, in which a computer program is stored.
  • when the computer program in the computer-readable storage medium is run on a computer, the computer executes the method provided in the embodiments of the present application.
  • An embodiment of the present application further provides a computer program product, which includes a computer program.
  • When the computer program is run on a computer, the computer executes the method provided by the embodiments of the present application.
  • each flow process and/or box in the flow chart and/or block diagram and the combination of the flow chart and/or box in the flow chart and/or block diagram can be realized by computer program instructions.
  • These computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processing machine or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the function specified in one flow chart or multiple flows and/or one box or multiple boxes of the block diagram.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • At least one refers to one or more, and “more than one” refers to two or more.
  • “And/or” describes the association relationship of associated objects, indicating that three relationships may exist.
  • a and/or B can represent the existence of A alone, the existence of A and B at the same time, and the existence of B alone. Among them, A and B can be singular or plural.
  • the character “/” generally indicates that the previous and subsequent associated objects are in an “or” relationship.
  • At least one of the following” and similar expressions refer to any combination of these items, including any combination of single or plural items.
  • At least one of a, b and c can be represented by: a, b, c, a and b, a and c, b and c, or a and b and c, where a, b, c can be single or multiple.
  • the terms “include”, “comprises” or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, commodity or device including a series of elements includes not only those elements, but also includes other elements not explicitly listed, or also includes elements inherent to such process, method, commodity or device.
  • an element defined by the sentence “includes a " does not exclude the presence of other identical elements in the process, method, commodity or device including the element.
  • the present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules.
  • program modules include routines, programs, objects, components, data structures, etc. that perform specific tasks or implement specific abstract data types.
  • the present application may also be practiced in distributed computing environments where tasks are performed by remote processing devices connected through a communication network.
  • program modules may be located in local and remote computer storage media, including storage devices.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the present application provides a neural network conversion method, an electronic device and a storage medium. The method is applied to a terminal device and includes: initializing a decision tree and setting the root of the decision tree; and, using the effective filters of a neural network as decision rules, performing leaf branching from the root of the decision tree until the decision tree covers all effective filters of the neural network, wherein the neural network is a neural network with piecewise linear activation. According to the method of the embodiments of the present application, the neural network is converted into a decision tree and the neural network is explained on the basis of the decision tree, thereby solving the black-box problem of the neural network.

Description

一种神经网络转化方法、电子设备和存储介质 技术领域
本申请涉及计算机技术领域,特别涉及一种神经网络转化方法、电子设备和存储介质。
背景技术
在计算机技术领域,神经网路的应用越来越广泛。但是,神经网络预测的黑盒性质阻碍了其在许多行业中更广泛和更可靠的应用,例如健康和安全领域。为此,需要构建解释神经网络决策的方法,其被称为可解释人工智能(XAI)。解释神经网络决策的成果可以分为特征图和将神经网络与可解释方法相连接。
特征图是突出显示输入区域的方法,神经网络在预测时利用这些区域。早期的工作[8]采用神经网络输出相对于输入的梯度,以可视化整个网络的特定输入线性化。通过该方法获得的特征图通常是有噪声,并无法清楚地理解所做出的决策。
另一种跟踪方法利用神经网络输出相对于激活的导数,通常是在完全连接层之前。由此追踪方法获得的特征图在突出显示与预测类相关的区域的意义上更清晰。尽管对于诸如检查决策的支持区域是否合理的目的是有用的,但是这些方法仍然缺乏为什么做出这种决策的详细逻辑推理。
因此,需要一种解释神经网络决策的方法。
发明内容
针对现有技术下如何解释神经网络决策的问题,本申请提供了一种神经网络转化方法、电子设备,本申请还提供一种计算机可读存储介质。
本申请实施例采用下述技术方案:
第一方面,本申请提供一种神经网络转化方法,所述方法应用于终端设备,包括:
初始化决策树,设置所述决策树的根;
以神经网络的有效滤波器作为决策规则,从所述决策树的根开始进行叶分支,直到所述决策树覆盖所述神经网络的所有有效滤波器,其中,所述神经网络为具有分段线性激活的神经网络。
在第一方面的一种实现方式中,所述从所述决策树的根开始进行叶分支,包括:
每次针对上一次分支出的节点进行所述叶分支,每次所述叶分支对应一个有效滤波器,所述有效滤波器的对应次序按照,所述神经网络中同一层中有效滤波器的排序以及所述神经网络中不同层的排序。
在第一方面的一种实现方式中,针对全连接层,以有效矩阵作为所述决策规则。
在第一方面的一种实现方式中,针对跳转连接层,以残差有效矩阵作为所述决策规则。
在第一方面的一种实现方式中,针对归一化层,将所述归一化层分别嵌入到激活前或激活后归一化之后或之前的线性层中。
在第一方面的一种实现方式中,针对卷积层,以有效卷积作为所述决策规则。
在第一方面的一种实现方式中,所述方法还包括:
根据所述决策树的违反规则和/或冗余规则,对所述决策树进行无损修剪;
和/或,
根据训练所述神经网络期间实现的类别,对所述决策树进行无损修剪。
第二方面,本申请提供一种神经网络计算方法,所述方法应用于终端设备,包括:
获取用于计算的神经网络;
基于第一方面所述的方法,将所述神经网络转化为决策树;
使用所述决策树进行计算。
第三方面,本申请提供一种电子设备,所述电子设备包括用于存储计算机程序指令的存储器和用于执行计算机程序指令的处理器,其中,当所述计算机程序指令被该处理器执行时,触发所述电子设备执行第一方面所述的方法步骤。
第四方面,本申请提供一种电子设备,所述电子设备包括用于存储计算机程序指令的存储器和用于执行计算机程序指令的处理器,其中,当所述计算机程序指令被该处理器执行时,触发所述电子设备执行第二方面所述的方法步骤。
第五方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行第一方面或第二方面所述的方法。
根据本申请实施例所提出的上述技术方案,至少可以实现下述技术效果:
根据本申请实施例的方法,将神经网络转化为决策树,基于决策树解释神经网络,从而解决神经网络的黑盒问题。并且,与神经网络等价的决策树能够有效减少由于增加的内存要求而带来的神经网络计算成本。
附图说明
图1所示为根据本申请一实施例的方法流程示意图;
图2所示为根据本申请一实施例的决策树示意图;
图3所示为根据本申请一实施例的决策树示意图;
图4所示为根据本申请一实施例的决策树示意图;
图5所示为根据本申请一实施例的模型响应示意图;
图6所示为根据本申请一实施例的决策树示意图;
图7所示为根据本申请一实施例的决策树示意图;
图8所示为根据本申请一实施例的神经网络计算流程图;
图9所示为根据本申请一实施例的电子设备的硬件结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合本申请具体实施例及相应的附图对本申请技术方案进行清楚、完整地描述。显然,所描述的实施例仅是本申请 一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请的实施方式部分使用的术语仅用于对本申请的具体实施例进行解释,而非旨在限定本申请。
针对现有技术下如何解释神经网络决策的问题,本申请提供了一种神经网络转化方法。将具有分段线性激活的神经网络转化为等价的决策树表示。诱导树输出与神经网络完全相同,它不限制或不需要以任何方式改变神经结构。通过决策树解释神经网络的每个决策。
在本申请一实施例中,将神经网络转化为决策树的方法流程中,以神经网络的每层的有效的滤波器(有效滤波器)用作决策树对应层的决策规则。
具体的,图1所示为根据本申请一实施例的方法流程示意图。电子设备执行如图1所示的流程以实现将分段线性激活的神经网络转化为决策树。
S100,初始化决策树,设置决策树的根。
S110,从决策树的根开始进行叶分支,决策规则是神经网络的有效滤波器。
具体的,在S110中,从决策树的根分支出的节点开始,每次针对上一次分支出的节点进行叶分支,每次叶分支对应一个有效滤波器。有效滤波器的对应次序按照同一层中滤波器的排序以及不同层的排序。
例如,从决策树的根开始进行第一次叶分支(例如,分支到k个节点),决策规则是神经网络的第一层的第一个有效滤波器。
具体地,k对应于使用的激活函数中分段线性区域的数量。
对第一次叶分支的结果（例如，k个节点）中的每一个节点进行第二次叶分支，决策规则是神经网络的第一层的第二个有效滤波器。
持续进行叶分支,直到神经网络的第一层的所有有效滤波器被覆盖。
继续进行叶分支,决策规则是神经网络的第二层的第一个有效滤波器。持续进行叶分支,直到神经网络的所有层的所有的有效滤波器被覆盖。
具体的,在一实施例中,S110包括下述流程。
S111,对决策树的第s层的每一个决策树节点进行叶分支,获取决策树的第s+1层的决策树节点,叶分支的决策规则为神经网络第i层的第m个有效滤波器。
在S111中,s、i、m的初始值为1,决策树的第1层的决策树节点即为决策树的根。
S112,判断神经网络第i层的有效滤波器是否被全部覆盖。
如果神经网络第i层的有效滤波器被全部覆盖,执行S113。
S113,判断神经网络的所有层是否被全部覆盖。
如果神经网络的所有层被全部覆盖,执行S114。
S114,返回决策树,叶分支结束。
如果神经网络的所有层没有被全部覆盖,执行S115。
S115,i的值加1,m的值置为1。
如果神经网络第i层的有效滤波器没有被全部覆盖,执行S116。
S116,m的值加1。
在S115或者S116之后,执行S117。
S117,s的值加1。
S117之后,返回S111。
以下分别针对不同结构的分段线性激活的神经网络,描述神经网络转化为等价的决策树表示的流程。
(一)全连接网络(全连接层)
假设W_i是神经网络的第i层的权重矩阵。令σ为任意分段线性激活函数，且x_0为神经网络的输入。然后，前馈神经网络的输出和中间特征可以表示如下：
NN(x_0) = W_{n-1}^T σ(W_{n-2}^T σ(… σ(W_0^T x_0)))　　（公式1）
注意，在公式1中，省略任何最终激活（如softmax），并且忽略偏差项，因为其可以通过将1值连接到每个x_i来简单地包括。激活函数σ用作逐元素标量乘法，因此公式1可以写入以下内容：
NN(x_0) = W_{n-1}^T (a_{n-2} ⊙ (W_{n-2}^T (… a_0 ⊙ (W_0^T x_0))))　　（公式2）
在公式2中:
a_{i-1}是一个矢量，表示W_{i-1}^T x_{i-1}所落入的线性区域中对应的激活斜率；
⊙表示逐元素相乘。
注意，a_{i-1}可以直接解释为分类结果，因为它包括激活函数中的线性区域的指标（斜率）。
公式2可以重新组织如下：
W_i^T (a_{i-1} ⊙ (W_{i-1}^T x_{i-1})) = (W_i ⊙ a_{i-1})^T (W_{i-1}^T x_{i-1})　　（公式3）
在公式3中，使用a_{i-1}对W_i作逐列元素相乘。这对应于通过重复a_{i-1}列向量以匹配W_i的大小而获得的矩阵的逐元素相乘。
使用公式3，公式1可以重写如下：
NN(x_0) = (W_{n-1} ⊙ a_{n-2})^T (W_{n-2} ⊙ a_{n-3})^T … (W_1 ⊙ a_0)^T W_0^T x_0　　（公式4）
根据公式4，定义直接应用于输入x_0的层i的有效权重矩阵Ŵ_i，如下所示：
Ŵ_i^T = (W_i ⊙ a_{i-1})^T (W_{i-1} ⊙ a_{i-2})^T … (W_1 ⊙ a_0)^T W_0^T　　（公式5）
在公式5中，直到第i层的分类被定义为：c_{i-1} = a_0‖a_1‖…‖a_{i-1}，其中‖是级联运算符。
根据公式5可以直接观察到,第i层的有效矩阵仅取决于来自先前层的分类向量。这表示将进行下一分类的过滤器的计算仅取决于以前的分类。
这直接表明，全连接神经网络可以表示为单个决策树，其中有效矩阵用作分类规则。在每个层i中，有效矩阵Ŵ_i的响应被分类为a_i向量，并且基于该分类结果，确定下一层的有效矩阵Ŵ_{i+1}。
因此，层i被表示为k^{m_i}路分类，其中m_i是每个层i中的滤波器的数量，并且k是激活中的线性区域的总数。因此，层i中的这种分类可以由深度为m_i的树表示，其中任何深度的节点被分支为k个分类。
综上,在根据本申请一实施例,将全连接神经网络转化为决策树的方法流程中,以全连接神经网络的每层的有效矩阵为S110所使用的有效滤波器,以有效矩阵用作决策树对应层的决策规则(分类规则)。具体的,基于公式5计算全连接神经网络的有效矩阵,从而确定决策树的决策规则。
进一步的,在一实施例中,以修正线性单元(Rectified Linear Unit,ReLU)为激活函数,针对ReLU神经网络,可以采用下述算法流程实现全连接神经网络转化为决策树。
算法1:
（算法1的伪代码在原文中以图像形式给出，此处从略；其针对ReLU全连接网络，逐层、逐滤波器地将有效滤波器作用于输入x_0并记录相应的是/否决策。）
在算法1中，a∈{0,1}。
算法1中的第4-8行对应于决策树中的一个节点,在该节点中,做出简单的是/否决策。
(二)跳转连接
以下述类型的残差神经网络(residual neural network)为例:
^r x_i = ^r x_{i-1} + σ(W_{i-1}^T · ^r x_{i-1})　　（公式6）
使用公式6，参照公式1-5，可以将^r x_i重写如下：
^r x_i = (I + (W_{i-1} ⊙ a_{i-1})^T) · ^r x_{i-1}　　（公式7）
使用公式7中的(I + (W_{i-1} ⊙ a_{i-1})^T)，可以如下定义残差神经网络（residual neural networks）的有效矩阵。
^rŴ_i^T = (I + (W_{i-1} ⊙ a_{i-1})^T) · ^rŴ_{i-1}^T　　（公式8）
在公式8中，c被定义为来自先前层的级联分类结果。对于第i层，根据先前激活的分类来定义残差有效矩阵^rŴ_i。
以残差有效矩阵^rŴ_i作为S110中的有效滤波器。
(三)归一化层
由于神经网络中的归一化层是线性的,并且在训练之后,它们可以分别嵌入到激活前或激活后归一化之后或之前的线性层中。因此,在一实施例中,在将神经网络转化为决策树表示的过程中,对于归一化层不需要单独的转化。
(四)卷积神经网络(卷积层)
令K_i: C_{i+1}×C_i×M_i×N_i为第i层的卷积核，应用于输入F_i: C_i×H_i×W_i。
将卷积神经网络CNN(F_0)的输出和中间特征F_i写入如下：
CNN(F_0) = K_{n-1} ⊛ σ(K_{n-2} ⊛ σ(… σ(K_0 ⊛ F_0)))　　（公式9，其中⊛表示卷积运算）
参照公式1-5，基于激活函数的逐元素式标量乘法性质，可以编写如下内容：
CNN(F_0) = K_{n-1} ⊛ (a_{n-2} ⊙ (K_{n-2} ⊛ (… a_0 ⊙ (K_0 ⊛ F_0))))　　（公式10）
在公式10中，a_{i-1}与K_i的空间大小相同，并且由先前特征F_{i-1}中相应区域的激活函数斜率组成。
注意，上述仅适用于特定空间区域，并且存在单独的a_{i-1}表示卷积K_{i-1}适用于每个空间区域。例如，如果K_{i-1}是一个3×3内核，存在一个单独的a_{i-1}表示应用卷积的所有3×3区域。
有效卷积K̂_i可以写成如下：
K̂_i = (K_i ⊙ a_{i-1}) ⊛ K̂_{i-1}　　（公式11）
以有效卷积K̂_i作为S110中的有效滤波器。
注意，在公式11中，K̂_i包含每个区域的特定有效卷积，其中一个区域是根据第i层的感受野定义的。
c定义为之前各层所有相关区域的级联分类结果。在公式11中，有效卷积仅取决于来自激活的分类，这使得树等价分析能够类似于对全连接网络的分析。与全连接层情况的区别在于，卷积层的许多决策是在部分输入区域而不是整个X_0上做出的。
进一步的,图2所示为根据本申请一实施例的决策树示意图。二层ReLU神经网络转化成的决策树如图2所示。
参照图1所示流程以及图2所示决策树，根据本申请实施例的方法，由神经网络转化成的等效决策树的深度（决策树的总层数）为d，即参与决策的有效滤波器的总数。
决策树最后一层的树分支总数（决策树的类别总数）为2^d。随着神经网络的有效滤波器数量的增加，决策树的最后一层的树分支总数呈指数型增长。例如，如果神经网络的第一层包含64个滤波器，则在决策树中将存在至少2^64个分支。
为了控制决策树的类别总数,在一实施例中,根据神经网络的等价决策树的违反规则以及冗余规则,对神经网络的等价决策树进行无损修剪。
例如，将神经网络拟合到回归方程y=x^2。神经网络具有3个密集层，第一层以及第二层分别有两个滤波器，最后一层有一个滤波器。网络在全连接层之后使用leaky-ReLU激活，最后一层没有激活。
图3所示为根据本申请一实施例的决策树示意图。y=x 2回归神经网络转化成的决策树如图3所示。
在决策树中,301区域中的每个矩形框表示一个决策规则,框中的左子框表示决策规则不成立,右子框表示决策规则成立。
假设，决策规则是通过将W^T x+β>0转换为作用于x的直接不等式来获得的。因为x是一个标量，因此决策规则可以用于特定回归y=x^2来完成。在每个叶子中，网络基于到目前为止的决定应用由302区域中的矩形表示的线性函数。
在如图3所示的决策树中，存在2^4=16个分类。
由于决策树中的许多规则是冗余的,因此决策树中的一些路径变得无效。
例如,针对冗余规则,一种实现方式是,在x<-1.16规则成立之后检查x<0.32。这直接为该节点创建了无效的右子节点。因此,在这种情况下,可以通过移除右子框并将分类规则合并到更严格的规则来清理树:在特定情况下,x<-1.16。
图4所示为根据本申请一实施例的决策树示意图。
通过清理图3所示的决策树,可以获得图4所示的决策树,图4所示的决策树包括5个类别(401~405),而不是16个。
图5所示为根据本申请一实施例的模型响应示意图。图4所示的决策树的模型响应如图5所示。
基于图4所示的决策树，神经网络的解释为：对于边界通过决策树表示确定的每个区域，网络通过线性方程近似非线性y=x^2方程。
可以清楚地解释并从决策树中进行推断,其中一些如下。神经网络不能掌握回归问题的对称性质,这从决策边界不对称的事实中显而易见。在-1.16以下和1以上的区域是无界的,因此,当x超出这些边界时,神经决策失去准确性。
进一步的,由于类别的数量可能比训练数据多,而在训练神经网络期间很可能不会实现所有类别。因此,这些类别也可以基于应用来修剪。根据训练神经网络期间实现的类别,对决策树进行无损修剪。
具体的,如果应用允许,属于这些类别的数据可以视为无效。
例如,针对半月分类问题。除了最后一层具有sigmoid激活外,训练具有leaky-ReLU激活的3层全连接的神经网络。除了最后一层具有1个滤波器之外,每层具有2个滤波器。
图6所示为根据本申请一实施例的决策树示意图。图6所示的决策树为对应某一半月分类神经网络的分类树。
图6所示为由训练后的网络得到的清理后的决策树。决策树找到许多类别，这些类别的边界由树中的规则确定，其中每个类别都分配了一个类。
图7所示为根据本申请一实施例的由用于半月数据集的决策树进行的分类示意图。
在图7中，用不同的灰度说明不同的类别。可以从决策树进行一些推断，例如一些区域非常明确，并且它们进行的分类与训练数据完全一致，因此这些区域非常可靠。存在无界类别，其帮助获得准确的分类边界，但未能提供训练数据的紧凑表示，这些可能与神经决策所做出的不准确推断相对应。也出现了一些类别，尽管没有任何训练数据属于它们。
根据本申请实施例的方法,将神经网络转化为决策树,基于决策树解释神经网络,从而解决神经网络的黑盒问题。
进一步的,基于本申请实施例提出的神经网络转化方法,本申请一实施例还提出了一种神经网络计算方法。
图8所示为根据本申请一实施例的神经网络计算流程图。电子设备执行如图8所示的下述流程以实现神经网络计算。
S800,获取待计算数据。
S810,获取用于计算待计算数据的第一神经网络。
S820,基于本申请实施例提出的神经网络转化方法,将第一神经网络转化为第一决策树。
S830,使用第一决策树,对待计算数据进行计算,获取计算结果。
相较于神经网络,决策树表示提供一些计算优势。
表1所示为某游戏问题的计算和内存分析结果数据。
表1
在表1中，比较了神经网络及其等效决策树的参数数量、浮点比较和乘加运算的数量。由于诱导树是神经网络的展开，因此它覆盖所有可能的路径并将所有可能的有效滤波器保持在存储器中。因此，如所预期的那样，神经网络的树表示中的参数数量大于网络本身的参数数量。在诱导树中，在每个层i中，至多m_i个滤波器直接应用于输入，而在神经网络中，这些滤波器总是应用于先前的特征，其特征维度通常远大于输入。因此，在计算方面，树表示比神经网络更有优势。
在本申请实施例中,方法流程的各步骤可以以功能分为各种模块实现,各个模块的划分仅仅是一种逻辑功能的划分,在实施本申请实施例时可以把各模块的功能在同一个或多个软件和/或硬件中实现。
具体的,本申请实施例所提出的装置在实际实现时可以全部或部分集成到一个物理实体上,也可以物理上分开。且这些模块可以全部以软件通过处理元件调用的形式实现; 也可以全部以硬件的形式实现;还可以部分模块以软件通过处理元件调用的形式实现,部分模块通过硬件的形式实现。例如,检测模块可以为单独设立的处理元件,也可以集成在电子设备的某一个芯片中实现。其它模块的实现与之类似。此外这些模块全部或部分可以集成在一起,也可以独立实现。在实现过程中,上述方法的各步骤或以上各个模块可以通过处理器元件中的硬件的集成逻辑电路或者软件形式的指令完成。
例如,以上这些模块可以是被配置成实施以上方法的一个或多个集成电路,例如:一个或多个特定集成电路(Application Specific Integrated Circuit,ASIC),或,一个或多个数字信号处理器(Digital Singnal Processor,DSP),或,一个或者多个现场可编程门阵列(Field Programmable Gate Array,FPGA)等。再如,这些模块可以集成在一起,以片上装置(System-On-a-Chip,SOC)的形式实现。
在实际应用场景中,本说明书所示实施例的方法流程可以由安装在电子设备上的电子芯片所实现。因此,本申请一实施例提出了一种电子芯片。例如,电子芯片安装在电子设备上,电子芯片包括:
处理器,其用于执行存储在存储器上的计算机程序指令,其中,当该计算机程序指令被该处理器执行时,触发电子芯片执行本申请实施例所述的方法步骤。
本申请一实施例还提出了一种电子设备。
图9所示为根据本申请一实施例的电子设备结构示意图。
电子设备900包括用于存储计算机程序指令的存储器910和用于执行程序指令的处理器920,其中,当该计算机程序指令被该处理器执行时,触发电子设备执行如本申请实施例所述的方法步骤。
具体的,在本申请一实施例中,上述一个或多个计算机程序被存储在上述存储器中,上述一个或多个计算机程序包括指令,当上述指令被上述设备执行时,使得上述设备执行本申请实施例所述的方法步骤。
具体的,在本申请一实施例中,电子设备的处理器可以是片上装置SOC,该处理器中可以包括中央处理器(Central Processing Unit,CPU),还可以进一步包括其他类型的处理器。具体的,在本申请一实施例中,电子设备的处理器可以是PWM控制芯片。
具体的,在本申请一实施例中,涉及的处理器可以例如包括CPU、DSP、微控制器或数字信号处理器,还可包括GPU、嵌入式神经网络处理器(Neural-network Process Units,NPU)和图像信号处理器(Image Signal Processing,ISP),该处理器还可包括必要的硬件加速器或逻辑处理硬件电路,如ASIC,或一个或多个用于控制本申请技术方案程序执行的集成电路等。此外,处理器可以具有操作一个或多个软件程序的功能,软件程序可以存储在存储介质中。
具体的,在本申请一实施例中,电子设备的存储器可以是只读存储器(read-only memory,ROM)、可存储静态信息和指令的其它类型的静态存储设备、随机存取存储器(random access memory,RAM)或可存储信息和指令的其它类型的动态存储设备,也可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储(包括压缩光碟、激光碟、光碟、数字通用光碟、蓝光光碟等)、磁盘存储介质或者其它磁存储设备,或者还可以是能够用于携带或存储具有指令或数据结构形式的期望的 程序代码并能够由计算机存取的任何计算机可读介质。
具体的,在本申请一实施例中,处理器可以和存储器可以合成一个处理装置,更常见的是彼此独立的部件,处理器用于执行存储器中存储的程序代码来实现本申请实施例所述方法。具体实现时,该存储器也可以集成在处理器中,或者,独立于处理器。
进一步的,本申请实施例阐明的设备、装置、模块,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。
本领域内的技术人员应明白,本申请实施例可提供为方法、装置、或计算机程序产品。因此,本发明可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本发明可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质上实施的计算机程序产品的形式。
在本申请所提供的几个实施例中,任一功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。
具体的,本申请一实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行本申请实施例提供的方法。
本申请一实施例还提供一种计算机程序产品,该计算机程序产品包括计算机程序,当其在计算机上运行时,使得计算机执行本申请实施例提供的方法。
本申请中的实施例描述是参照根据本申请实施例的方法、设备(装置)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
还需要说明的是,本申请实施例中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示单独存在A、同时存在A和B、单独存在B的情况。其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项” 及其类似表达,是指的这些项中的任意组合,包括单项或复数项的任意组合。例如,a,b和c中的至少一项可以表示:a,b,c,a和b,a和c,b和c或a和b和c,其中a,b,c可以是单个,也可以是多个。
本申请实施例中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、商品或者设备中还存在另外的相同要素。
本申请可以在由计算机执行的计算机可执行指令的一般上下文中描述,例如程序模块。一般地,程序模块包括执行特定任务或实现特定抽象数据类型的例程、程序、对象、组件、数据结构等等。也可以在分布式计算环境中实践本申请,在这些分布式计算环境中,由通过通信网络而被连接的远程处理设备来执行任务。在分布式计算环境中,程序模块可以位于包括存储设备在内的本地和远程计算机存储介质中。
本申请中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例而言,由于其基本相似于方法实施例,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。
本领域普通技术人员可以意识到,本申请实施例中描述的各单元及算法步骤,能够以电子硬件、计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的装置、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
以上所述,仅为本申请的具体实施方式,任何熟悉本技术领域的技术人员在本申请公开的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。本申请的保护范围应以所述权利要求的保护范围为准。

Claims (11)

  1. 一种神经网络转化方法,其特征在于,所述方法应用于终端设备,包括:
    初始化决策树,设置所述决策树的根;
    以神经网络的有效滤波器作为决策规则,从所述决策树的根开始进行叶分支,直到所述决策树覆盖所述神经网络的所有有效滤波器,其中,所述神经网络为具有分段线性激活的神经网络。
  2. 根据权利要求1所述的方法,其特征在于,所述从所述决策树的根开始进行叶分支,包括:
    每次针对上一次分支出的节点进行所述叶分支,每次所述叶分支对应一个有效滤波器,所述有效滤波器的对应次序按照,所述神经网络中同一层中有效滤波器的排序以及所述神经网络中不同层的排序。
  3. 根据权利要求1所述的方法,其特征在于,针对全连接层,以有效矩阵作为所述决策规则。
  4. 根据权利要求1所述的方法,其特征在于,针对跳转连接层,以残差有效矩阵作为所述决策规则。
  5. 根据权利要求1所述的方法,其特征在于,针对归一化层,将所述归一化层分别嵌入到激活前或激活后归一化之后或之前的线性层中。
  6. 根据权利要求1所述的方法,其特征在于,针对卷积层,以有效卷积作为所述决策规则。
  7. 根据权利要求1-6所述的方法,其特征在于,所述方法还包括:
    根据所述决策树的违反规则和/或冗余规则,对所述决策树进行无损修剪;
    和/或,
    根据训练所述神经网络期间实现的类别,对所述决策树进行无损修剪。
  8. 一种神经网络计算方法,其特征在于,所述方法应用于终端设备,包括:
    获取用于计算的神经网络;
    基于权利要求1-7中任一项所述的方法,将所述神经网络转化为决策树;
    使用所述决策树进行计算。
  9. 一种电子设备,其特征在于,所述电子设备包括用于存储计算机程序指令的存储器和用于执行计算机程序指令的处理器,其中,当所述计算机程序指令被该处理器执行时,触发所述电子设备执行如权利要求1-7中任一项所述的方法步骤。
  10. 一种电子设备,其特征在于,所述电子设备包括用于存储计算机程序指令的存储器和用于执行计算机程序指令的处理器,其中,当所述计算机程序指令被该处理器执行时,触发所述电子设备执行如权利要求8所述的方法步骤。
  11. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有计算机程序,当其在计算机上运行时,使得计算机执行如权利要求1-8中任一项所述的方法。
PCT/CN2022/126361 2022-10-10 2022-10-20 一种神经网络转化方法、电子设备和存储介质 WO2024077651A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US17/962,559 2022-10-10
US17/962,559 US20240119288A1 (en) 2022-10-10 2022-10-10 Method for converting neural network, electronic device and storage medium

Publications (1)

Publication Number Publication Date
WO2024077651A1 true WO2024077651A1 (zh) 2024-04-18

Family

ID=86780279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126361 WO2024077651A1 (zh) 2022-10-10 2022-10-20 一种神经网络转化方法、电子设备和存储介质

Country Status (4)

Country Link
US (1) US20240119288A1 (zh)
JP (1) JP7375250B1 (zh)
CN (1) CN116306876A (zh)
WO (1) WO2024077651A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200005148A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Neural trees
CN110784760A (zh) * 2019-09-16 2020-02-11 清华大学 一种视频播放方法、视频播放器及计算机存储介质
CN111898692A (zh) * 2020-08-05 2020-11-06 清华大学 神经网络到决策树的转换方法、存储介质及电子设备
CN113489751A (zh) * 2021-09-07 2021-10-08 浙江大学 一种基于深度学习的网络流量过滤规则转化方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210019635A1 (en) 2019-07-15 2021-01-21 Ramot At Tel Aviv University Group specific decision tree

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200005148A1 (en) * 2018-06-29 2020-01-02 Microsoft Technology Licensing, Llc Neural trees
CN110784760A (zh) * 2019-09-16 2020-02-11 清华大学 一种视频播放方法、视频播放器及计算机存储介质
CN111898692A (zh) * 2020-08-05 2020-11-06 清华大学 神经网络到决策树的转换方法、存储介质及电子设备
CN113489751A (zh) * 2021-09-07 2021-10-08 浙江大学 一种基于深度学习的网络流量过滤规则转化方法

Also Published As

Publication number Publication date
JP2024056120A (ja) 2024-04-22
CN116306876A (zh) 2023-06-23
US20240119288A1 (en) 2024-04-11
JP7375250B1 (ja) 2023-11-07

Similar Documents

Publication Publication Date Title
TWI584206B (zh) Inference Device and Inference Method
Nandedkar et al. A fuzzy min-max neural network classifier with compensatory neuron architecture
Abbasbandy et al. Numerical solutions of fuzzy differential equations by Taylor method
CN108228728B (zh) 一种参数化的论文网络节点表示学习方法
CN108537328A (zh) 用于可视化构建神经网络的方法
Scardapane et al. Distributed training of graph convolutional networks
Khammar et al. A robust least squares fuzzy regression model based on kernel function
Melin et al. Fuzzy logic in intelligent system design: Theory and applications
WO2024077651A1 (zh) 一种神经网络转化方法、电子设备和存储介质
US11669727B2 (en) Information processing device, neural network design method, and recording medium
Kimmig et al. Algebraic model counting
KR20220074430A (ko) 뉴로 심볼릭 기반 릴레이션 임베딩을 통한 지식완성 방법 및 장치
Dikopoulou et al. From undirected structures to directed graphical lasso fuzzy cognitive maps using ranking-based approaches
Abdellatif A comparison study between soft computing and statistical regression techniques for software effort estimation
Hsieh et al. The Hamiltonian problem on distance-hereditary graphs
Yu et al. Graph isomorphism identification based on link-assortment adjacency matrix
Yin A characteristic-point-based fuzzy inference system aimed to minimize the number of fuzzy rules
JP7462140B2 (ja) ニューラルネットワーク回路及びニューラルネットワーク演算方法
WO2019185037A1 (zh) 用于处理数据集的方法、系统和存储介质
KR102557800B1 (ko) 차분 프라이버시 기반 의사결정 트리 생성 방법 및 장치
Ciftcioglu et al. A fuzzy neural tree based on likelihood
Remesan et al. Hydroinformatics and data-based modelling issues in hydrology
Adewole et al. The quadratic entropy approach to implement the Id3 decision tree algorithm
JP7187065B1 (ja) 計算手法決定システム、計算手法決定方法、及び、計算手法決定プログラム
CN115204372B (zh) 一种基于项游走图神经网络的前提选择方法及系统