CN115730646A - Hybrid expert network optimization method based on partial quantization - Google Patents

Hybrid expert network optimization method based on partial quantization

Info

Publication number
CN115730646A
CN115730646A
Authority
CN
China
Prior art keywords
network
quantization
data
subnet
subnets
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211713009.8A
Other languages
Chinese (zh)
Inventor
赵继胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Fudian Intelligent Technology Co ltd
Original Assignee
Shanghai Fudian Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Fudian Intelligent Technology Co ltd filed Critical Shanghai Fudian Intelligent Technology Co ltd
Priority to CN202211713009.8A priority Critical patent/CN115730646A/en
Publication of CN115730646A publication Critical patent/CN115730646A/en
Pending legal-status Critical Current

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a hybrid expert network optimization method based on partial quantization, relating to the field of information technology, comprising the following steps: S1, selecting a data sample set and sampling the hybrid expert network; S2, establishing the correspondence between subnets and data sets, and selecting the high-frequency subnets and their corresponding data sets; S3, performing iterative quantization on the selected high-frequency subnets using the corresponding data sets. The method samples the data flow during inference in the hybrid expert network to obtain the correspondence between data sets and the subnets of the network, and then quantizes each subnet on its own corresponding data set, which reduces the computational burden of optimizing the whole network and improves its service throughput. Because each high-frequency subnet is quantized with its own corresponding data set, quantizing the entire network is avoided; the method is simple, efficient, and improves overall network performance.

Description

Hybrid expert network optimization method based on partial quantization
Technical Field
The invention relates to the technical field of information, in particular to a hybrid expert network optimization method based on partial quantization.
Background
A mixture of experts (Mixture-of-Experts, MoE), here called a hybrid expert network, is a technique for organizing a neural network in a sparse manner. It can integrate more network parameters while keeping the growth in computing-power demand limited, and can be regarded as a large number of relatively small-scale neural networks (such as fully connected networks or Transformers) sparsely connected together through an expert-selection switch. It can provide effective support for tasks such as complex object discrimination and can serve as a foundation-model service for city-scale artificial intelligence applications; continuous optimization of the hybrid expert network improves computational efficiency and throughput for high-performance intelligent applications.
Quantization is a way of compressing a network model: it approximates network weights or activation values represented with a high bit width (e.g. 32-bit floating point) using a lower bit width (e.g. 16-bit floating point, 8-bit integer, or even 2 bits). In terms of values, it discretizes a continuous range.
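As an illustration of the discretization described above (the function names and the symmetric per-tensor scheme are our own choices, not taken from the patent), uniform INT8 quantization of a list of floating-point weights can be sketched as:

```python
def quantize_int8(weights):
    """Uniform symmetric quantization of floats onto the INT8 grid [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0  # width of one discrete step
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover a floating-point approximation of the original weights."""
    return [v * scale for v in q]

w = [0.5, -1.27, 0.003]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# every reconstructed value lies within half a quantization step of the original
assert all(abs(a - b) <= s / 2 + 1e-9 for a, b in zip(w, w_hat))
```

Each original value is replaced by one of 255 discrete levels; the reconstruction error is bounded by half the step size, which is the trade-off the quantization threshold qt later guards against.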
In the prior art, modern smart-city systems increasingly rely on complex artificial intelligence models for discriminant analysis of spatial objects, and the computational pressure brought by large-scale neural networks has become a bottleneck for intelligent applications. Existing hybrid expert network optimization methods generally quantize the whole network: the computational pressure after deploying a large-scale neural network is high, the computing power consumed by the optimization process is large, and the approach suffers from a large computation amount, high cost, unbalanced load, low network throughput, and insufficient performance support.
Disclosure of Invention
In order to overcome the technical problems of large computation amount, high cost and low network throughput rate in the prior art, the invention provides a hybrid expert network optimization method based on partial quantization.
In order to realize this purpose, the invention adopts the following technical scheme:
a hybrid expert network optimization method based on partial quantization comprises the following steps:
s1, selecting a data sample set, and performing hybrid expert network sampling;
s2, establishing a corresponding relation between the subnets and the data sets, and selecting a high-frequency subnet and a corresponding data set;
and S3, carrying out iterative quantization processing on the selected high-frequency sub-network by using the corresponding data set.
Preferably, in S1, information is sampled for a control gateway of each layer in the hybrid expert network to obtain an execution path for reasoning on the data sample.
Preferably, in S1, the hybrid expert network is sampled to obtain correspondence between the high-frequency-use subnetworks and the execution path information, and the corresponding data sample set information.
Preferably, the step S1 includes the steps of:
s11, for a given hybrid expert network N, a data sample set D = {d0, d1 … dN};
s12, implanting a sampling code into the N;
s13, repeating the following steps for each sample di in the D:
s131, writing the ID number i of the di into a log file;
s132, calling N to carry out reasoning calculation on di, and writing the accessed subnet set EN into a log file through a sampling code.
Preferably, in S2, in the sampled log file the data express the correspondence between sample IDs and execution paths EN, where EN is composed of subnet IDs; the data pairs can therefore be disassembled and summarized to obtain the correspondence between subnet IDs and sample subsets Dk.
Preferably, the step S2 includes the steps of:
s21, for a given hybrid expert network N, obtaining sampling data PD from a data sample set D;
s22, summarizing PD to obtain n relation pairs (ENk, Dk);
s23, screening out the r subnets {EN0, EN1 … ENr} whose Dk is larger than a threshold t among the n relation pairs as candidate subnets for quantization processing.
Preferably, in S3, multiple subnets that have no context-dependent correlation are optimized simultaneously in parallel.
Preferably, the context-dependent processing further splits the original correspondence between subnet IDs and sample sets, using the execution path of data-sample inference, into a correspondence between execution paths EN and sample sets, with the execution path EN serving as context information; on this basis, the subnet is iteratively quantized with that sample set, yielding the context-dependent effect.
Preferably, in S3, through iterative quantization, an optimal quantization bit width configuration is found for a high-frequency usage expert subnet in the hybrid expert network.
Preferably, the step S3 includes the steps of:
s31, initialization: setting the data sample set Dr, selecting the high-frequency subnet ENr, setting the quantization threshold qt, setting the optimized network ENr' = ENr, and setting the current quantization configuration qc to QC1;
s32, judging whether qc is null: if qc is null, returning the current optimized network ENr'; otherwise, entering the next step;
s33, applying Dr to quantize ENr with the bit-width configuration qc, obtaining ENr1;
s34, evaluating with the quantization threshold qt whether the quality of ENr1 is compliant; if so, entering the next step; otherwise, returning the current optimized network ENr';
and s35, letting ENr' = ENr1, reducing the quantization bit width by selecting the next lower bit-width configuration as qc (null if none remains), and returning to s32.
Compared with the prior art, the invention has the advantages that:
the invention samples the data flow in the inference process of the hybrid expert network to obtain the corresponding relation between different data sets and different subnets in the hybrid expert network, and then carries out quantitative optimization on the different subnets on the data sets corresponding to the subnets, thereby reducing the calculation burden required by the overall optimization of the hybrid expert network. The optimized hybrid expert network can be processed in a calculation optimization mode in a high-frequency scene of a user, so that the service throughput rate of the whole network is improved.
The invention quantizes each subnet using the subnet-to-data correspondence obtained after sampling, i.e. using the data set corresponding to each high-frequency subnet; this avoids quantizing the whole network and the high computational overhead that whole-network quantization would bring. The method is simple and efficient and improves the performance of the whole network.
The invention locates the relation between subnets and data samples through network sampling, preparing the data for subsequent quantization processing; it then quantizes the subnets, applying quantization optimization to the high-frequency subnets. Quantization may be performed in a context-free, context-dependent or partially context-dependent manner, and multiple subnets without context-dependent relations may be quantized in parallel. Combining these methods achieves quantization optimization of a given hybrid expert network in its application environment. The advantages of quantization include: a reduced model size, since weight values are represented with low-bit-width data, lowering the memory footprint; reduced computational pressure, since high-bit-width floating-point computation is reduced to low-bit-width floating-point or even integer computation, greatly lowering computational cost; and, through the reduced computational overhead, lower power consumption together with improved throughput.
Drawings
In order to more clearly illustrate the embodiments of the present invention and the technical solutions in the prior art, the drawings used in describing them are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present invention; those skilled in the art can obtain other drawings from the structures shown without creative effort.
FIG. 1 is a diagram of a standard hybrid expert network architecture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the data flow path of hybrid expert network inference in accordance with an embodiment of the present invention;
FIG. 3 is a diagram illustrating context-free quantization according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating context dependent quantization according to an embodiment of the present invention;
fig. 5 is a flowchart illustrating an operation of an iterative quantization process according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the device or element so referred to must have a particular orientation, be constructed in a particular orientation, and be operated, and thus should not be considered as limiting the present invention.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or to implicitly indicate the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; may be mechanically coupled, may be electrically coupled or may be in communication with each other; they may be directly connected or indirectly connected through intervening media, or they may be connected internally or in any other suitable relationship, unless expressly stated otherwise. The specific meanings of the above terms in the present invention can be understood according to specific situations by those of ordinary skill in the art.
Referring to fig. 1-5, an embodiment of a hybrid expert network optimization method based on partial quantization according to the present invention includes the following steps:
s1, selecting a data sample set, and performing hybrid expert network sampling;
s2, establishing a corresponding relation between the subnets and the data sets, and selecting a high-frequency subnet and a corresponding data set;
and S3, carrying out iterative quantization processing on the selected high-frequency sub-network by using the corresponding data set.
In this embodiment, in S1, information sampling is performed on a control gateway of each layer in the hybrid expert network to obtain an execution path for reasoning on a data sample.
In this embodiment, in S1, the hybrid expert network is sampled to obtain a corresponding relationship between the high-frequency usage subnet and the execution path information, and the corresponding data sample set information.
In this embodiment, the step S1 includes the following steps:
s11, for a given hybrid expert network N, a data sample set D = {d0, d1 … dN};
s12, implanting a sampling code into the N;
s13, repeating the following steps for each sample di in the D:
s131, writing the ID number i of the di into a log file;
s132, calling N to carry out reasoning calculation on di, and writing the accessed subnet set EN into a log file through a sampling code.
In this embodiment, in S2, in the sampled log file the data represent the correspondence between sample IDs and execution paths EN, where EN is composed of subnet IDs; the data pairs can therefore be disassembled and summarized to obtain the correspondence between subnet IDs and sample subsets Dk.
In this embodiment, the step S2 includes the following steps:
s21, for a given hybrid expert network N, obtaining sampling data PD from a data sample set D;
s22, summarizing PD to obtain n relation pairs (ENk, Dk);
s23, screening out the r subnets {EN0, EN1 … ENr} whose Dk is larger than a threshold t among the n relation pairs as candidate subnets for quantization processing.
In this embodiment, in S3, multiple subnets that have no context-induced correlation are optimized simultaneously in parallel.
In this embodiment, the context-dependent processing further splits the original correspondence between subnet IDs and sample sets, using the execution path of data-sample inference, into a correspondence between execution paths EN and sample sets, with the execution path EN serving as context information; on this basis, the subnet is iteratively quantized with that sample set, yielding the context-dependent effect.
In this embodiment, in S3, through iterative quantization, an optimal quantization bit width configuration is found for a high-frequency usage expert subnet in a hybrid expert network.
In this embodiment, the step S3 includes the following steps:
s31, initialization: setting the data sample set Dr, selecting the high-frequency subnet ENr, setting the quantization threshold qt, setting the optimized network ENr' = ENr, and setting the current quantization configuration qc to QC1;
s32, judging whether qc is null: if qc is null, returning the current optimized network ENr'; otherwise, entering the next step;
s33, applying Dr to quantize ENr with the bit-width configuration qc, obtaining ENr1;
s34, evaluating with the quantization threshold qt whether the quality of ENr1 is compliant; if so, entering the next step; otherwise, returning the current optimized network ENr';
and s35, letting ENr' = ENr1, reducing the quantization bit width by selecting the next lower bit-width configuration as qc (null if none remains), and returning to s32.
In this embodiment, as shown in fig. 1, the network can be divided into L layers, each layer having N expert networks (i.e. subnets) scheduled by one gateway (gate): the gate routes the data flow to one or more of the expert networks. During inference only part of the network participates in the computation, so the network's learning capacity can be regarded as expanded while the computing-power requirement remains essentially unchanged. A typical hybrid expert network inference execution path is shown in fig. 2.
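The gate's routing decision described above can be sketched as follows; this is a toy softmax top-k gate of our own construction (the patent does not specify the gate's scoring function), shown only to make the "data flows to one or more experts" behavior concrete:

```python
import math

def gate(scores, k=1):
    """Toy MoE gate: softmax over per-expert scores, route to the top-k experts."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # indices of the k most probable experts — only these subnets participate
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]

# with 4 experts in a layer, this input activates only expert 2
chosen = gate([0.1, 0.3, 2.0, -1.0], k=1)
```

Only the chosen experts run, which is why inference touches a sparse execution path rather than the whole network.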
The hybrid expert network sampling is used to locate the execution path of a given data sample in the hybrid expert network, i.e. the set of subnets the network calls while reasoning over that data sample. As shown in fig. 2, a hybrid expert network is composed of L layers of expert networks; in each layer one expert subnet participates in the inference, and the execution path of the given data sample's inference in the figure comprises the subnets EN = {EL1E2, EL2E6, …, ELL-2E5, ELL-1E3, ELLE1} (where ELi denotes the i-th layer and Ej the j-th expert subnet of that layer).
The sampling of the hybrid expert network is obtained by sampling the output of the control gateway (gate) of each layer for each data sample. The structure of the hybrid expert network implies that the expert subnet each sample flows through is determined by the gate, so the expert subnets themselves need not be sampled; it suffices to record the gate's decision. Sampling is implemented by implanting a profiler code segment into the operator code of the gate unit; the sampling code writes the subnet id selected by the gate into a log file, expressed in pseudocode as follows:
moe_gate_i(data) {  // gateway of the i-th hybrid expert network layer; data is the input
    // select the expert subnet that should handle this input
    j = select_expert(data);
    write_log(j);   // sampling code: write the id of the chosen expert subnet into the log file
    // call the j-th expert subnet
    data' = net_inference(experts[j], data);
}
The process of the hybrid expert network sampling comprises the following steps:
s11, for a given hybrid expert network N, a data sample set D = {d0, d1 … dN};
s12, implanting a sampling code into the N;
s13, repeating the following steps for each sample di in the D:
s131, writing the ID number i of the di into a log file;
s132, calling N to carry out reasoning calculation on di, and writing the accessed subnet set EN into a log file through a sampling code.
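The sampling loop s11–s132 above can be sketched as follows. All names here are illustrative: `infer_with_paths` stands in for calling network N on a sample with the implanted sampling code, and the log is modeled as an in-memory list of records rather than a file:

```python
def sample_network(infer_with_paths, samples, log):
    """s13: for each sample, record its ID and the expert subnets it visits.

    infer_with_paths(d) stands in for inference of network N on sample d;
    it returns the set EN of subnet ids touched during inference (the values
    the implanted gate-level sampling code would write).
    """
    for i, d in enumerate(samples):
        log.append(("sample", i))        # s131: write the sample ID into the log
        en = infer_with_paths(d)         # s132: call N to reason over di
        log.append(("path", tuple(en)))  # the sampling code writes the visited set EN

log = []
# toy stand-in network: even-valued samples route to subnet 0, odd to subnet 1
sample_network(lambda d: {d % 2}, samples=[10, 11, 12], log=log)
# log now pairs each sample ID with its execution path
```

The resulting log alternates sample-ID records with path records, which is exactly the data-pair structure that S2 disassembles.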
Selecting the high-frequency subnets and their corresponding data sets: in the sampled log file, the data express the correspondence between sample IDs and execution paths EN, where EN is composed of subnet IDs, so the correspondences (data pairs) between sample IDs and subnet IDs can be disassembled. By summarizing the data pairs, the sample subset Dk corresponding to each subnet ID can be obtained, where Dk = {dk, dk+1 … dm}. Subnets corresponding to large sample subsets obviously belong to the high-frequency subnets and can be screened out for quantization. The steps for screening the high-frequency subnets for quantization processing are as follows:
s21, for a given hybrid expert network N, obtaining sampling data PD from a data sample set D;
s22, summarizing PD to obtain n relation pairs (ENk, Dk);
s23, screening out the r subnets {EN0, EN1 … ENr} whose Dk is larger than a threshold t among the n relation pairs as candidate subnets for quantization processing.
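A minimal sketch of s21–s23 under assumed data shapes (the pair format and threshold value are our own): group the logged (sample ID, execution path) pairs by path, then keep the paths whose sample subsets exceed the threshold t:

```python
from collections import defaultdict

def select_high_frequency(pairs, t):
    """pairs: iterable of (sample_id, execution_path) read from the sampling log.
    Returns {path: sample subset Dk} for the paths with more than t samples."""
    by_path = defaultdict(list)
    for sample_id, path in pairs:
        by_path[path].append(sample_id)   # s22: summarize into (ENk, Dk) relation pairs
    # s23: screen the candidates whose Dk exceeds the threshold t
    return {p: ids for p, ids in by_path.items() if len(ids) > t}

pairs = [(0, "E1"), (1, "E1"), (2, "E2"), (3, "E1")]
hot = select_high_frequency(pairs, t=2)
# only path "E1" (3 samples) survives the threshold t = 2
```

The surviving (path, sample subset) pairs are exactly the inputs the iterative quantization step consumes.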
Iterative quantization processing: after a subnet is selected for quantization, quantization optimization proceeds iteratively. The invention targets the typical precision bit widths adopted by deep neural networks: 32-bit floating point (FP32) is the starting bit width, followed from high to low by 16-bit brain floating point (BF16), 8-bit integer (INT8), 4-bit integer (INT4), 3-bit values and 2-bit values, giving 5 quantization options in total (denoted here as quantization configurations QC1, QC2, QC3, QC4 and QC5). Quantization quality is usually evaluated by setting a quantization threshold qt, i.e. the quantized network accuracy must not fall below qt.
For a given hybrid expert network N, a subnet ENr with corresponding data set Dr, and a given quantization threshold qt, the quantization configurations are applied in turn from high to low and validated against the quantization threshold. The quantized subnet optimized with the lowest quantization (bit-width) configuration whose quality is not below the threshold is taken as the final optimization result. The workflow of the iterative quantization processing is as follows:
s31, initialization: setting the data sample set Dr, selecting the high-frequency subnet ENr, setting the quantization threshold qt, setting the optimized network ENr' = ENr, and setting the current quantization configuration qc to QC1;
s32, judging whether qc is null: if qc is null, returning the current optimized network ENr'; otherwise, entering the next step;
s33, applying Dr to quantize ENr with the bit-width configuration qc, obtaining ENr1;
s34, evaluating with the quantization threshold qt whether the quality of ENr1 is compliant; if so, entering the next step; otherwise, returning the current optimized network ENr';
and s35, letting ENr' = ENr1, reducing the quantization bit width by selecting the next lower bit-width configuration as qc (null if none remains), and returning to s32.
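The s31–s35 loop can be sketched as follows. `quantize` and `accuracy` are placeholders for the real quantization routine and quality evaluation (the patent does not specify them), and the configuration ladder mirrors the QC1–QC5 options, with FP32 as the unquantized starting point:

```python
QCS = ["BF16", "INT8", "INT4", "INT3", "INT2"]  # QC1 … QC5, high to low bit width

def iterative_quantize(quantize, accuracy, dr, qt):
    """Quantize a subnet with progressively lower bit widths while the quantized
    quality, measured on data set Dr, stays at or above threshold qt."""
    best = None                       # s31: ENr' starts as the unmodified subnet
    for qc in QCS:                    # s32/s35: walk configurations from high to low
        candidate = quantize(qc, dr)          # s33: quantize ENr under configuration qc
        if accuracy(candidate, dr) < qt:      # s34: quality check against qt
            break                             # reject; keep the previous ENr'
        best = candidate              # s35: ENr' = ENr1, then try a lower bit width
    return best

# toy stand-in: quality degrades monotonically with lower bit widths
quality = {"BF16": 0.98, "INT8": 0.96, "INT4": 0.90, "INT3": 0.80, "INT2": 0.60}
chosen = iterative_quantize(lambda qc, d: qc, lambda c, d: quality[c], dr=None, qt=0.95)
# the lowest configuration still meeting qt = 0.95 is INT8
```

The loop returns the lowest-bit-width configuration whose accuracy stays at or above qt, matching the "lowest configuration not below the quantization threshold" rule in the text.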
Context-dependent processing in the hybrid expert network: a high-frequency subnet may lie on several execution paths at once. As shown in fig. 3, subnet ELL-1E3 is shared by 2 execution paths, so quantizing it affects the inference quality of both paths; equivalently, the subnet is influenced by the data samples of both paths, and the result of quantization optimization is a compromise between the data samples corresponding to the 2 paths. The context-dependent sampling method instead quantizes subnet ELL-1E3 separately with the samples of each of the 2 execution paths, producing two different optimized subnets ELL-1E31 and ELL-1E32. This avoids the compromise caused by the data samples of multiple execution paths and finds a better quantization scheme for each specific path.
The context-dependent processing further splits the original correspondence between subnet IDs and sample sets, using the execution path of data-sample inference, into a correspondence between execution paths EN and sample sets, with the execution path EN serving as context information (EN is already present in the sampling log, so the sampling method need not be modified). On this basis, iteratively quantizing the subnet with that sample set yields the context-dependent effect.
In parallel quantization of multiple subnets in a hybrid expert network, the context-dependent characteristics described above can be used to determine sets of data samples that do not overlap, each corresponding to a subnet to be optimized (because of context dependence, the same subnet may be optimized into different subnets in different contexts). The invention can therefore optimize, in parallel, multiple subnets that have no context-induced correlation, further improving optimization efficiency.
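Since (subnet, context) pairs with disjoint sample sets cannot interfere, the per-subnet optimizations can run concurrently. A sketch using a thread pool (the task tuple shape and worker are our own assumptions, not from the patent):

```python
from concurrent.futures import ThreadPoolExecutor

def optimize_in_parallel(optimize_one, tasks):
    """tasks: [(subnet_id, context, sample_set), ...] with pairwise-disjoint
    (subnet, context) pairs, so the optimizations cannot interfere."""
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(optimize_one, *t): (t[0], t[1]) for t in tasks}
        return {key: f.result() for f, key in futures.items()}

results = optimize_in_parallel(
    lambda sid, ctx, ds: f"{sid}@{ctx}",  # stand-in for per-subnet iterative quantization
    [("EN3", "pathA", [1, 2]), ("EN3", "pathB", [3])],  # same subnet, two contexts
)
# the same subnet is optimized separately per execution-path context
```

Keying results by (subnet, context) rather than by subnet alone reflects the point in the text that one subnet may yield different optimized variants under different contexts.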
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is specific and detailed, but not to be understood as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A hybrid expert network optimization method based on partial quantization is characterized by comprising the following steps:
s1, selecting a data sample set, and performing hybrid expert network sampling;
s2, establishing a corresponding relation between the subnets and the data sets, and selecting the high-frequency subnets and the corresponding data sets;
and S3, carrying out iterative quantization processing on the selected high-frequency sub-network by using the corresponding data set.
2. The method of claim 1, wherein in the step S1, information is sampled for a control gateway of each layer in the hybrid expert network to obtain an execution path for reasoning on data samples.
3. The method according to claim 2, wherein in S1, the hybrid expert network is sampled to obtain correspondence between high-frequency usage subnetworks and execution path information and corresponding data sample set information.
4. The method for optimizing a hybrid expert network based on partial quantization according to claim 3, wherein in S1, the following steps are included:
s11, for a given hybrid expert network N, a data sample set D = {d0, d1 … dN};
s12, implanting a sampling code into the N;
s13, repeating the following steps for each sample di in the D:
s131, writing the ID number i of the di into a log file;
s132, calling N to carry out reasoning calculation on di, and writing the accessed subnet set EN into a log file through a sampling code.
5. The method of claim 4, wherein in the S2, in the sampled log file, the data represents a correspondence between the sample ID and the execution path EN, wherein EN is composed of the subnet ID, so that the data pairs can be disassembled and summarized to obtain a correspondence between the subnet ID and the sample subset Dk.
6. The method of claim 5, wherein the step of S2 comprises the steps of:
s21, for a given hybrid expert network N, obtaining sampling data PD from a data sample set D;
s22, summarizing PD to obtain n relation pairs (ENk, Dk);
s23, screening out the r subnets {EN0, EN1 … ENr} whose Dk is larger than a threshold t among the n relation pairs as candidate subnets for quantization processing.
7. The method of claim 6, wherein in the step S3, the optimization is performed simultaneously in a parallel manner for a plurality of subnets without context-dependent correlation.
8. The method of claim 7, wherein the context-dependent processing further splits the original correspondence between subnet IDs and sample sets, using the execution path of data-sample inference, into a correspondence between execution paths EN and sample sets, with the execution path EN serving as context information; iteratively quantizing the subnet with that sample set on this basis yields the context-dependent effect.
9. The method according to claim 8, wherein in S3, through iterative quantization, an optimal quantization bit width configuration is found for a high-frequency usage expert subnet in the hybrid expert network.
10. The partial quantization-based hybrid expert network optimization method according to claim 9, characterized in that S3 comprises the following steps:
s31, initialization: setting the data sample set Dr, selecting the high-frequency subnet ENr, setting the quantization threshold qt, setting the optimized network ENr' = ENr, and setting the current quantization configuration qc to QC1;
s32, judging whether qc is null: if qc is null, returning the current optimized network ENr'; otherwise, entering the next step;
s33, applying Dr to quantize ENr with the bit-width configuration qc, obtaining ENr1;
s34, evaluating with the quantization threshold qt whether the quality of ENr1 is compliant; if so, entering the next step; otherwise, returning the current optimized network ENr';
and s35, letting ENr' = ENr1, reducing the quantization bit width by selecting the next lower bit-width configuration as qc (null if none remains), and returning to s32.
CN202211713009.8A 2022-12-30 2022-12-30 Hybrid expert network optimization method based on partial quantization Pending CN115730646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211713009.8A CN115730646A (en) 2022-12-30 2022-12-30 Hybrid expert network optimization method based on partial quantization


Publications (1)

Publication Number Publication Date
CN115730646A 2023-03-03

Family

ID=85301901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211713009.8A Pending CN115730646A (en) 2022-12-30 2022-12-30 Hybrid expert network optimization method based on partial quantization

Country Status (1)

Country Link
CN (1) CN115730646A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117972293A (en) * 2024-03-28 2024-05-03 北京思凌科半导体技术有限公司 Computing method, device, equipment and storage medium based on mixed expert model
CN117972293B (en) * 2024-03-28 2024-06-07 北京思凌科半导体技术有限公司 Computing method, device, equipment and storage medium based on mixed expert model

Similar Documents

Publication Publication Date Title
CN110826692B (en) Automatic model compression method, device, equipment and storage medium
CN112910811B (en) Blind modulation identification method and device under unknown noise level condition based on joint learning
CN109787699B (en) Wireless sensor network routing link state prediction method based on mixed depth model
Yuan et al. Evoq: Mixed precision quantization of dnns via sensitivity guided evolutionary search
CN112287986A (en) Image processing method, device and equipment and readable storage medium
CN114764577A (en) Lightweight modulation recognition model based on deep neural network and method thereof
CN115730646A (en) Hybrid expert network optimization method based on partial quantization
CN116362325A (en) Electric power image recognition model lightweight application method based on model compression
CN112766484A (en) Floating point neural network model quantization system and method
CN113902108A (en) Neural network acceleration hardware architecture and method for quantizing bit width dynamic selection
CN115792677A (en) Lithium ion battery life prediction method based on improved ELM
CN115828143A (en) Node classification method for realizing heterogeneous primitive path aggregation based on graph convolution and self-attention mechanism
CN111832817A (en) Small world echo state network time sequence prediction method based on MCP penalty function
CN112906883A (en) Hybrid precision quantization strategy determination method and system for deep neural network
CN112399177B (en) Video coding method, device, computer equipment and storage medium
CN117162357A (en) Forming optimization control method and system for carbon fiber composite material
Peter et al. Resource-efficient dnns for keyword spotting using neural architecture search and quantization
CN110830939B (en) Positioning method based on improved CPN-WLAN fingerprint positioning database
CN116660756A (en) Battery capacity attenuation curve generation method based on condition generation countermeasure network
CN113157453B (en) Task complexity-based high-energy-efficiency target detection task dynamic scheduling method
CN113033653B (en) Edge-cloud cooperative deep neural network model training method
CN115564987A (en) Training method and application of image classification model based on meta-learning
CN114444654A (en) NAS-oriented training-free neural network performance evaluation method, device and equipment
CN114118151A (en) Intelligent spectrum sensing method with environment adaptive capacity
CN113111308A (en) Symbolic regression method and system based on data-driven genetic programming algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230707

Address after: 200433 8/F, Building 3, No.3 Bay Plaza, 323 Guoding Road, Yangpu District, Shanghai

Applicant after: SHANGHAI FUDIAN INTELLIGENT TECHNOLOGY Co.,Ltd.

Address before: 200433 8/F, Building 3, No.3 Bay Plaza, 323 Guoding Road, Yangpu District, Shanghai

Applicant before: Zhao Jisheng

Applicant before: SHANGHAI FUDIAN INTELLIGENT TECHNOLOGY Co.,Ltd.