WO2023020456A1 - Network model quantization method and apparatus, device, and storage medium
- Publication number
- WO2023020456A1 (PCT/CN2022/112673)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- output
- network
- model
- feature
- business data
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Definitions
- the present disclosure relates to the field of computer technology and the field of deep learning technology, for example, to a network model quantization method, apparatus, device, and storage medium.
- the training of the model generally requires a complicated training process and a long time to ensure the effectiveness and accuracy of the training.
- a complicated and lengthy training process tends to raise the problem of how to compress the neural network.
- the present disclosure provides a network model quantization method, apparatus, device, and storage medium, which can reduce the quantization loss of the network model.
- a network model quantization method, including:
- the loss function of the network model is determined according to the quantized feature values, and the network parameters and clipping parameters in the network model are updated according to the loss function of the network model.
- a network model quantization apparatus, including:
- a clipping module configured to clip, using clipping parameters, the feature values of the feature points output by a network layer in the network model
- a quantization module configured to quantize the feature values clipped by the clipping module
- an update module configured to determine the loss function of the network model according to the quantized feature values, and to update the network parameters and clipping parameters in the network model according to the loss function of the network model.
- an electronic device including:
- the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the above network model quantization method.
- a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the above network model quantization method.
- a computer program product including a computer program which, when executed by a processor, implements the above network model quantization method.
- FIG. 1 is a schematic diagram of a network model quantization method provided by an embodiment of the present disclosure
- FIG. 2 is a schematic diagram of a network model of a neural network provided by an embodiment of the present disclosure
- FIG. 3 is a distribution diagram of feature values of feature points provided by an embodiment of the present disclosure.
- FIG. 4 is a schematic diagram of another network model quantization method provided by an embodiment of the present disclosure.
- FIG. 5 is a schematic diagram of a network model quantization apparatus provided by an embodiment of the present disclosure.
- FIG. 6 is a schematic diagram of another network model quantization apparatus provided by an embodiment of the present disclosure.
- FIG. 7 is a schematic diagram of a determination module provided by an embodiment of the present disclosure.
- FIG. 8 is a schematic diagram of an output difference unit provided by an embodiment of the present disclosure.
- FIG. 9 is a schematic diagram of another determination module provided by an embodiment of the present disclosure.
- FIG. 10 is a block diagram of an electronic device for implementing the network model quantization method according to an embodiment of the present disclosure.
- FIG. 1 is a schematic diagram of a network model quantization method provided by an embodiment of the present disclosure; the embodiment is applicable to the case of processing a network model.
- the method can be executed by a network model quantization apparatus, which can be implemented in hardware and/or software and can be configured in an electronic device. With reference to FIG. 1, the method includes the following:
- the network model refers to the training model of an artificial neural network in the field of deep learning; for example, it can be a convolutional neural network model, a recurrent neural network model, a long short-term memory neural network model, a wavelet neural network model, etc.
- the embodiments of the present disclosure do not limit the specific network structure of the network model.
- an embodiment of the present disclosure provides a network model of a neural network.
- this network model can comprise three parts: an input layer, a hidden layer, and an output layer. The hidden layer can comprise multiple network layers, and the output of each network layer is used as the input of the next network layer (for example, in FIG. 2, the output of the (i-1)-th network layer can be used as the input of the i-th network layer). The network model is trained in an iterative manner, and the network layer used to output the feature map in the embodiments of the present disclosure can be a hidden layer.
- the feature map is used to characterize the features of the image (such as color features, grayscale features, etc.).
- a feature map corresponds to a feature matrix of at least one channel, each channel corresponding to a feature of the image.
- the feature points are elements in the feature matrix. Taking an 8×8 feature matrix as an example, the feature matrix includes 64 feature points.
- the value of a feature point is called a feature value, and the feature value can be an expression of a network parameter.
- Quantizing the feature values means performing low-precision processing on the network parameters, that is, converting high-precision network parameters in the network model into network parameters of lower precision.
- floating-point numbers expressed in 32 bits can be converted into 8 bits, 3 bits or even 2 bits, etc., which occupy less memory space for storage.
- Quantizing the network parameters in the network model can reduce the storage space of the network model several-fold; when the network parameters occupy less memory, more data can be held in video memory or registers, so quantization can also speed up the operation of the neural network.
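The storage effect of a lower bit width can be illustrated with simple arithmetic. The sketch below is only an illustration: the parameter count of one million is a hypothetical figure, and the function name `storage_bytes` is not from the disclosure.

```python
def storage_bytes(num_parameters, bits_per_parameter):
    """Bytes needed to store the parameters at the given bit width."""
    return num_parameters * bits_per_parameter / 8


num_parameters = 1_000_000  # hypothetical model size, not from the disclosure

# Compare the 32-bit baseline with the bit widths mentioned in the text.
for bits in (32, 8, 3, 2):
    size = storage_bytes(num_parameters, bits)
    print(f"{bits:>2}-bit parameters: {size:.0f} bytes")
```

Under these assumptions, moving from 32 bits to 8 bits divides the parameter storage by four, and moving to 2 bits divides it by sixteen.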
- an embodiment of the present disclosure provides a distribution graph of feature values of feature points, where the distribution graph is a distribution histogram of feature values of some feature points in a neural network model. As shown in Figure 3, the abscissa represents the feature value, and the ordinate represents the frequency of occurrence of the feature value.
- the feature value may be clipped by using clipping parameters before quantization, so as to reduce the quantization loss caused by outliers.
- the clipping in the network model quantization method provided by the embodiments of the present disclosure may be applicable to feature values whose distribution tends to be symmetric.
- the clipping method can be as follows: the feature values of normal feature points are not processed; for outliers whose feature values are negative, their feature values are updated to -α; for outliers whose feature values are positive, their feature values are updated to α, where α denotes the clipping parameter.
- the interval range of the feature values is reduced to [-α, α]
- the quantization range is also reduced to [-α, α]
- the quantized values can be selected evenly within the interval [-α, α].
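A minimal sketch of symmetric clipping followed by uniform quantization is given here, assuming a signed, evenly spaced grid on the clipping interval. The names `clip`, `quantize`, `alpha`, and `num_bits` are illustrative, and the exact quantizer used in the disclosure is not specified in this text.

```python
def clip(values, alpha):
    """Clamp every feature value into [-alpha, alpha]; normal values pass through."""
    return [max(-alpha, min(alpha, v)) for v in values]


def quantize(values, alpha, num_bits=8):
    """Snap already-clipped values onto an evenly spaced grid in [-alpha, alpha].

    Assumption: a signed grid with 2**(num_bits - 1) - 1 steps on each side of zero.
    """
    levels = 2 ** (num_bits - 1) - 1   # e.g. 127 for 8 bits, 3 for 3 bits
    scale = alpha / levels             # width of one quantization step
    return [round(v / scale) * scale for v in values]


features = [-3.0, -0.5, 0.0, 0.25, 0.9, 7.5]   # -3.0 and 7.5 act as outliers
clipped = clip(features, alpha=1.0)
quantized = quantize(clipped, alpha=1.0, num_bits=3)
print(clipped)
print(quantized)
```

Without clipping, the grid would have to span the outlier 7.5, so the same number of levels would be spread over a much wider interval and the bulk of the values near zero would be represented more coarsely.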
- the network parameters in the network model are iteratively trained, and the loss function of the network model is optimized during the iterative training of the network parameters.
- the embodiment of the present disclosure adjusts not only the network parameters but also the clipping parameters; that is, the loss function is differentiated with respect to both the network parameters and the clipping parameters.
- By updating the clipping parameters during the optimization of the loss function, more reasonable clipping parameters can be determined, so that the clipping parameters can better participate in the quantization process, and the quantization process can in turn be better applied to reduce the storage space occupied by the network model and improve the processing speed of the network model.
- the network model quantization method provided by the embodiment of the present disclosure can reduce the quantization loss caused by outliers in the quantization process.
- the clipping parameters are iteratively trained according to the loss function, so that more reasonable clipping parameters can be determined and can better participate in the quantization process; the quantization process can then be better applied to reduce the storage space occupied by the network model and improve the processing speed of the network model.
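One way to picture the joint update of a network parameter and a clipping parameter is the toy sketch below. It is not the disclosure's training procedure: the single-weight layer, the squared-error loss, the learning rate, and the straight-through treatment of rounding (gradient of rounding taken as 1) are all assumptions made for illustration, and `train_step` is a hypothetical name.

```python
def train_step(w, alpha, xs, targets, lr=0.05, num_bits=8):
    """One gradient step on a weight w and a clipping parameter alpha.

    Forward pass: y = w * x, clip y to [-alpha, alpha], then quantize.
    Backward pass (assumed straight-through estimator): rounding is ignored,
    so unclipped samples pass gradient to w and clipped samples to alpha.
    """
    levels = 2 ** (num_bits - 1) - 1
    scale = alpha / levels
    grad_w = grad_alpha = loss = 0.0
    for x, t in zip(xs, targets):
        y = w * x
        c = max(-alpha, min(alpha, y))       # clipping
        q = round(c / scale) * scale         # quantization
        err = q - t
        loss += err * err
        if abs(y) < alpha:
            grad_w += 2 * err * x            # d(clip)/dy = 1 inside the interval
        else:
            sign = 1.0 if y > 0 else -1.0
            grad_alpha += 2 * err * sign     # d(clip)/d(alpha) = sign(y) when clipped
    n = len(xs)
    return w - lr * grad_w / n, alpha - lr * grad_alpha / n, loss / n


xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
targets = list(xs)                           # toy task: reproduce the input
w, alpha = 1.0, 1.0                          # alpha starts too small for |x| = 2
_, _, first_loss = train_step(w, alpha, xs, targets)
for _ in range(200):
    w, alpha, loss = train_step(w, alpha, xs, targets)
print(first_loss, loss, alpha)
```

In this toy setting the loss pulls the clipping parameter outward until the values at ±2 are no longer truncated, which shows how a trainable clipping parameter can settle at a value suited to the data rather than staying at its initial guess.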
- FIG. 4 is a schematic diagram of another network model quantization method provided by an embodiment of the present disclosure. This embodiment is an optional solution proposed on the basis of the foregoing embodiments.
- the network model quantization method provided in this embodiment includes:
- S470: Determine the loss function of the network model according to the quantized feature values, and update the network parameters and clipping parameters in the network model according to the loss function of the network model.
- the quantizable network layer is also the network layer that outputs the feature map (in the following description of the embodiments of the present disclosure, unless stated otherwise, the network layers involved are all quantizable network layers), that is, the hidden layer in FIG. 2. Since the feature maps output by different network layers may be different, the distribution ranges of the feature values output by different network layers may be different. Therefore, in the embodiment of the present disclosure, when clipping parameters are used to clip the feature values of the feature points output by the network layer in the network model, the clipping parameters of different network layers may be different.
- the loss function of the network model is generally determined based on the output of the entire network model. Therefore, in the embodiments of the present disclosure, the model original output may be the output of the network model when no network layer has been quantized.
- the model quantized output may be the output of the network model after the feature values output by one network layer in the network model have undergone the two processes of clipping and quantization. That is, in the embodiments of the present disclosure, the model original output and the model quantized output may be the outputs of the entire network model before and after quantization, rather than the outputs of a single quantized layer.
- the network layers other than this network layer can be the original network layers without clipping and quantization, which prevents the clipping or quantization of other network layers from affecting the output of the entire model and interfering with the selection of the target initial value, so that a more reasonable target initial value can be selected.
- At least two candidate initial values of the clipping parameter of the (i-1)-th network layer can be determined; the at least two candidate initial values include the candidate initial value α1 and other candidate initial values.
- α1 can be used to clip the feature values of the feature points output by the (i-1)-th network layer, and the clipped feature values can be quantized; the output of the output layer in FIG. 2 is then the model quantized output associated with α1 for the (i-1)-th network layer
- the network layers other than the (i-1)-th network layer all produce original outputs without clipping and quantization processing.
- the change of each model quantized output relative to the model original output can be determined, that is, the network output difference metric.
- the network output difference metric characterizes the effect of having the corresponding candidate initial value participate in the quantization process: while the precision of the feature values is reduced, the smaller the change of the output value relative to the original output, the smaller the influence of that candidate initial value on the model output. Therefore, the smaller the network output difference metric, the more reasonable the candidate initial value. Based on this principle, among the at least two candidate initial values, the candidate initial value corresponding to the smallest network output difference metric can be determined as the target initial value of the clipping parameter of the network layer.
- business data can be used as the input of the network model to obtain the feature values of the feature points output by the network layer; the largest feature value is then selected from the feature values of the multiple feature points output by the network layer, and candidate initial values are determined according to the largest feature value and candidate coefficients.
- the candidate coefficients may be determined according to a preset step size and a preset range of candidate coefficients.
- the preset candidate coefficient range may be a predetermined parameter interval, and the range of the parameter interval is between 0 and 1.
- the preset step size may be a predetermined step size for taking values of parameters in the preset candidate coefficient range.
- the preset range of candidate coefficients may be (0.4, 1), and the preset step size may be 0.01, so the candidate coefficients may be selected between (0.4, 1) with a step size of 0.01.
- the product of the largest feature value among the feature values of the plurality of feature points output by the network layer and a candidate coefficient may be determined as a candidate initial value.
- the determined candidate initial values will not exceed the largest feature value, that is, the clipping is performed within the range of the original feature values, so that reasonable candidate initial values can be determined efficiently.
- the feature values of the feature points output by the network layer can be symmetrically distributed, and the clipping can be symmetric clipping.
- the largest feature value in the embodiment of the present disclosure may be the largest absolute value among the feature values of the multiple feature points output by the network layer. Since the preset candidate coefficient is a positive number, the candidate initial value determined by the product of the largest feature value and the candidate coefficient is also a positive number, which facilitates data processing and improves the training rate of the network model.
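The generation of candidate initial values can be sketched as follows, using the example range (0.4, 1) and step size 0.01 from the text. Treating the range as an open interval (so the coefficients run from 0.41 to 0.99) is an assumption, and the function name `candidate_initial_values` is hypothetical.

```python
def candidate_initial_values(feature_values, low=0.4, high=1.0, step=0.01):
    """Candidate clipping values: (largest absolute feature value) x coefficient.

    Assumption: coefficients lie strictly inside (low, high), spaced by step.
    """
    max_abs = max(abs(v) for v in feature_values)
    count = round((high - low) / step)                 # 60 steps for the defaults
    coefficients = [low + k * step for k in range(1, count)]
    return [c * max_abs for c in coefficients]


# Hypothetical feature values output by one network layer.
layer_output = [-8.0, 2.0, 5.0, -0.3]
candidates = candidate_initial_values(layer_output)
print(len(candidates), candidates[0], candidates[-1])
```

Because every coefficient is below 1, each candidate stays inside the range of the original feature values, and because the largest absolute value is used, every candidate is positive.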
- the training set may include a large number of images as the input of the network model.
- For each input image, inputting the image into the network model yields an output feature map, that is, an output feature matrix is obtained for each image. Therefore, optionally, in a possible implementation, for each piece of business data, the absolute value of the difference between the model quantized output associated with the candidate initial value and the model original output can be used as the network output difference of that piece of business data, and the average of the network output differences of multiple pieces of business data is then taken as the network output difference metric of the candidate initial value.
- the business data is the data in the business scenario where the network model is applied.
- the business data may be a face image; if the network model is used in a vehicle anomaly detection scenario, the business data may be a vehicle image.
- the business data listed in the embodiments of the present disclosure is only an example, and does not constitute a limitation on the business data.
- the network output difference measure of the candidate initial value can be determined according to the following expression:
- Network output difference metric = Mean(abs(model quantized output - model original output)), where abs() computes the absolute value, that is, for each piece of business data, the absolute value of the difference between the model quantized output associated with the candidate initial value and the model original output; Mean() computes the average, that is, the average of the network output differences over the pieces of business data (300 pieces in this example).
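The expression above can be written out as a short function. This is a sketch under one assumption not stated in the text: when a model output is a vector, the absolute differences are averaged over its elements before averaging over the pieces of business data. The name `network_output_difference` is illustrative.

```python
def network_output_difference(quantized_outputs, original_outputs):
    """Mean(abs(model quantized output - model original output)).

    Each argument holds one output vector per piece of business data.
    """
    per_sample = []
    for q_out, o_out in zip(quantized_outputs, original_outputs):
        diffs = [abs(q - o) for q, o in zip(q_out, o_out)]
        per_sample.append(sum(diffs) / len(diffs))   # difference for one piece of data
    return sum(per_sample) / len(per_sample)         # average over all pieces


# Two hypothetical pieces of business data with two-element model outputs.
quantized = [[1.0, 2.0], [0.0, 0.0]]
original = [[1.5, 2.0], [0.0, 1.0]]
print(network_output_difference(quantized, original))
```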
- the business data can be used as the input of the network model, and each business data input will have a corresponding output feature matrix. Then, for each piece of business data, the feature map output by the network layer after each piece of business data is input into the network model can be clipped and quantized.
- for each piece of business data, the piece of business data can be used as the input of the network model to obtain the feature values of the feature points output by the network layer; the candidate initial value can then be used to clip the feature values of the feature points output by the network layer, and the clipped feature values are quantized to obtain the model quantized output of the piece of business data.
- the clipping parameters may be trained.
- in the training process, in order to improve the training efficiency of the clipping parameters and determine the optimal clipping parameters more efficiently, at least two candidate initial values can be determined first, the network output difference metric associated with each candidate initial value can then be determined, and the clipping parameters are determined according to the network output difference metric. Since the network output difference metric represents the degree of influence of each candidate initial value on the model output after it participates in the quantization process, better clipping parameters can be determined based on the network output difference metric.
- Clipping the feature values based on the better clipping parameters can reduce the quantization loss caused by outliers while keeping the original data from being distorted, so that the quantization process can be better applied to reduce the storage space occupied by the network model and improve the processing speed of the network model.
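The selection of a target initial value for one network layer can be sketched end to end as below. The two-stage toy model (`hidden_layer`, `output_layer`), the 4-bit grid, the sample data, and the coefficient list are all hypothetical stand-ins chosen for illustration; only the overall procedure (clip and quantize one layer, compare the model output with the original output, keep the candidate with the smallest average difference) follows the description.

```python
def clip_and_quantize(values, alpha, num_bits=4):
    """Clip to [-alpha, alpha], then snap to an evenly spaced signed grid."""
    levels = 2 ** (num_bits - 1) - 1
    scale = alpha / levels
    return [round(max(-alpha, min(alpha, v)) / scale) * scale for v in values]


def hidden_layer(x):
    # Hypothetical quantizable layer: a fixed elementwise scaling.
    return [2.0 * v for v in x]


def output_layer(h):
    # Hypothetical remainder of the model, left without clipping or quantization.
    return sum(h)


def select_target_initial_value(samples, candidates):
    """Return the candidate with the smallest network output difference metric."""
    best_alpha, best_metric = None, float("inf")
    for alpha in candidates:
        diffs = []
        for x in samples:
            h = hidden_layer(x)
            original = output_layer(h)                              # model original output
            quantized = output_layer(clip_and_quantize(h, alpha))   # model quantized output
            diffs.append(abs(quantized - original))
        metric = sum(diffs) / len(diffs)
        if metric < best_metric:
            best_alpha, best_metric = alpha, metric
    return best_alpha, best_metric


samples = [[0.1, -0.2, 0.3], [0.05, 0.4, -0.1], [3.0, 0.1, -0.1]]
max_abs = max(abs(v) for x in samples for v in hidden_layer(x))
candidates = [c * max_abs for c in (0.5, 0.6, 0.7, 0.8, 0.9)]
best_alpha, best_metric = select_target_initial_value(samples, candidates)
print(best_alpha, best_metric)
```

Only the layer under consideration is clipped and quantized here, which mirrors the point that other layers are kept in their original form so that they do not interfere with the comparison.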
- FIG. 5 is a schematic diagram of a network model quantization apparatus provided by an embodiment of the present disclosure. This embodiment is applicable to the case of processing a network model.
- the apparatus is configured in an electronic device and can implement the network model quantization method of any embodiment of the present disclosure.
- the network model quantization apparatus 500 includes the following:
- the clipping module 501 is configured to clip, using the clipping parameters, the feature values of the feature points output by the network layer in the network model; the quantization module 502 is configured to quantize the feature values clipped by the clipping module 501; the update module 503 is configured to determine the loss function of the network model according to the quantized feature values, and to update the network parameters and clipping parameters in the network model according to the loss function of the network model.
- the network model quantization apparatus 500 further includes:
- the determination module 504 is configured to determine, for each quantizable network layer, at least two candidate initial values of the clipping parameter of the network layer; the clipping module 501 is also configured to clip, using each candidate initial value, the feature values of the feature points output by the network layer, and the quantization module 502 is also configured to quantize the feature values clipped by the clipping module 501 to obtain the model quantized output associated with the candidate initial value; the determination module 504 is also configured to determine the network output difference metric of each candidate initial value according to the difference between the model quantized output associated with that candidate initial value and the model original output; the selection module 505 is configured to select, from the at least two candidate initial values, the target initial value of the clipping parameter of the network layer according to the network output difference metric of each candidate initial value.
- the determination module 504 in the network model quantization apparatus 500 includes:
- the output difference unit 710 is configured to take, for each piece of business data, the absolute value of the difference between the model quantized output associated with the candidate initial value and the model original output as the network output difference of the piece of business data; the average value unit 720 is configured to take the average of the network output differences of multiple pieces of business data as the network output difference metric of the candidate initial value.
- the output difference unit 710 includes:
- the feature value subunit 810 is configured to, for each piece of business data, use the piece of business data as the input of the network model to obtain the feature values of the feature points output by the network layer;
- the clipping subunit 820 is configured to clip, using a candidate initial value, the feature values of the feature points output by the network layer, and the quantization subunit 830 is configured to quantize the feature values clipped by the clipping subunit 820 to obtain the model quantized output of the piece of business data;
- the business output subunit 840 is configured to use the piece of business data as the input of the network model to obtain the model original output of the piece of business data;
- the output difference subunit 850 is configured to take the absolute value of the difference between the model quantized output and the model original output of the piece of business data as the network output difference of the piece of business data.
- the determination module 504 includes:
- the feature value unit 730 is configured to use business data as the input of the network model to obtain the feature values of the multiple feature points output by the network layer; the selection unit 740 is configured to select the largest feature value from the feature values of the multiple feature points output by the network layer; the determination unit 750 is configured to determine a candidate initial value according to the largest feature value and the candidate coefficients.
- the network model quantization method provided by the embodiment of the present disclosure can reduce the quantization loss caused by outliers in the quantization process.
- the clipping parameters are iteratively trained according to the loss function, so that more reasonable clipping parameters can be determined and can better participate in the quantization process; the quantization process can then be better applied to reduce the storage space occupied by the network model and improve the processing speed of the network model.
- the acquisition, storage and application of the user's personal information involved are in compliance with relevant laws and regulations, and do not violate public order and good customs.
- the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
- FIG. 10 shows a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure.
- Electronic device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers.
- Electronic device 600 may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices.
- the components shown herein, their connections and relationships, and their functions, are by way of example only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
- the electronic device 600 includes a computing unit 601, which can execute various appropriate actions and processes according to a computer program loaded into a random access memory (RAM) 603.
- in the RAM 603, various programs and data necessary for the electronic device 600 to perform operations can also be stored.
- the computing unit 601, ROM 602, and RAM 603 are connected to each other through a bus 604.
- An input/output (I/O) interface 605 is also connected to the bus 604.
- components connected to the I/O interface 605 include: an input unit 606, such as a keyboard, a mouse, etc.; an output unit 607, such as various types of displays, speakers, etc.; a storage unit 608, such as a magnetic disk, an optical disk, etc.; and a communication unit 609, such as a network card, a modem, a wireless communication transceiver, and the like.
- the communication unit 609 allows the device 600 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks.
- Computing unit 601 may be various general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 601 include but are not limited to a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
- the computing unit 601 executes the various methods and processes described above, such as the network model quantization method.
- the network model quantization method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608.
- part or all of the computer program can be loaded and/or installed on the electronic device 600 via the ROM 602 and/or the communication unit 609.
- when the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the network model quantization method described above can be performed.
- the computing unit 601 may be configured in any other appropriate way (for example, by means of firmware) to execute the network model quantization method.
- Various embodiments may include being implemented in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
- Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing device, so that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
- The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
- A machine-readable medium may be a tangible medium that can contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
- A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
- A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
- Examples of machine-readable storage media include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
- The systems and techniques described herein can be implemented on a computer having a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor), and a keyboard and pointing device (e.g., a mouse or trackball) through which a user can provide input to the computer.
- Other types of devices may also be used to provide interaction with the user. For example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, speech input, or tactile input).
- The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., a data server), a computing system that includes middleware components (e.g., an application server), a computing system that includes front-end components (e.g., a user computer having a graphical user interface or web browser through which a user can interact with embodiments of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
- The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), blockchain networks, and the Internet.
- A computer system may include clients and servers.
- Clients and servers are generally remote from each other and typically interact through a communication network.
- The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship to each other.
- The server can be a cloud server, also known as a cloud computing server or cloud host, which is a host product in the cloud computing service system that addresses the defects of difficult management and weak business scalability found in traditional physical host and virtual private server (VPS) services.
- The server can also be a server of a distributed system, or a server combined with a blockchain.
- Steps can be reordered, added, or removed using the various forms of flow shown above.
- The steps described in the present disclosure may be executed in parallel, sequentially, or in a different order; no limitation is imposed herein, as long as the desired result of the technical solution disclosed in the present disclosure can be achieved.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The disclosure relates to a network model quantization method and apparatus, a device, and a storage medium. The network model quantization method comprises: using clipping parameters to clip the feature values of feature points output by a network layer in a network model; quantizing the clipped feature values; and determining a loss function of the network model according to the quantized feature values, and updating the network parameters and the clipping parameters in the network model according to the loss function of the network model.
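The clip, quantize, and update sequence summarized in the abstract can be illustrated with a minimal sketch. It assumes a symmetric clipping range [-alpha, alpha], uniform quantization, a mean-squared-error loss on a single scalar weight `w`, and a straight-through gradient estimate for the clipping parameter. None of these choices, nor the names `clip`, `fake_quant`, `alpha_grad`, and `train_step`, are taken from the disclosure; they are hypothetical and chosen only to keep the example small.

```python
def clip(x, alpha):
    """Clip a feature value to the assumed symmetric range [-alpha, alpha]."""
    return max(-alpha, min(alpha, x))


def fake_quant(x, alpha, bits=8):
    """Clip x, then snap it to a uniform grid with 2**(bits-1)-1 steps per side."""
    levels = 2 ** (bits - 1) - 1
    scale = alpha / levels
    return round(clip(x, alpha) / scale) * scale


def alpha_grad(x, alpha):
    """Straight-through estimate of d fake_quant / d alpha (an assumption):
    +1 or -1 where x was clipped, 0 where x lies inside the range."""
    if x > alpha:
        return 1.0
    if x < -alpha:
        return -1.0
    return 0.0


def train_step(xs, ys, w, alpha, lr=0.1, bits=8):
    """One gradient step on both the network parameter w and the clipping
    parameter alpha, using loss = mean((w * q(x) - y) ** 2).
    Returns the updated w, the updated alpha, and the loss before the update."""
    n = len(xs)
    loss = grad_w = grad_a = 0.0
    for x, y in zip(xs, ys):
        q = fake_quant(x, alpha, bits)      # quantized, clipped feature value
        err = w * q - y
        loss += err * err / n
        grad_w += 2.0 * err * q / n
        grad_a += 2.0 * err * w * alpha_grad(x, alpha) / n
    return w - lr * grad_w, alpha - lr * grad_a, loss


if __name__ == "__main__":
    xs = [0.2, -0.4, 3.0, -2.5]   # made-up feature values output by a layer
    ys = list(xs)                 # toy targets: reproduce the input
    w, alpha = 1.0, 1.0
    for step in range(3):
        w, alpha, loss = train_step(xs, ys, w, alpha)
        print(f"step {step}: loss={loss:.4f} w={w:.4f} alpha={alpha:.4f}")
```

Because the loss depends on the clipped and quantized values, its gradient reaches the clipping parameter as well as the weight, so both are adjusted in the same step. With this toy data the two large inputs are clipped, and the first update therefore widens `alpha`.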
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110937246.1A CN113642710B (zh) | 2021-08-16 | 2021-08-16 | Quantization method, apparatus, device, and storage medium for a network model |
CN202110937246.1 | 2021-08-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023020456A1 (fr) | 2023-02-23 |
Family
ID=78421994
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/112673 WO2023020456A1 (fr) | 2021-08-16 | 2022-08-16 | Network model quantization method and apparatus, device, and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113642710B (fr) |
WO (1) | WO2023020456A1 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
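CN113642710B (zh) * | 2021-08-16 | 2023-10-31 | 北京百度网讯科技有限公司 | Quantization method, apparatus, device, and storage medium for a network model |
CN115083423B (zh) * | 2022-07-21 | 2022-11-15 | 中国科学院自动化研究所 | Data processing method and apparatus for speech discrimination |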
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
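CN110659725A (zh) * | 2019-09-20 | 2020-01-07 | 字节跳动有限公司 | Compression and acceleration method for neural network models, data processing method and apparatus |
CN110852439A (zh) * | 2019-11-20 | 2020-02-28 | 字节跳动有限公司 | Compression and acceleration method for neural network models, data processing method and apparatus |
US20210064982A1 (en) * | 2019-08-28 | 2021-03-04 | International Business Machines Corporation | Cross-domain homophily quanitifcation for transfer learning |
CN113177634A (zh) * | 2021-04-28 | 2021-07-27 | 中国科学院自动化研究所 | Image analysis system, method, and device based on neural network input and output quantization |
CN113642710A (zh) * | 2021-08-16 | 2021-11-12 | 北京百度网讯科技有限公司 | Quantization method, apparatus, device, and storage medium for a network model |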
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
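CN106445939B (zh) * | 2015-08-06 | 2019-12-13 | 阿里巴巴集团控股有限公司 | Image retrieval, image information acquisition, and image recognition methods, apparatuses, and systems |
CN108229681A (zh) * | 2017-12-28 | 2018-06-29 | 郑州云海信息技术有限公司 | Neural network model compression method, system, apparatus, and readable storage medium |
CN109791628B (zh) * | 2017-12-29 | 2022-12-27 | 清华大学 | Neural network model block compression method, training method, computing device, and system |
CN112560881B (zh) * | 2019-09-25 | 2024-04-19 | 北京四维图新科技股份有限公司 | Object recognition method and apparatus, and data processing method |
CN111275187A (zh) * | 2020-01-16 | 2020-06-12 | 北京智芯微电子科技有限公司 | Compression method and apparatus for deep neural network models |
CN112381083A (zh) * | 2020-06-12 | 2021-02-19 | 杭州喔影网络科技有限公司 | Saliency-aware image cropping method based on potential region pairs |
CN112861996A (zh) * | 2021-03-15 | 2021-05-28 | 北京智芯微电子科技有限公司 | Deep neural network model compression method and apparatus, electronic device, and storage medium |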
- 2021-08-16 CN CN202110937246.1A patent/CN113642710B/zh active Active
- 2022-08-16 WO PCT/CN2022/112673 patent/WO2023020456A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210064982A1 (en) * | 2019-08-28 | 2021-03-04 | International Business Machines Corporation | Cross-domain homophily quanitifcation for transfer learning |
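CN110659725A (zh) * | 2019-09-20 | 2020-01-07 | 字节跳动有限公司 | Compression and acceleration method for neural network models, data processing method and apparatus |
CN110852439A (zh) * | 2019-11-20 | 2020-02-28 | 字节跳动有限公司 | Compression and acceleration method for neural network models, data processing method and apparatus |
CN113177634A (zh) * | 2021-04-28 | 2021-07-27 | 中国科学院自动化研究所 | Image analysis system, method, and device based on neural network input and output quantization |
CN113642710A (zh) * | 2021-08-16 | 2021-11-12 | 北京百度网讯科技有限公司 | Quantization method, apparatus, device, and storage medium for a network model |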
Also Published As
Publication number | Publication date |
---|---|
CN113642710B (zh) | 2023-10-31 |
CN113642710A (zh) | 2021-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2023020456A1 (fr) | Network model quantization method and apparatus, device, and storage medium | |
WO2019184823A1 (fr) | Image processing method and device based on a convolutional neural network model | |
CN113516248B (zh) | Quantum gate testing method and apparatus, and electronic device | |
US20180018558A1 (en) | Method for neural network and apparatus performing same method | |
WO2018099473A1 (fr) | Scene analysis method and system, and electronic device | |
CN112560996B (zh) | User profile recognition model training method, device, readable storage medium, and product | |
WO2023020289A1 (fr) | Processing method and apparatus for a network model, device, and storage medium | |
CN111738419B (zh) | Quantization method and apparatus for neural network models | |
CN108734287A (zh) | Compression method and apparatus for deep neural network models, terminal, and storage medium | |
WO2023207039A1 (fr) | Data processing method and apparatus, device, and storage medium | |
CN114494814A (zh) | Attention-based model training method and apparatus, and electronic device | |
KR20220116395A (ko) | Method and apparatus for determining a pre-trained model, electronic device, and storage medium | |
CN114730367A (zh) | Model training method and apparatus, storage medium, and program product | |
US20240070454A1 (en) | Lightweight model training method, image processing method, electronic device, and storage medium | |
CN114580649A (zh) | Method and apparatus for eliminating quantum Pauli noise, electronic device, and medium | |
CN117371508A (zh) | Model compression method and apparatus, electronic device, and storage medium | |
WO2022188711A1 (fr) | SVM model training method and apparatus, device, and computer-readable storage medium | |
CN112101543A (zh) | Neural network model determination method and apparatus, electronic device, and readable storage medium | |
US11580196B2 (en) | Storage system and storage control method | |
CN113344213A (zh) | Knowledge distillation method and apparatus, electronic device, and computer-readable storage medium | |
CN117351299A (zh) | Image generation and model training methods, apparatuses, device, and storage medium | |
CN115759209B (zh) | Quantization method and apparatus for neural network models, electronic device, and medium | |
CN111950689A (zh) | Neural network training method and apparatus | |
CN115577786A (zh) | Quantum entropy determination method, apparatus, device, and storage medium | |
CN112200275B (zh) | Quantization method and apparatus for artificial neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22857768 Country of ref document: EP Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 22857768 Country of ref document: EP Kind code of ref document: A1 |