WO2024041332A1 - Data type selection method, device and readable storage medium - Google Patents


Info

Publication number
WO2024041332A1
Authority
WO
WIPO (PCT)
Prior art keywords
bit width
exponential
target data
distribution
data
Prior art date
Application number
PCT/CN2023/110621
Other languages
English (en)
French (fr)
Inventor
周诗怡
李震
刘少礼
Original Assignee
寒武纪(西安)集成电路有限公司
Priority date
Filing date
Publication date
Application filed by 寒武纪(西安)集成电路有限公司
Publication of WO2024041332A1

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F 7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F 7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation, using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F 7/483 Computations with numbers represented by a non-linear combination of denominational numbers, e.g. rational numbers, logarithmic number system or floating-point numbers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0495 Quantised networks; Sparse networks; Compressed networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the present disclosure relates generally to the field of neural networks. More specifically, the present disclosure relates to data type selection methods, devices and readable storage media.
  • the present disclosure proposes data type selection methods, devices and readable storage media in multiple aspects.
  • the present disclosure provides a data type selection method, including: counting exponential distribution information of target data in a target network layer, where the exponential distribution information includes an exponential distribution range and an exponential value distribution amount; setting the exponent bit width of the target data according to the exponential distribution range; setting the mantissa bit width of the target data according to the exponential value distribution amount; and determining the data type of the target data based on the exponent bit width and the mantissa bit width.
  • the present disclosure provides a computer-readable storage medium on which computer program code of a data type selection method is stored.
  • when the computer program code is run by a processing device, the data type selection method of the first aspect is performed.
  • the present disclosure provides a computer program product, including a computer program of a data type selection method, which implements the steps of the data type selection method of the first aspect when executed by a processor.
  • the present disclosure provides a computing device, including a memory, a processor, and a computer program stored on the memory.
  • the processor executes the computer program to implement the steps of the data type selection method of the first aspect.
  • the present disclosure provides a data type selection device, including: a statistics module, a setting module and a decision module.
  • the statistics module is used to count the exponential distribution information of the target data in the target network layer.
  • the exponential distribution information includes the exponential distribution range and the exponential value distribution amount;
  • the setting module is used to set the exponent bit width of the target data according to the exponential distribution range, and to set the mantissa bit width of the target data according to the exponential value distribution amount;
  • the decision module is used to determine the data type of the target data according to the exponent bit width and the mantissa bit width.
  • This disclosure provides a data type selection scheme.
  • by counting the exponential distribution range and exponential value distribution amount of the target data in the target network layer, the exponent bit width and mantissa bit width of the target data are dynamically set, and the data type of the target data is then determined.
  • in other words, if different target data have different exponential distribution ranges and/or exponential value distribution amounts, the data types of the target data may differ even within the same network layer. Because this disclosure considers the data distribution of the target data while the neural network model processes external data, the total bit width of the data can effectively represent the target data, balancing operational efficiency and operational accuracy.
  • Figure 1 is a schematic diagram showing the format of data types;
  • Figure 2 is a flow chart illustrating a data type selection method according to an embodiment of the present disclosure
  • Figure 3 is an exemplary data distribution diagram of a kind of target data according to an embodiment of the present disclosure;
  • Figure 4 is a flowchart illustrating a data type selection method according to another embodiment of the present disclosure.
  • Figure 5 is a schematic diagram showing a data type selection device according to another embodiment of the present disclosure.
  • the term “if” may be interpreted as “when” or “once” or “in response to determining” or “in response to detecting” depending on the context.
  • floating point numbers are numerical representations belonging to a specific subset of the rational numbers, used in computers to approximate real numbers.
  • they are so named because the decimal (radix) point can "float".
  • S is the sign bit: when S is 0 the floating point number is positive, and when S is 1 it is negative.
  • the sign bit occupies one bit by default; M is the mantissa, whose numerical range determines the mantissa bit width; E is the exponent, which weights the floating point number by a factor of 2 to the power of E.
  • the numerical range of the exponent determines the exponent bit width.
  • Figure 1 shows an exemplary floating point number data type. As shown in Figure 1, the representation of a floating point number in the computer is divided into three fields, the S field, the E field and the M field, which correspond respectively to the sign bit width, exponent bit width and mantissa bit width of the floating point number. The total bit width of a floating point number is the sum of the sign bit width, exponent bit width and mantissa bit width.
  • the data type FP32 has 32 bits (bit 0 to bit 31 in the figure), of which the S field occupies 1 bit (bit 31, a bit width of 1), the E field occupies 8 bits (bits 30 to 23, a bit width of 8), and the M field occupies 23 bits (bits 22 to 0, a bit width of 23).
  • the data type FP16 has 16 bits (bit 0 to bit 15 in the figure), of which the S field occupies 1 bit (bit 15, a bit width of 1), the E field occupies 8 bits (bits 14 to 7, a bit width of 8), and the M field occupies 7 bits (bits 6 to 0, a bit width of 7).
  • the data type FP8 has 8 bits (bit 0 to bit 7 in the figure), of which the S field occupies 1 bit (bit 7, a bit width of 1), the E field occupies 6 bits (bits 6 to 1, a bit width of 6), and the M field occupies 1 bit (bit 0, a bit width of 1).
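As an illustrative aside (not part of the patent text), the S/E/M decomposition described above can be reproduced for a standard binary32 value with a short sketch; the function name `fp32_fields` is invented for this example:

```python
import struct

def fp32_fields(x: float):
    """Return the (sign, biased exponent, mantissa) bit fields of a binary32 value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                 # 1 bit  (bit 31)
    exponent = (bits >> 23) & 0xFF    # 8 bits (bits 30..23), biased by 127
    mantissa = bits & 0x7FFFFF        # 23 bits (bits 22..0)
    return sign, exponent, mantissa

# -1.5 = (-1)^1 * 1.5 * 2^0: sign 1, biased exponent 127, mantissa top bit set
sign, exp, man = fp32_fields(-1.5)
```

The same decomposition applies to the FP16 and FP8 layouts of Figure 1, with the shift amounts and masks adjusted to those field widths.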
  • this disclosure performs data distribution statistics on the target data of the target network layer, determines the exponential distribution information of the target data from those statistics, and determines the data type of the target data based on this information.
  • the target network layer refers to any layer in the neural network model, more specifically the network layer that receives, for calculation, the data whose distribution is to be analyzed; the target data refers to the training data of the target network layer, generally neuron data (such as image data or voice data) or weights.
  • Figure 2 shows a flowchart of a data type selection method according to one embodiment of the present disclosure.
  • in step 201, the exponential distribution information of the target data in the target network layer is counted, where the exponential distribution information includes an exponential distribution range and an exponential value distribution amount.
  • the exponential distribution information refers to the result of statistics on the exponent values corresponding to a set of target data and the number of occurrences of each exponent value in that set of target data.
  • Figure 3 shows an exemplary exponential distribution graph. Specifically, Figure 3 shows the exponential distribution of all Topdiff data (the target data) in a certain fully connected layer (the target network layer) of the Transformer neural network model, where the abscissa represents the exponent values of this set of Topdiff data, ranging from the -10th power down to the -70th power, and the ordinate indicates the number of occurrences of each exponent value in this set of Topdiff data.
  • the Transformer neural network model is a network model published by Google in 2017. It uses the attention mechanism to improve the speed of model training, and it includes multiple encoder modules for encoding text and multiple decoder modules for decoding the encoded result. Topdiff is an operator that outputs the gradient in the backward pass of a convolutional neural network.
  • the Transformer neural network model and Topdiff operator are well known to those skilled in the art, so they will not be described in detail.
  • the exponential distribution range refers to the range over which the exponent values of all data in this set of target data are distributed. Taking Figure 3 as an example, the exponential distribution range spans from the -10th power to the -70th power.
  • the exponential value distribution amount refers to the distribution amount corresponding to each exponent value within the exponential distribution range, that is, the number of occurrences of each exponent value. It should be noted that the counts for exponent values from the -70th to the -56th power and from the -13th to the -10th power are not zero, but are orders of magnitude smaller than the other counts, too small for the diagram to display accurately.
  • Figure 3 is a schematic diagram for easy understanding. In the actual computer processing process, such a visual data distribution diagram may not exist.
  • the specific exponential distribution range and exponential value distribution amounts are obtained directly from the numerical results of the data statistics.
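The statistics of step 201 can be sketched as follows (an illustrative reading, not the patent's implementation; `math.frexp` is used here to recover each value's binary exponent, and the function name `exponent_distribution` is assumed):

```python
import math
from collections import Counter

def exponent_distribution(data):
    """Count how often each binary exponent occurs in a set of target data.
    Returns (min_exp, max_exp, counts): the exponential distribution range
    and the per-exponent distribution amounts."""
    counts = Counter()
    for x in data:
        if x == 0:
            continue                    # zero carries no meaningful exponent
        _, e = math.frexp(abs(x))       # x = m * 2**e with 0.5 <= m < 1
        counts[e - 1] += 1              # shift so that x ~ 1.m * 2**(e-1)
    exps = sorted(counts)
    return exps[0], exps[-1], counts

# small hypothetical data set: 0.75 = 1.5*2^-1, 3.0 = 1.5*2^1, 0.0078125 = 2^-7
lo, hi, counts = exponent_distribution([0.75, 3.0, 0.0078125])
```

For the Figure 3 example, the same routine would yield a range from -70 to -10 together with the per-exponent counts plotted on the ordinate.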
  • in step 202, the exponent bit width of the target data is set according to the exponential distribution range.
  • there are multiple ways to set the exponent bit width.
  • one is the exponential distribution range method, which takes the maximum exponent value (maxE) in the target data minus the minimum exponent value (minE) as the basis for calculating the exponent bit width, i.e. ED = maxE - minE.
  • in the example of Figure 3, the maximum exponent value is -10 and the minimum exponent value is -70, so the exponential distribution range ED is 60.
  • the exponential distribution range ED reflects the exponent range needed to completely represent this set of Topdiff data. Since 2 to the 5th power is 32, which is less than 60, and 2 to the 6th power is 64, which is greater than 60, fully expressing the exponent range of this set of Topdiff data requires 6 exponent bits.
  • the neural network model can only accept several preset and fixed exponential bit widths.
  • for example, the Transformer neural network model in Figure 3 can only accept an exponent bit width of 3, 4 or 5, where a bit width of 3 corresponds to an ED in the interval [0, 15], a bit width of 4 corresponds to an ED in the interval [16, 31], and a bit width of 5 corresponds to an ED in the interval [32, 63].
  • another way of setting the exponent bit width in this embodiment is the selection method, that is, one of several acceptable options is chosen based on the exponential distribution range. Specifically, the exponential distribution range ED of the example in Figure 3 is 60, which falls within the interval [32, 63]; therefore, in this step, an exponent bit width of 5 is chosen from the preset, fixed exponent bit widths 3, 4 and 5.
  • a third way of setting the exponent bit width is the empirical value method, that is, combining a formula with a hyperparameter obtained from empirical values: the product of the exponential distribution range and the hyperparameter is calculated and rounded up, and the rounded result is the exponent bit width.
  • An exemplary formula is E = ceil(α × ED), where E is the exponent bit width, ceil denotes rounding up, ED denotes the exponential distribution range of the data, and α is the first hyperparameter, whose empirical value in this embodiment is 0.1.
  • in this way, an exponent bit width adapted to the exponential distribution range of the target data can be obtained.
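The three exponent-bit-width rules of step 202 can be sketched as follows; the interval table and the hyperparameter value 0.1 come from the Figure 3 example above, while the function names are illustrative assumptions:

```python
import math

def width_by_range(max_e, min_e):
    """Exponential distribution range method: smallest E with 2**E covering ED."""
    ed = max_e - min_e
    e = 1
    while 2 ** e < ed:
        e += 1
    return e

def width_by_selection(max_e, min_e):
    """Selection method: pick one of the preset widths 3/4/5 by ED interval."""
    intervals = {3: (0, 15), 4: (16, 31), 5: (32, 63)}  # from the Figure 3 example
    ed = max_e - min_e
    for width, (lo, hi) in intervals.items():
        if lo <= ed <= hi:
            return width
    raise ValueError("ED outside all preset intervals")

def width_by_empirical(max_e, min_e, alpha=0.1):
    """Empirical value method: E = ceil(alpha * ED), alpha = first hyperparameter."""
    return math.ceil(alpha * (max_e - min_e))
```

With the Figure 3 values (maxE = -10, minE = -70, ED = 60), the range method yields 6, the selection method yields 5, and the empirical method yields ceil(0.1 × 60) = 6, matching the text.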
  • in step 203, the mantissa bit width of the target data is set according to the exponential value distribution amount.
  • this embodiment identifies the exponent value with the largest distribution amount, calculates the distribution ratio of that maximum distribution amount to the total distribution amount, multiplies this ratio by a second hyperparameter obtained from experience, and finally rounds the product up; the rounded result is the mantissa bit width.
  • An exemplary formula is M = ceil(E1 × β), where M is the mantissa bit width, ceil denotes rounding up, E1 is the ratio of the maximum distribution amount to the total distribution amount, and β is the second hyperparameter.
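A sketch of the step-203 mantissa rule follows. The patent does not disclose the value of the second hyperparameter, so β = 5.0 below, like the example counts, is purely an illustrative assumption:

```python
import math

def mantissa_width(counts, beta=5.0):
    """M = ceil(E1 * beta): E1 is the share of the most frequent exponent value.
    beta (the second hyperparameter) is an assumed illustrative value."""
    total = sum(counts.values())
    e1 = max(counts.values()) / total   # ratio of max distribution to total
    return math.ceil(e1 * beta)

# hypothetical counts: the most frequent exponent accounts for 40% of the data
example_counts = {-40: 4000, -41: 3000, -42: 3000}
```

With these assumed numbers, E1 = 0.4 and M = ceil(0.4 × 5.0) = 2, consistent with the 2-bit mantissa of the Figure 3 example.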
  • step 202 may be executed before step 203, or step 203 before step 202; this embodiment places no limitation on the order.
  • in step 204, the data type of the target data is determined based on the exponent bit width and the mantissa bit width.
  • the total bit width of the target data can be obtained by directly adding the sign bit width, exponent bit width and mantissa bit width.
  • in the example of Figure 3, the exponential distribution range method yields a 6-bit exponent width, the selection method yields a 5-bit exponent width, and the empirical value method yields a 6-bit exponent width; the mantissa bit width obtained is 2 bits, and the sign bit width is fixed at 1 bit.
  • accordingly, this embodiment uses a data type of FP8 (1 + 5 + 2 bits) or FP9 (1 + 6 + 2 bits) to represent the Topdiff data in Figure 3 when training the fully connected layer.
  • this embodiment dynamically sets the exponent bit width and mantissa bit width of the target data by counting the exponential distribution range and exponential value distribution amount of the target data in the target network layer, and then determines the data type of the target data. That is to say, if multiple groups of target data have different exponential distribution ranges and/or exponential value distribution amounts, their data types may differ even within the same network layer. Because this embodiment considers the data distribution of the target data during training, the total bit width of the data can effectively represent each set of target data, balancing operational efficiency and operational accuracy.
  • Another embodiment of the present disclosure is a data type selection method based on a specific exponent value interval, the flowchart of which is shown in Figure 4. This embodiment is also described in conjunction with the Topdiff data of Figure 3.
  • in step 401, the exponential distribution information of the target data in the target network layer is counted, where the exponential distribution information includes an exponential distribution range and an exponential value distribution amount.
  • in step 402, when setting the exponent bit width of the target data according to the exponential distribution range, the exponential distribution interval within the exponential distribution range that falls within the distribution proportion threshold is identified, and the bit width value corresponding to that interval is used as the exponent bit width.
  • the distribution proportion threshold needs to be set first and can be determined according to actual needs, such as 70%, 80% or 90%; that is, a contiguous region covering 70%, 80% or 90% of the exponential distribution is used as the basis for obtaining the exponent bit width, and the part of the exponential distribution range beyond the threshold is excluded from the calculation.
  • for example, if after such trimming the ED is 54 instead of 60, any of the exponent bit width setting methods of step 202 is then applied to set the exponent bit width.
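One plausible reading of step 402 is a greedy trim of the sparsely populated tails until coverage would fall below the threshold; the patent does not fix the exact trimming rule, so the routine below is an assumption for illustration:

```python
def trimmed_range(counts, threshold=0.9):
    """Return (min_exp, max_exp) of the contiguous exponent interval kept after
    greedily dropping tail exponents while at least `threshold` of all
    occurrences remains covered. One possible reading of step 402."""
    exps = sorted(counts)
    total = sum(counts.values())
    kept = total
    lo, hi = 0, len(exps) - 1
    while lo < hi:
        # candidate tail to drop: the end exponent with the smaller count
        side = lo if counts[exps[lo]] <= counts[exps[hi]] else hi
        if (kept - counts[exps[side]]) / total < threshold:
            break  # dropping more would fall below the coverage threshold
        kept -= counts[exps[side]]
        if side == lo:
            lo += 1
        else:
            hi -= 1
    return exps[lo], exps[hi]

# hypothetical distribution with near-empty tails at -70 and -10
example = {-10: 1, -20: 50, -30: 40, -70: 1}
```

On this assumed distribution, the light tails are discarded and the kept interval is [-30, -20], shrinking ED from 60 to 10 before a step-202 method is applied.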
  • in step 403, the mantissa bit width of the target data is set according to the exponential value distribution amount. This step is the same as step 203 and is not described further here.
  • in step 404, the data type of the target data is determined based on the exponent bit width and the mantissa bit width.
  • the sign bit width, exponent bit width, and mantissa bit width are first added up to obtain the total bit width of the target data.
  • the operation details are the same as step 204.
  • the bit width value interval in which the total bit width is located is identified to select a corresponding data type among the plurality of data types.
  • this embodiment sets multiple acceptable data types for the target network layer, corresponding to different bit width value intervals.
  • for example, the acceptable data types are FP16 and FP8. If the total bit width calculated in this step is less than or equal to 8, both FP16 and FP8 meet the accuracy requirements of the target data, but this embodiment selects the smaller total bit width, that is, the data type FP8. If the total bit width is between 8 and 16, FP16 is selected because FP8 cannot meet the requirement.
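The preset-type selection just described can be sketched as follows, using the FP8/FP16 widths of the example (the function name and the tuple-of-presets representation are assumptions):

```python
def select_preset_type(total_width, presets=((8, "FP8"), (16, "FP16"))):
    """Pick the narrowest acceptable preset whose width covers the total bit width."""
    for width, name in sorted(presets):
        if total_width <= width:
            return name
    raise ValueError("no preset type is wide enough")
```

For the Figure 3 example, a total width of 8 (1 + 5 + 2) selects FP8, while a total width of 9 (1 + 6 + 2) exceeds 8 bits and therefore selects FP16.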
  • this embodiment selects, from several preset data types, a data type that meets the required data accuracy, so that the neural network model can be debugged in advance and the hardware specially planned to accommodate the preset data types; one of the preset data types is then chosen based on the exponential distribution information, which helps improve data processing efficiency.
  • Another embodiment of the present disclosure is a data type selection method. Among the various data types determined according to the embodiments corresponding to Figures 2 and 4, this embodiment selects the one with the minimum total bit width as the data type of the final target data, further ensuring that the operating efficiency of the floating point numbers is the best among the multiple solutions.
  • Another embodiment of the present disclosure is a computer-readable storage medium on which computer program code for a data type selection method is stored.
  • when the computer program code is run by a processing device, the methods of the aforementioned embodiments are executed.
  • the above integrated units can be implemented in the form of software program modules. If implemented as software program modules and sold or used as a stand-alone product, the integrated unit may be stored in a computer-readable memory.
  • the software product can be stored in a memory that includes a number of instructions causing a computer device (such as a personal computer, server or network equipment) to perform some or all of the steps of the methods described in the embodiments of the present disclosure.
  • the aforementioned memory can include, but is not limited to, a USB flash drive, flash disk, read-only memory (ROM), random access memory (RAM), removable hard disk, magnetic disk, optical disk, or any other medium that can store program code.
  • Another embodiment of the present disclosure is a computer program product, including a computer program of a data type selection method.
  • when the computer program is executed by a processor, the steps of the methods shown in the foregoing embodiments are implemented.
  • Another embodiment of the present disclosure is a computer device, including a memory, a processor, and a computer program stored on the memory.
  • the processor executes the computer program to implement the steps of the methods shown in the foregoing embodiments.
  • each of the above embodiments sets the data type and performs training according to its specific method. After training is completed, the actual target data (such as real image data or voice data) is fed into the trained neural network model for inference to obtain inference results.
  • the above embodiments can be implemented using a computer program.
  • the purpose of executing the computer program is to solve the technical problem that the accuracy requirements of data for neural network training cannot be met in many application scenarios.
  • the computer program is run on the computer so that the control or processing of external target data reflects technical means that follow the laws of nature; the data type can thereby dynamically follow the composition of different target data and effectively represent the target data, achieving technical effects consistent with the laws of nature, such as balancing accuracy and efficiency.
  • the purpose of executing the computer program is to process a kind of target data.
  • the computer executes a technical data processing program to analyze the exponential distribution information of the target data and, in accordance with the laws of nature, completes a series of technical processing steps to determine the data type of the target data, thereby obtaining technical data processing effects consistent with natural laws such as accuracy and efficiency.
  • Figure 5 shows another embodiment of the present disclosure, which is a data type selection device.
  • the data type selection device includes a statistics module 501 , a setting module 502 and a decision module 503 .
  • the statistics module 501 is used to count exponential distribution information of target data in the target network layer, where the exponential distribution information includes an exponential distribution range and an exponential value distribution amount.
  • the exponential distribution information refers to the result of statistics on the exponent values corresponding to a set of target data and the number of occurrences of each exponent value in that set of target data.
  • the setting module 502 sets the exponential bit width of the target data according to the exponential distribution range.
  • there are many ways to set the exponent bit width, including the above-mentioned exponential distribution range method, selection method and empirical value method. Whichever method is used, the setting module 502 obtains an exponent bit width adapted to the exponential distribution range of the target data.
  • in some embodiments, when setting the exponent bit width of the target data according to the exponential distribution range, the setting module 502 identifies the exponential distribution interval within the exponential distribution range that falls within the distribution proportion threshold, and uses the bit width value corresponding to that interval as the exponent bit width.
  • the setting module 502 needs to first set the distribution proportion threshold, which can be determined according to actual needs, such as 70%, 80% or 90%; that is, a contiguous region covering 70%, 80% or 90% of the exponential distribution is used as the basis for obtaining the exponent bit width, and the part of the exponential distribution range beyond the threshold is excluded from the calculation.
  • the setting module 502 sets the mantissa bit width of the target data according to the exponential value distribution amount: it first identifies the exponent value with the largest distribution amount, calculates the ratio of that maximum distribution amount to the total distribution amount, multiplies the ratio by the second hyperparameter obtained from experience, and finally rounds the product up; the rounded result is the mantissa bit width.
  • An exemplary formula is M = ceil(E1 × β), where M is the mantissa bit width, ceil denotes rounding up, E1 is the ratio of the maximum distribution amount to the total distribution amount, and β is the second hyperparameter.
  • the decision module 503 is used to determine the data type of the target data according to the exponent bit width and the mantissa bit width. In one case, the decision module 503 directly sums the sign bit width, exponent bit width and mantissa bit width to obtain the total bit width of the target data.
  • in another case, the decision module 503 first sums the sign bit width, exponent bit width and mantissa bit width to obtain the total bit width of the target data, and then identifies the bit width value interval in which the total bit width falls to select the corresponding one among multiple data types. Specifically, the decision module 503 sets multiple acceptable data types for the target network layer, corresponding to different bit width value intervals.
  • this embodiment can further determine multiple acceptable data types based on the various methods mentioned above, and select from these acceptable data types the one with the minimum total bit width as the data type of the final target data.
  • the electronic equipment or devices of the present disclosure may include servers, cloud servers, server clusters, data processing devices, robots, computers, printers, scanners, tablets, smart terminals, PC equipment, Internet of Things terminals, mobile terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, projectors, watches, headphones, mobile storage, wearable devices, visual terminals, autonomous driving terminals, vehicles, household appliances, and/or medical equipment.
  • the means of transportation include airplanes, ships and/or vehicles;
  • the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance machines, B-ultrasound machines and/or electrocardiographs.
  • the electronic equipment or device of the present disclosure can also be applied to the Internet, Internet of Things, data centers, energy, transportation, public administration, manufacturing, education, power grid, telecommunications, finance, retail, construction sites, medical and other fields. Furthermore, the electronic equipment or device of the present disclosure can also be used in cloud, edge, terminal and other application scenarios related to artificial intelligence, big data and/or cloud computing. In one or more embodiments, electronic equipment or devices with high computing power according to the solution of the present disclosure can be applied to cloud equipment (such as cloud servers), while electronic equipment or devices with low power consumption can be applied to terminal equipment and/or Edge devices (such as smartphones or cameras).
  • the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that the hardware resources of the cloud device can be obtained based on the hardware information of the terminal device and/or the edge device.
  • although this disclosure presents some methods and their embodiments as a series of actions and combinations thereof, those skilled in the art will understand that the solutions of this disclosure are not limited by the order of the described actions. Therefore, based on the disclosure or teachings herein, those skilled in the art will understand that certain steps may be performed in other orders or simultaneously. Furthermore, the embodiments described in the present disclosure can be regarded as optional embodiments; that is, the actions or modules involved are not necessarily required to implement one or more solutions of the present disclosure. In addition, depending on the solution, the descriptions of some embodiments have different emphases; for parts not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
  • units illustrated as separate components may or may not be physically separate, and components illustrated as units may or may not be physical units.
  • the aforementioned components or units may be co-located or distributed over multiple network units.
  • some or all of the units may be selected to achieve the purpose of the solution of the embodiments of the present disclosure.
  • multiple units in the embodiments of the present disclosure may be integrated into one unit or each unit may exist physically separately.
  • the above-mentioned integrated unit can also be implemented in the form of hardware, that is, a specific hardware circuit, which can include digital circuits and/or analog circuits, etc.
  • the physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, and the physical devices may include, but are not limited to, devices such as transistors or memristors.
  • various devices such as computing devices or other processing devices mentioned herein can be implemented by appropriate hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs.
  • the aforementioned storage unit or storage device may be any appropriate storage medium (including magnetic storage media or magneto-optical storage media, etc.).
  • A data type selection method including: counting the exponent distribution information of the target data in the target network layer, the exponent distribution information including the exponent distribution range and the exponent value distribution amounts; setting the exponent bit width of the target data according to the exponent distribution range; setting the mantissa bit width of the target data according to the exponent value distribution amounts; and determining the data type of the target data according to the exponent bit width and the mantissa bit width.
  • Clause A2. The method according to Clause A1, wherein the step of setting the exponent bit width of the target data includes: calculating the product of the exponent distribution range and the first hyperparameter; and rounding the product; wherein the rounded result is the exponent bit width.
  • Clause A4. The method according to Clause A1, wherein the step of setting the exponent bit width of the target data includes: identifying an exponent distribution interval of the exponent distribution range that lies within a distribution proportion threshold; wherein the bit width value corresponding to the exponent distribution interval is the exponent bit width.
  • Clause A5. The method according to Clause A1, wherein the step of setting the mantissa bit width of the target data includes: identifying the exponent value with the maximum distribution count; calculating the ratio of the maximum distribution count to the total distribution count; calculating the product of the distribution count ratio and the second hyperparameter; and rounding the product; wherein the rounded result is the mantissa bit width.
  • Clause A6. The method according to Clause A2 or A5, wherein the rounding step is rounding up.
  • Clause A7. The method according to Clause A1, wherein the step of determining the data type of the target data includes: summing the sign bit width, the exponent bit width, and the mantissa bit width to obtain the total bit width of the target data.
  • Clause A8. The method according to Clause A7, wherein the target network layer can accept multiple data types, each corresponding to a different bit-width value interval, and the step of determining the data type of the target data includes: identifying the bit-width value interval in which the total bit width lies, so as to select the corresponding data type among the multiple data types.
  • Clause A9. A computer-readable storage medium on which computer program code for a data type selection method is stored; when the computer program code is run by a processing device, the method described in any one of Clauses A1 to A8 is executed.
  • Clause A10. A computer program product, including a computer program for a data type selection method, which, when executed by a processor, implements the steps of the method described in any one of Clauses A1 to A8.
  • Clause A11. A computer device including a memory, a processor, and a computer program stored on the memory; the processor executes the computer program to implement the steps of the method described in any one of Clauses A1 to A8.
  • Clause A12. A data type selection device including: a statistics module, used to count the exponent distribution information of the target data in the target network layer, where the exponent distribution information includes an exponent distribution range and exponent value distribution amounts; a setting module, used to set the exponent bit width of the target data according to the exponent distribution range and to set the mantissa bit width of the target data according to the exponent value distribution amounts; and a decision module, used to determine the data type corresponding to the target data according to the exponent bit width and the mantissa bit width.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Nonlinear Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Complex Calculations (AREA)

Abstract

A data type selection method, and a corresponding device and readable storage medium, wherein the method includes: collecting statistics on exponent distribution information of target data in a target network layer, the exponent distribution information including an exponent distribution range and exponent value distribution amounts; setting an exponent bit width of the target data according to the exponent distribution range; setting a mantissa bit width of the target data according to the exponent value distribution amounts; and determining a data type of the target data according to the exponent bit width and the mantissa bit width. On the basis of the above solution, both the computational efficiency and the computational precision of a neural network can be taken into account.

Description

Data type selection method, device, and readable storage medium
Cross-reference to related application
This application claims priority to Chinese patent application No. 202211032359.8, filed on August 26, 2022, and entitled "Data Type Selection Method, Device and Readable Storage Medium".
Technical field
The present disclosure relates generally to the field of neural networks. More specifically, the present disclosure relates to a data type selection method, a device, and a readable storage medium.
Background
Training a neural network requires a large amount of data computation, and the enormous computational workload places higher demands on computing speed. Since using low-bit-width floating-point arithmetic can accelerate training and reduce memory usage, quantization is performed first when training a neural network, that is, high-bit-width (high-precision) floating-point numbers are converted into low-bit-width (low-precision) floating-point numbers before computation.
When performing quantization, the prior art often uses a fixed data type for a class of operators, but this is not an optimal arrangement, because the distribution of the input data is not always the same: for example, data distributions differ at different positions in the network, data distributions differ among different categories of operator data, and the data distribution changes between the early and late stages of training. Since the data distribution differs across network positions, operators, and stages of the training process, always using a single low-bit-width floating-point type for computation sacrifices precision and performance in many scenarios.
Therefore, a data type selection solution that balances computational efficiency and computational precision is urgently needed.
Summary of the invention
To solve at least one or more of the technical problems mentioned above, the present disclosure proposes, in various aspects, a data type selection method, a device, and a readable storage medium.
In a first aspect, the present disclosure provides a data type selection method, including: collecting statistics on exponent distribution information of target data in a target network layer, the exponent distribution information including an exponent distribution range and exponent value distribution amounts; setting an exponent bit width of the target data according to the exponent distribution range; setting a mantissa bit width of the target data according to the exponent value distribution amounts; and determining a data type of the target data according to the exponent bit width and the mantissa bit width.
In a second aspect, the present disclosure provides a computer-readable storage medium on which computer program code for a data type selection method is stored; when the computer program code is run by a processing device, the data type selection method of the first aspect is performed.
In a third aspect, the present disclosure provides a computer program product including a computer program for a data type selection method; when executed by a processor, the computer program implements the steps of the data type selection method of the first aspect.
In a fourth aspect, the present disclosure provides a computing device including a memory, a processor, and a computer program stored on the memory; the processor executes the computer program to implement the steps of the data type selection method of the first aspect.
In a fifth aspect, the present disclosure provides a data type selection device including a statistics module, a setting module, and a decision module. The statistics module is configured to collect statistics on exponent distribution information of target data in a target network layer, the exponent distribution information including an exponent distribution range and exponent value distribution amounts; the setting module is configured to set an exponent bit width of the target data according to the exponent distribution range and to set a mantissa bit width of the target data according to the exponent value distribution amounts; and the decision module is configured to determine, according to the exponent bit width and the mantissa bit width, the data type corresponding to the target data.
The present disclosure provides a data type selection solution that, by collecting statistics on the exponent distribution range and exponent value distribution amounts of target data in a target network layer, dynamically sets the exponent bit width and mantissa bit width of the target data and thereby determines the data type of the target data. In other words, if different target data have different exponent distribution ranges and/or exponent value distribution amounts, their data types may differ even within the same network layer. Since the present disclosure takes the data distribution of the target data into account while the neural network model processes external data, the total bit width of the data can effectively represent the target data, balancing computational efficiency and computational precision.
Brief description of the drawings
The above and other objects, features, and advantages of exemplary embodiments of the present disclosure will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown by way of example and not limitation, and identical or corresponding reference numerals indicate identical or corresponding parts, in which:
Figure 1 is a schematic format of data types;
Figure 2 is a flowchart of a data type selection method according to an embodiment of the present disclosure;
Figure 3 is an exemplary data distribution diagram of target data according to an embodiment of the present disclosure;
Figure 4 is a flowchart of a data type selection method according to another embodiment of the present disclosure;
Figure 5 is a schematic diagram of a data type selection device according to another embodiment of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings of the embodiments of the present disclosure. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of the present disclosure.
It should be understood that the terms "comprising" and "including" used in the specification and claims of the present disclosure indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should be further understood that the term "and/or" used in the specification and claims of the present disclosure refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting".
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
In a neural network, using low-bit-width floating-point arithmetic can accelerate training and reduce memory usage. A floating-point number is a digital representation of a number belonging to a particular subset of the rational numbers, used in a computer to approximate a real number; its decimal point can "float". The IEEE floating-point standard represents a floating-point number in the form V = (-1)^S × M × 2^E, where S is the sign bit: when S is 0, the floating-point number is positive, and when S is 1, it is negative; in common scenarios the sign bit occupies one bit by default. M is the mantissa, whose value range reflects the size of the mantissa bit width. E is the exponent, which weights the floating-point number by a power of 2; the value of the exponent field reflects the size of the exponent bit width.
Figure 1 shows exemplary floating-point data types. As shown in Figure 1, the representation of a floating-point number in a computer is divided into three fields: the S field, the E field, and the M field, corresponding respectively to the sign bit width, exponent bit width, and mantissa bit width of the floating-point number. The total bit width of a floating-point number is the sum of the sign bit width, the exponent bit width, and the mantissa bit width. The data type FP32 has 32 bits (bit 0 to bit 31 in the figure), in which the S field occupies 1 bit (bit 31, i.e. a bit width of 1), the E field occupies 8 bits (bit 30 to bit 23, i.e. a bit width of 8), and the M field occupies 23 bits (bit 22 to bit 0, i.e. a bit width of 23). The data type FP16 has 16 bits (bit 0 to bit 15 in the figure), in which the S field occupies 1 bit (bit 15, i.e. a bit width of 1), the E field occupies 8 bits (bit 14 to bit 7, i.e. a bit width of 8), and the M field occupies 7 bits (bit 6 to bit 0, i.e. a bit width of 7). The data type FP8 has 8 bits (bit 0 to bit 7 in the figure), in which the S field occupies 1 bit (bit 7, i.e. a bit width of 1), the E field occupies 6 bits (bit 6 to bit 1, i.e. a bit width of 6), and the M field occupies 1 bit (bit 0, i.e. a bit width of 1).
In prior-art solutions, data in the same layer is usually quantized with a single low-bit-width floating-point type. Although this improves the efficiency of floating-point operations, it cannot dynamically adjust the data type according to changes in the data distribution, and in many application scenarios it fails to meet the precision requirements that network training places on the data.
In view of this, the present disclosure performs data distribution statistics on the target data of the target network layer, determines the exponent distribution information of the target data according to the statistical results, and determines the data type of the target data according to this information. The target network layer refers to any layer in the neural network model, more specifically the network layer that receives for computation the data whose distribution is to be analyzed; the target data refers to the training data of the target network layer, generally neuron data (such as image data or speech data) or weights.
Figure 2 shows a flowchart of a data type selection method according to an embodiment of the present disclosure.
In step 201, statistics are collected on the exponent distribution information of the target data in the target network layer, where the exponent distribution information includes an exponent distribution range and exponent value distribution amounts.
In this embodiment, the exponent distribution information refers to the result of data distribution statistics on the exponent values corresponding to a set of target data and on the number of occurrences of each exponent value in this set of target data. Figure 3 shows an exemplary exponent distribution diagram. Specifically, Figure 3 shows the exponent distribution of all Topdiff data (target data) in a certain fully connected layer (target network layer) of a Transformer neural network model, where the horizontal axis represents the range of exponent values of this set of Topdiff data, from the power of -10 to the power of -70, and the vertical axis represents the number of occurrences of each exponent value in this set of Topdiff data.
The Transformer neural network model, published by Google in 2017, is a model that uses an attention mechanism to increase model training speed, and includes multiple encoder modules for encoding text and multiple decoder modules for decoding the encoded text. Topdiff is an operator that outputs gradients in the backward pass of a convolutional neural network. The Transformer neural network model and the Topdiff operator are well known to those skilled in the art and are therefore not described in detail.
In this embodiment, the exponent distribution range refers to the distribution range of the exponent fields corresponding to all data in this set of target data. Taking Figure 3 as an example, its exponent distribution range extends from the power of -10 to the power of -70. The exponent value distribution amount refers to the distribution amount corresponding to each exponent value within the exponent distribution range, that is, the number of occurrences of each exponent value. It should be particularly noted that the counts for exponent values ranging from the power of -70 to the power of -56 and from the power of -13 to the power of -10 are not zero; rather, compared with the other exponent values, their counts are of an order of magnitude too small for the figure to display them accurately.
It should be understood that Figure 3 is a schematic diagram intended for intuitive understanding; in actual computer processing, such a visualized data distribution diagram may not exist, and the specific exponent distribution range and exponent value distribution amounts are represented by character codes obtained from the statistical results. In summary, in this step, a statistical table similar to Figure 3, or computer-recognizable statistical character codes, are generated for the target data.
In step 202, the exponent bit width of the target data is set according to the exponent distribution range. In this embodiment, there are multiple ways to set the exponent bit width.
One of them is the exponent-distribution-range method, which subtracts the minimum exponent value (minE) from the maximum exponent value (maxE) in the target data as the basis for computing the exponent bit width. Taking Figure 3 as an example, the maximum exponent value is -10 and the minimum exponent value is -70, so the exponent distribution range ED of the Topdiff data in Figure 3 is:
ED = maxE - minE
That is, ED = (-10) - (-70) = 60. The exponent distribution range ED reflects the exponent range needed to fully represent this set of Topdiff data. Since 2 to the power of 5 is 32, which is less than 60, and 2 to the power of 6 is 64, which is greater than 60, fully representing the exponent range of this set of Topdiff data requires an exponent field of 6 bits.
In some cases, the neural network model can accept only several preset and fixed exponent bit widths. For example, the Transformer neural network model of Figure 3 can accept only exponent bit widths of 3, 4, or 5, where an exponent bit width of 3 corresponds to the exponent-range interval [0, 15], an exponent bit width of 4 corresponds to the interval [16, 31], and an exponent bit width of 5 corresponds to the interval [32, 63]. In this case, another way to set the exponent bit width in this embodiment is the selection method, that is, designating one of the several acceptable choices based on the exponent distribution range. Specifically, the exponent distribution range ED in the example of Figure 3 is known to be 60, which falls within the interval [32, 63], so in this step the exponent bit width 5 is selected from the preset and fixed exponent bit widths 3, 4, and 5.
Another way to set the exponent bit width in this embodiment is the empirical-value method, that is, combining a formula with a hyperparameter obtained from experience: the product of the exponent distribution range and the hyperparameter is computed and rounded, and the rounded result is the exponent bit width. The specific formula is as follows:
E = ceil(ED · α)
where E is the exponent bit width, ceil denotes rounding up, ED denotes the exponent distribution range of the data, and α is the first hyperparameter, whose empirical value in this embodiment is 0.1. The exponent bit width E computed for the example of Figure 3 by the empirical-value method is ceil(60 · 0.1) = 6.
Whether the above exponent-distribution-range method, selection method, or empirical-value method is used, an exponent bit width adapted to the exponent distribution range of the target data can be obtained in this step.
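The three ways of setting the exponent bit width can be sketched as follows (a hedged illustration: the function names and the interval table are ours, following the worked example above with the first hyperparameter α = 0.1):

```python
import math

def exp_width_range(max_e, min_e):
    # Range method: smallest width E such that 2**E exceeds ED = maxE - minE.
    ed = max_e - min_e
    return max(1, math.ceil(math.log2(ed + 1)))

def exp_width_select(ed):
    # Selection method: pick among the preset widths 3/4/5 by the interval ED falls in.
    intervals = {3: (0, 15), 4: (16, 31), 5: (32, 63)}
    for width, (lo, hi) in intervals.items():
        if lo <= ed <= hi:
            return width
    raise ValueError("ED outside all preset intervals")

def exp_width_empirical(ed, alpha=0.1):
    # Empirical method: E = ceil(ED * alpha); round first to dodge binary
    # floating-point noise (60 * 0.1 evaluates to 6.000000000000001).
    return math.ceil(round(ed * alpha, 9))
```

With the Topdiff statistics of Figure 3 (maxE = -10, minE = -70), the three methods give 6, 5, and 6 bits respectively, matching the worked example.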
In step 203, the mantissa bit width of the target data is set according to the exponent value distribution amounts. This embodiment identifies the exponent value with the largest distribution count, computes the ratio of this maximum distribution count to the total distribution count, combines this ratio with a second hyperparameter obtained from experience, and finally rounds the result; the rounded result is the mantissa bit width. An exemplary formula is as follows:
M = ceil(E1 / β)
where M denotes the mantissa bit width, ceil denotes rounding up, E1 is the ratio of the maximum distribution count to the total distribution count, and β is the second hyperparameter. Continuing with the Topdiff data of Figure 3 as an example, the exponent value with the largest distribution count is the data point 301 in the figure, whose corresponding maximum distribution count is 354459. Assuming the total distribution count of the Topdiff data is 4718570, the ratio E1 is 354459/4718570 = 0.07512. In this embodiment the empirical value of β is 0.06, so the mantissa bit width M = ceil(0.07512/0.06) = 2.
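The mantissa computation above, with the Topdiff counts plugged in, can be sketched as follows (the function name is ours; β = 0.06 as in the embodiment):

```python
import math

def mantissa_width(max_count, total_count, beta=0.06):
    # E1 is the share of the most frequent exponent value; M = ceil(E1 / beta).
    e1 = max_count / total_count
    return math.ceil(e1 / beta)

# Worked example from the embodiment: 354459 / 4718570 ≈ 0.07512 → M = 2.
print(mantissa_width(354459, 4718570))  # → 2
```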
It should be understood that, in the embodiments of the present disclosure, there is no strict execution order between steps 202 and 203: step 202 may be performed before step 203, or step 203 before step 202; this embodiment imposes no limitation.
In step 204, the data type of the target data is determined according to the exponent bit width and the mantissa bit width. Specifically, the total bit width of the target data is obtained by directly summing the sign bit width, the exponent bit width, and the mantissa bit width. Again taking the Topdiff data of Figure 3 as an example: in step 202, the exponent-distribution-range method yields an exponent bit width of 6 bits, the selection method yields 5 bits, and the empirical-value method yields 6 bits; in step 203, a mantissa bit width of 2 bits is obtained; and the sign bit width is fixed at 1 bit. Therefore, the total bit width obtained by the exponent-distribution-range method is 6 + 2 + 1 = 9 (FP9), the total bit width obtained by the selection method is 5 + 2 + 1 = 8 (FP8), and the total bit width obtained by the empirical-value method is 6 + 2 + 1 = 9 (FP9). This embodiment represents the Topdiff data of Figure 3 with the FP8 or FP9 data type for training the fully connected layer.
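Step 204 then reduces to a simple sum (a trivial sketch; the helper name is ours):

```python
def total_bit_width(exp_width, mant_width, sign_width=1):
    # Total width of the candidate type: sign + exponent + mantissa.
    return sign_width + exp_width + mant_width

# The three candidates from the worked example:
print(total_bit_width(6, 2))  # range method     → 9 (FP9)
print(total_bit_width(5, 2))  # selection method → 8 (FP8)
```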
By collecting statistics on the exponent distribution range and exponent value distribution amounts of the target data in the target network layer, this embodiment dynamically sets the exponent bit width and mantissa bit width of the target data and thereby determines the data type of the target data. That is to say, if multiple sets of target data have different exponent distribution ranges and/or exponent value distribution amounts, their data types may differ even within the same network layer. Since this embodiment takes the data distribution of the target data into account during training, the total bit width of the data can effectively represent each set of target data, balancing computational efficiency and computational precision.
In practical applications, data in certain parts of the exponent distribution range accounts for a very small proportion, such as the data in Figure 3 whose exponent values lie between -70 and -56; even if such data is ignored when determining the data type, the training result is not substantially affected. Another embodiment of the present disclosure is a method of data type selection based on a specific exponent value interval, whose flowchart is shown in Figure 4. This embodiment is likewise explained with the Topdiff data of Figure 3.
In step 401, statistics are collected on the exponent distribution information of the target data in the target network layer, where the exponent distribution information includes an exponent distribution range and exponent value distribution amounts. This step is identical to step 201 and is therefore not described again.
In step 402, when setting the exponent bit width of the target data according to the exponent distribution range, an exponent distribution interval within a distribution-proportion threshold of the exponent distribution range is identified, and the bit width value corresponding to this exponent distribution interval is taken as the exponent bit width.
In this step, the distribution-proportion threshold must first be set; it can be determined according to actual needs, for example 70%, 80%, or 90%, that is, a contiguous region covering 70%, 80%, or 90% of the exponent distribution range is used as the basis for obtaining the exponent bit width, and the portion of the exponent distribution range exceeding the distribution-proportion threshold is excluded from the calculation.
Taking a distribution-proportion threshold of 90% as an example, the exponent distribution range of Figure 3 is ED = 60, so the exponent distribution interval is 60 × 90% = 54. With 54 as the reference value, that is, with ED taken as 54 instead of 60, any of the exponent bit width setting methods of step 202 is then used to set the exponent bit width.
In step 403, the mantissa bit width of the target data is set according to the exponent value distribution amounts; this step is identical to step 203 and is not described again here.
In step 404, the data type of the target data is determined according to the exponent bit width and the mantissa bit width. In this embodiment, the sign bit width, the exponent bit width, and the mantissa bit width are first summed to obtain the total bit width of the target data; the operational details are identical to step 204.
Then the bit-width value interval in which the total bit width lies is identified, so as to select the corresponding data type among the multiple data types. Specifically, this embodiment sets multiple acceptable data types for the target network layer, each corresponding to a different bit-width value interval, for example the acceptable data types FP16 and FP8. If the total bit width computed in this step is less than or equal to 8, although both FP16 and FP8 satisfy the precision requirement of the target data, this embodiment selects the one with the smaller total bit width, that is, the data type FP8. If the total bit width is between 8 and 16, FP16 is selected, since FP8 cannot meet the requirement.
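The interval lookup of step 404 can be sketched as follows (assuming, as in the example, the layer accepts only FP8 and FP16; the function is ours, not the patent's):

```python
def choose_type(total_width, accepted_widths=(8, 16)):
    # Pick the smallest accepted total bit width that can still hold the data,
    # so the precision requirement is met at the lowest bit-width cost.
    for width in sorted(accepted_widths):
        if total_width <= width:
            return f"FP{width}"
    raise ValueError("no accepted data type is wide enough")

print(choose_type(8))  # → FP8 (both fit; the smaller wins)
print(choose_type(9))  # → FP16 (FP8 cannot meet the requirement)
```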
This embodiment selects, from several preset data types, a data type whose precision meets the requirements, so that the neural network model can be debugged in advance and the hardware can be specially planned to fit the several preset data types, after which one of the preset data types is selected based on the exponent distribution information, helping to improve data processing efficiency.
Another embodiment of the present disclosure is a data type selection method that, among the data types determined by the embodiments corresponding to Figures 2 and 4, selects the one with the smallest total bit width as the finally determined data type of the target data, further ensuring that the floating-point operating efficiency is the best among the multiple solutions.
Another embodiment of the present disclosure is a computer-readable storage medium on which computer program code for a data type selection method is stored; when the computer program code is run by a processor, the methods of the foregoing embodiments are performed. In some implementation scenarios, the above integrated unit may be implemented in the form of a software program module. If implemented in the form of a software program module and sold or used as an independent product, the integrated unit may be stored in a computer-readable memory. On this basis, when the solution of the present disclosure is embodied in the form of a software product (for example, a computer-readable storage medium), the software product may be stored in a memory and may include several instructions to cause a computer device (for example, a personal computer, a server, or a network device) to perform some or all of the steps of the methods described in the embodiments of the present disclosure. The aforementioned memory may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Another embodiment of the present disclosure is a computer program product including a computer program for a data type selection method; when executed by a processor, the computer program implements the steps of the methods shown in the foregoing embodiments.
Another embodiment of the present disclosure is a computer device including a memory, a processor, and a computer program stored on the memory; the processor executes the computer program to implement the steps of the methods shown in the foregoing embodiments.
Each of the above embodiments sets the data type in its specific way and performs training; after training is completed, actual target data (such as real image data or speech data) is fed into the trained neural network model for inference to obtain inference results.
The above embodiments can be implemented by a computer program. The purpose of executing the computer program in this solution is to solve the technical problem that the precision requirements of neural network training cannot be met in many application scenarios. Running the computer program on a computer to control or process an external object (the exponent distribution information of the target data) reflects technical means that follow natural laws, enabling the data type to dynamically follow the composition of different target data and effectively represent the target data, thereby achieving technical effects in accordance with natural laws, such as balancing precision and efficiency. Moreover, the purpose of executing the computer program in the above embodiments is to process a kind of target data: a technical data processing program is executed by a computer to analyze its exponent distribution information, and a series of technical processing steps is performed on the target data in accordance with natural laws to determine its data type, thereby obtaining technical data processing effects, such as balancing precision and efficiency, that accord with natural laws.
Figure 5 shows another embodiment of the present disclosure, a data type selection device. As shown in Figure 5, the data type selection device includes a statistics module 501, a setting module 502, and a decision module 503.
The statistics module 501 is configured to collect statistics on the exponent distribution information of the target data in the target network layer, where the exponent distribution information includes an exponent distribution range and exponent value distribution amounts. The exponent distribution information refers to the result of data distribution statistics on the exponent values corresponding to a set of target data and on the number of occurrences of each exponent value in this set of target data.
The setting module 502 sets the exponent bit width of the target data according to the exponent distribution range. In this embodiment, there are multiple ways to set the exponent bit width, including the exponent-distribution-range method, selection method, and empirical-value method described above. Whichever method is used, an exponent bit width adapted to the exponent distribution range of the target data is obtained.
In another case, when setting the exponent bit width of the target data according to the exponent distribution range, the setting module 502 identifies an exponent distribution interval within a distribution-proportion threshold of the exponent distribution range, and takes the bit width value corresponding to this exponent distribution interval as the exponent bit width. The setting module 502 must first set the distribution-proportion threshold, which can be determined according to actual needs, for example 70%, 80%, or 90%; that is, a contiguous region covering 70%, 80%, or 90% of the exponent distribution range is used as the basis for obtaining the exponent bit width, and the portion of the exponent distribution range exceeding the distribution-proportion threshold is excluded from the calculation.
The setting module 502 then sets the mantissa bit width of the target data according to the exponent value distribution amounts. It first identifies the exponent value with the largest distribution count, computes the ratio of this maximum distribution count to the total distribution count, combines this ratio with a second hyperparameter obtained from experience, and finally rounds the result; the rounded result is the mantissa bit width. An exemplary formula is as follows:
M = ceil(E1 / β)
where M denotes the mantissa bit width, ceil denotes rounding up, E1 is the ratio of the maximum distribution count to the total distribution count, and β is the second hyperparameter.
The decision module 503 is configured to determine the data type of the target data according to the exponent bit width and the mantissa bit width. In one case, the decision module 503 obtains the total bit width of the target data by directly summing the sign bit width, the exponent bit width, and the mantissa bit width.
In another case, the decision module 503 first sums the sign bit width, the exponent bit width, and the mantissa bit width to obtain the total bit width of the target data, and then identifies the bit-width value interval in which the total bit width lies, so as to select the corresponding data type among the multiple data types. Specifically, the decision module 503 sets multiple acceptable data types for the target network layer, each corresponding to a different bit-width value interval.
This embodiment may further determine multiple acceptable data types based on the various approaches described above, and select from these acceptable data types the one with the smallest total bit width as the finally determined data type of the target data.
Depending on the application scenario, the electronic equipment or device of the present disclosure may include a server, a cloud server, a server cluster, a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a PC device, an Internet of Things terminal, a mobile terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a visual terminal, an autonomous driving terminal, a vehicle, a household appliance, and/or medical equipment. The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical equipment includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph. The electronic equipment or device of the present disclosure can also be applied to fields such as the Internet, the Internet of Things, data centers, energy, transportation, public administration, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical care. Furthermore, the electronic equipment or device of the present disclosure can also be used in cloud, edge, terminal, and other application scenarios related to artificial intelligence, big data, and/or cloud computing. In one or more embodiments, electronic equipment or devices with high computing power according to the solution of the present disclosure can be applied to cloud devices (such as cloud servers), while electronic equipment or devices with low power consumption can be applied to terminal devices and/or edge devices (such as smartphones or cameras). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or edge device are mutually compatible, so that appropriate hardware resources can be matched from the hardware resources of the cloud device according to the hardware information of the terminal device and/or edge device to simulate the hardware resources of the terminal device and/or edge device, thereby achieving unified management, scheduling, and collaborative work of device-cloud integration or cloud-edge-device integration.
It should be noted that, for the sake of brevity, the present disclosure describes some methods and their embodiments as a series of actions and combinations thereof, but those skilled in the art will understand that the solutions of the present disclosure are not limited by the order of the described actions. Therefore, based on the disclosure or teaching of the present disclosure, those skilled in the art will understand that certain steps therein may be performed in other orders or simultaneously. Further, those skilled in the art will understand that the embodiments described in the present disclosure may be regarded as optional embodiments, that is, the actions or modules involved therein are not necessarily required for the implementation of one or some solutions of the present disclosure. In addition, depending on the solution, the description of some embodiments in the present disclosure has different emphases. In view of this, for the parts not described in detail in a certain embodiment of the present disclosure, those skilled in the art may refer to the relevant descriptions of other embodiments.
In terms of specific implementation, based on the disclosure and teaching of the present disclosure, those skilled in the art will understand that the several embodiments disclosed herein may also be implemented in other ways not disclosed herein. For example, with respect to the units in the foregoing electronic equipment or device embodiments, they are divided herein on the basis of logical functions, but there may be other ways of division in actual implementation. For another example, multiple units or components may be combined or integrated into another system, or some features or functions of a unit or component may be selectively disabled. As far as the connection relationships between different units or components are concerned, the connections discussed above in conjunction with the drawings may be direct or indirect couplings between the units or components. In some scenarios, the aforementioned direct or indirect couplings involve communication connections using interfaces, where the communication interfaces may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
In the present disclosure, units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. The aforementioned components or units may be located at the same position or distributed over multiple network units. In addition, according to actual needs, some or all of the units may be selected to achieve the purpose of the solutions of the embodiments of the present disclosure. In addition, in some scenarios, multiple units in the embodiments of the present disclosure may be integrated into one unit, or each unit may physically exist separately.
In some other implementation scenarios, the above integrated unit may also be implemented in the form of hardware, that is, a specific hardware circuit, which may include digital circuits and/or analog circuits, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, and the physical devices may include, but are not limited to, devices such as transistors or memristors. In view of this, the various devices described herein (such as computing devices or other processing devices) may be implemented by appropriate hardware processors, such as central processing units, GPUs, FPGAs, DSPs, and ASICs. Further, the aforementioned storage unit or storage device may be any appropriate storage medium (including magnetic storage media or magneto-optical storage media, etc.).
The foregoing may be better understood in light of the following clauses:
Clause A1. A data type selection method, comprising: collecting statistics on exponent distribution information of target data in a target network layer, the exponent distribution information comprising an exponent distribution range and exponent value distribution amounts; setting an exponent bit width of the target data according to the exponent distribution range; setting a mantissa bit width of the target data according to the exponent value distribution amounts; and determining a data type of the target data according to the exponent bit width and the mantissa bit width.
Clause A2. The method according to clause A1, wherein the step of setting the exponent bit width of the target data comprises: computing the product of the exponent distribution range and a first hyperparameter; and rounding the product; wherein the rounded result is the exponent bit width.
Clause A3. The method according to clause A1 or A2, wherein the exponent distribution range is the maximum exponent value minus the minimum exponent value in the target data.
Clause A4. The method according to clause A1, wherein the step of setting the exponent bit width of the target data comprises: identifying an exponent distribution interval within a distribution-proportion threshold of the exponent distribution range; wherein the bit width value corresponding to the exponent distribution interval is taken as the exponent bit width.
Clause A5. The method according to clause A1, wherein the step of setting the mantissa bit width of the target data comprises: identifying the exponent value with the maximum distribution count; computing the ratio of the maximum distribution count to the total distribution count; computing the product of the distribution count ratio and a second hyperparameter; and rounding the product; wherein the rounded result is the mantissa bit width.
Clause A6. The method according to clause A2 or A5, wherein the rounding step is rounding up.
Clause A7. The method according to clause A1, wherein the step of determining the data type of the target data comprises: summing the sign bit width, the exponent bit width, and the mantissa bit width to obtain the total bit width of the target data.
Clause A8. The method according to clause A7, wherein the target network layer can accept multiple data types, each corresponding to a different bit-width value interval, and the step of determining the data type of the target data comprises: identifying the bit-width value interval in which the total bit width lies, so as to select the corresponding data type among the multiple data types.
Clause A9. A computer-readable storage medium on which computer program code for a data type selection method is stored; when the computer program code is run by a processing device, the method according to any one of clauses A1 to A8 is performed.
Clause A10. A computer program product comprising a computer program for a data type selection method; when executed by a processor, the computer program implements the steps of the method according to any one of clauses A1 to A8.
Clause A11. A computer device comprising a memory, a processor, and a computer program stored on the memory; the processor executes the computer program to implement the steps of the method according to any one of clauses A1 to A8.
Clause A12. A data type selection device, comprising: a statistics module configured to collect statistics on exponent distribution information of target data in a target network layer, the exponent distribution information comprising an exponent distribution range and exponent value distribution amounts; a setting module configured to set an exponent bit width of the target data according to the exponent distribution range and to set a mantissa bit width of the target data according to the exponent value distribution amounts; and a decision module configured to determine, according to the exponent bit width and the mantissa bit width, the data type corresponding to the target data.
The embodiments of the present disclosure have been described in detail above. Specific examples have been applied herein to explain the principles and implementations of the present disclosure, and the descriptions of the above embodiments are only intended to help understand the method of the present disclosure and its core idea. At the same time, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application based on the idea of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.

Claims (12)

  1. A data type selection method, characterized by comprising:
    collecting statistics on exponent distribution information of target data in a target network layer, the exponent distribution information comprising an exponent distribution range and exponent value distribution amounts;
    setting an exponent bit width of the target data according to the exponent distribution range;
    setting a mantissa bit width of the target data according to the exponent value distribution amounts; and
    determining a data type of the target data according to the exponent bit width and the mantissa bit width.
  2. The method according to claim 1, wherein the step of setting the exponent bit width of the target data comprises:
    computing the product of the exponent distribution range and a first hyperparameter; and
    rounding the product;
    wherein the rounded result is the exponent bit width.
  3. The method according to claim 1 or 2, wherein the exponent distribution range is the maximum exponent value minus the minimum exponent value in the target data.
  4. The method according to claim 1, wherein the step of setting the exponent bit width of the target data comprises:
    identifying an exponent distribution interval within a distribution-proportion threshold of the exponent distribution range;
    wherein the bit width value corresponding to the exponent distribution interval is taken as the exponent bit width.
  5. The method according to claim 1, wherein the step of setting the mantissa bit width of the target data comprises:
    identifying the exponent value with the maximum distribution count;
    computing the ratio of the maximum distribution count to the total distribution count;
    computing the product of the distribution count ratio and a second hyperparameter; and
    rounding the product;
    wherein the rounded result is the mantissa bit width.
  6. The method according to claim 2 or 5, wherein the rounding step is rounding up.
  7. The method according to claim 1, wherein the step of determining the data type of the target data comprises:
    summing the sign bit width, the exponent bit width, and the mantissa bit width to obtain the total bit width of the target data.
  8. The method according to claim 7, wherein the target network layer can accept multiple data types, each corresponding to a different bit-width value interval, and the step of determining the data type of the target data comprises:
    identifying the bit-width value interval in which the total bit width lies, so as to select the corresponding data type among the multiple data types.
  9. A computer-readable storage medium on which computer program code for a data type selection method is stored, wherein, when the computer program code is run by a processing device, the method according to any one of claims 1 to 8 is performed.
  10. A computer program product comprising a computer program for a data type selection method, characterized in that, when executed by a processor, the computer program implements the steps of the method according to any one of claims 1 to 8.
  11. A computer device comprising a memory, a processor, and a computer program stored on the memory, characterized in that the processor executes the computer program to implement the steps of the method according to any one of claims 1 to 8.
  12. A data type selection device, characterized by comprising:
    a statistics module, configured to collect statistics on exponent distribution information of target data in a target network layer, the exponent distribution information comprising an exponent distribution range and exponent value distribution amounts;
    a setting module, configured to:
    set an exponent bit width of the target data according to the exponent distribution range; and
    set a mantissa bit width of the target data according to the exponent value distribution amounts; and
    a decision module, configured to determine, according to the exponent bit width and the mantissa bit width, the data type corresponding to the target data.
PCT/CN2023/110621 2022-08-26 2023-08-01 Data type selection method and device, and readable storage medium WO2024041332A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211032359.8 2022-08-26
CN202211032359.8A CN117688993A (zh) 2022-08-26 2022-08-26 Data type selection method and device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2024041332A1 true WO2024041332A1 (zh) 2024-02-29

Family

ID=90012426

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/110621 WO2024041332A1 (zh) 2022-08-26 2023-08-01 Data type selection method and device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN117688993A (zh)
WO (1) WO2024041332A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570559A (zh) * 2015-10-09 2017-04-19 阿里巴巴集团控股有限公司 一种基于神经网络的数据处理方法和装置
US20190294964A1 (en) * 2018-03-20 2019-09-26 National Institute Of Advanced Industrial Science And Technology Computing system
US20200050429A1 (en) * 2018-08-07 2020-02-13 NovuMind Limited Method and system for elastic precision enhancement using dynamic shifting in neural networks
CN110889503A (zh) * 2019-11-26 2020-03-17 中科寒武纪科技股份有限公司 数据处理方法、装置、计算机设备和存储介质
CN112836806A (zh) * 2021-02-26 2021-05-25 上海阵量智能科技有限公司 一种数据格式调整方法、装置、计算机设备和存储介质


Also Published As

Publication number Publication date
CN117688993A (zh) 2024-03-12

Similar Documents

Publication Publication Date Title
US11249721B2 (en) Multiplication circuit, system on chip, and electronic device
Liu et al. Design and analysis of inexact floating-point adders
CN111581593B (zh) 可配置重用的分段式查找表激活函数实现装置
CN105844330A (zh) 神经网络处理器的数据处理方法及神经网络处理器
CN108229648B (zh) 匹配存储器中数据位宽的卷积计算方法和装置、设备、介质
KR20080027454A (ko) 명령어에 응답하여 라운딩 연산을 수행하는 방법, 장치, 시스템 및 머신-판독가능 매체
US11934788B2 (en) Encoding method, apparatus, and storage medium
GB2586559A (en) Enhanced low precision binary floating-point formatting
CN108196822A (zh) 一种双精度浮点开方运算的方法及系统
WO2023040389A1 (zh) 转数方法、存储介质、装置及板卡
WO2024120249A1 (zh) 数据处理方法、装置、设备及存储介质
WO2022057502A1 (zh) 点积运算实现方法、装置、电子设备及存储介质
WO2024041332A1 (zh) 数据类型选择方法、装置及可读存储介质
Cody Analysis of proposals for the floating-point standard
CN111930673A (zh) 异构智能处理量化装置、量化方法、电子设备及存储介质
CN116700664B (zh) 一种确定浮点数平方根的方法及装置
JP2016201108A (ja) 数学的関数を計算するためのシステム及び方法
US20220113943A1 (en) Method for multiply-add operations for neural network
CN111930670B (zh) 异构智能处理量化装置、量化方法、电子设备及存储介质
CN109558109B (zh) 数据运算装置及相关产品
CN114385540A (zh) 一种数据单位换算方法及装置
CN109408028B (zh) 浮点数运算方法、装置及存储介质
CN114692865A (zh) 一种神经网络量化训练方法、装置及相关产品
CN115238236A (zh) 数据处理方法、装置、电子设备、介质和芯片
WO2023231363A1 (zh) 乘累加操作数的方法及其设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23856422

Country of ref document: EP

Kind code of ref document: A1