WO2022198652A1 - Random number generation apparatus and method, random number generation system, and chip - Google Patents

Random number generation apparatus and method, random number generation system, and chip

Info

Publication number
WO2022198652A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
random number
output
generator
subcircuit
Prior art date
Application number
PCT/CN2021/083344
Other languages
French (fr)
Chinese (zh)
Inventor
朱幸尔
郑乔石
张精制
杨方昱
李克
丰帆
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202180087448.XA (CN116710890A)
Priority to PCT/CN2021/083344 (WO2022198652A1)
Publication of WO2022198652A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/58 Random or pseudo-random number generators

Definitions

  • the first pipeline includes a plurality of cascaded operation subcircuits; each operation subcircuit includes a plurality of parallel data deformation modules; the data deformation module is used to receive the first data, the second data, the third data, the first set value and the second set value, and to use the lower bits of the product of the first data and the first set value as the first output data; the result of XORing the second data with the third data is XORed with the upper bits of the product of the first data and the first set value to give the second output data; the data deformation module is also used to use the sum of the third data and the second set value as the third output data; or, except for the data deformation modules in the last stage, the data deformation module in each stage is also used to use the sum of the third data and the second set value as the third output data; the first output data, the second output data and the third output data of the data deformation module in the previous stage are respectively used as the second data, first data and …
  • the parity selection module includes a second AND gate, a first modulo extractor, a third AND gate, a first OR gate, and a second modulo extractor;
  • the second AND gate is used to receive the fifth data and the sixth set value, and perform an AND operation on the fifth data and the sixth set value;
  • the first modulo extractor is used to receive the sixth data and take the modulo of the sixth data;
  • the third AND gate is used to receive the output of the first modulo extractor and the seventh set value, and perform an AND operation on the output of the first modulo extractor and the seventh set value;
  • the first OR gate is used to receive the output of the second AND gate and the output of the third AND gate, and perform an OR operation on the output of the second AND gate and the output of the third AND gate;
  • the second modulo extractor is used to receive the output of the first OR gate, take the modulo of the output of the first OR gate, and output the result to the second selector and the even number generation module.
  • the even number generation module includes a third modulo extractor, a second right shifter and a fifth XOR; the third modulo extractor is used to receive the seventh data and take the modulo of the seventh data; the second right shifter is used to receive the output of the parity selection module and shift it to the right by b bits; the fifth XOR is used to receive the output of the third modulo extractor and the output of the second right shifter, perform an XOR operation on them, and output the XOR result to the second selector.
  • the first left shifter is used to receive the output of the sixth XOR and shift the output of the sixth XOR to the left by d bits;
  • the fourth AND gate is used to receive the ninth set value and the output of the first left shifter, and perform an AND operation on the ninth set value and the output of the first left shifter;
  • the seventh XOR is used to receive the output of the sixth XOR and the output of the fourth AND gate, and perform an XOR operation on the output of the sixth XOR and the output of the fourth AND gate;
  • the second left shifter is used to receive the output of the seventh XOR and shift the output of the seventh XOR to the left by e bits;
  • the fifth AND gate is used to receive the tenth set value and the output of the second left shifter, and perform an AND operation on them.
  • the random number generating apparatus further includes: a data type conversion module, configured to perform data type conversion on the random numbers generated by at least one generator. Since the random number generated by the generator is a random number of a fixed data type, it is only applicable to a network training device for training random numbers of a specific data type, and the applicable scope of the random number generating device is relatively limited. By arranging a data type conversion module in the random number generating device, the data type of the random number generated by the generator can be converted so as to be applicable to different network training devices, and the applicable scope of the random number generating device can be improved.
  • the data type conversion module includes at least one data type converter; the data type converter is used to convert the random number generated by at least one generator into a random number of a preset data type;
  • when the data type conversion module includes a plurality of data type converters, the data type conversion module further includes a fourth selector, and the fourth selector is used to select and output the result of one of the multiple data type converters according to the second parameter; the preset data types of the random numbers produced by the multiple data type converters are different.
  • the distribution conversion module includes at least one distribution generator; the distribution generator is used to convert the random numbers output by the data type conversion module into random numbers that obey a preset distribution; when the distribution conversion module includes multiple distribution generators, the random number generating device further includes a fifth selector, which is used to select and output the result of one of the multiple distribution generators according to the third parameter; the random numbers converted by the multiple distribution generators obey different preset distribution types.
  • the at least one distribution generator includes a normal distributor, and the normal distributor uses a box-muller algorithm to convert the random numbers output by the data type conversion module into random numbers that obey a normal distribution.
  • the normal distribution generator adopts a hardened box-muller structure.
  • the box-muller algorithm is the normal distribution transformation algorithm used in the current mainstream digital deep learning framework.
  • compared with the simulated normally distributed random numbers formed by the irwin-hall algorithm, the normally distributed random numbers formed by the box-muller algorithm have better normal distribution characteristics.
  • when the box-muller algorithm is used, the input is two uniformly distributed uint32 random numbers, and the output is two fp32 random numbers that follow the standard normal distribution.
  • the input of multiple uniformly distributed random numbers is added to generate a normally distributed random number.
  • the irwin-hall algorithm has a larger requirement on the number of uniformly distributed random numbers.
  • the box-muller algorithm has lower performance requirements on the generator under the condition of satisfying the same amount of data output.
  • a chip including a substrate and any random number generating device of the first aspect, wherein the random number generating device is disposed on the substrate.
  • FIG. 2b is a frame diagram of another random number generating apparatus provided by an embodiment of the present application.
  • FIG. 2d is a frame diagram of another random number generating apparatus provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another second generator provided by an embodiment of the present application.
  • FIG. 14b is a diagram of a usage scenario of another random number generating apparatus provided by an embodiment of the present application.
  • Artificial intelligence (AI)
  • Various AI technologies are currently widely used in machine vision, image recognition, face recognition, object detection, intelligent driving, speech recognition, natural language processing, machine translation, speech generation, and text-to-speech.
  • the distribution conversion module 50 performs distribution type conversion on the random numbers output by the data type conversion module 40 and then outputs them.
  • the random number generating apparatus 100 provided by the embodiments of the present application will be described with several detailed examples.
  • the random number generated by the first generator 10 is referred to as the first random number
  • the random number generated by the second generator 20 is referred to as the second random number.
  • the cascade connection of a plurality of operation sub-circuits 111 is realized by the cascade connection of a plurality of data deformation modules 112 in the operation sub-circuit 111.
  • the data transformation module 112 is connected to the plurality of data transformation modules 112 in the subsequent stage operation sub-circuit 111 in a one-to-one correspondence.
  • the first multiplier is used for receiving the first data and the first set value, multiplying the first data and the first set value, and outputting the lower bit of the obtained product as the first output data.
  • the third data key' and the third data key" of the first-stage operation sub-circuit 111 may be provided by the third flash memory.
  • the first stage operation subcircuit 111 starts to process the second counter_start data (counter_start[0]). 16]) to perform the operation.
  • the first data counter[1]" received by the first-stage second data deformation module 112" is the index, in the random number chain, of the third of the four first random numbers to be generated by the first pipeline 11 (32-bit data).
  • the first data counter[1]' received by the second data transformation module 112'' of the other stage is the second output data result[2]' of the first data transformation module 112' of the previous stage.
  • the second data counter[2]" received by the first-stage second data deformation module 112" is the index, in the random number chain, of the fourth of the four first random numbers to be generated by the first pipeline 11 (32-bit data).
  • the second data counter[2]' received by the second data transformation module 112'' of the other stage is the first output data result[1]' of the first data transformation module 112' of the previous stage.
  • the third data key" received by the first-stage second data deformation module 112" is the lower 32 bits of the random number seed key.
  • the third data key" received by the second data transformation module 112" of the other stage is the third output data result[3]" of the second data transformation module 112" of the previous stage.
  • the first set value and the second set value received by the first data deformation module 112 ′ are the same, and the first set value and the second set value received by the second data deformation module 112 ′′ are the same.
  • the first set value and the second set value received by the first data deformation module 112′ are different from the first set value and the second set value received by the second data transformation module 112′′.
  • the data type of the first random number is uint32
  • following the IEEE 754 standard, the fp32 converter keeps the lower 23 bits of the uint32 value and fills the upper 9 bits with 001111111 to form a floating-point number in the range 1 to 2, and then subtracts 1, thereby converting the data type of the first random number from uint32 to fp32.
  • the data type converter 41 is a 32-bit unsigned integer (uint32) converter.
  • the uint32 converter is used to convert the first random number to a random number of 32-bit unsigned integer type.
  • the uint32 converter plays a role in transmitting the first random number.
  • the data type converter 41 is a 32-bit signed integer (int32) converter.
  • the int32 converter is used to convert the first random number to a random number of 32-bit signed integer type.
  • the data type converter 41 is a 64-bit signed integer (int64) converter.
  • the int64 converter is used to convert the first random number to a random number of 64-bit signed integer type.
  • the second parameter can be transmitted to the first flash memory by the system scheduler, CPU, GPU or NPU, and the first flash memory transmits it to the fourth selector 42.
  • the distribution conversion module 50 in the random number generation device 100 is configured to perform distribution type conversion on the random numbers output by the data type conversion module 40 .
  • the distribution conversion module 50 includes a distribution generator 51, and the distribution generator 51 is used to convert the random numbers output by the data type conversion module 40 into random numbers that obey a preset distribution.
  • the distribution generator 51 is a bitmask gen.
  • the bit mask distributor is used to convert the random numbers output by the data type conversion module 40 into random numbers that obey the mask distribution.
  • the normal distribution generator can be, for example, a normal distribution generator of any mean and variance or a truncated normal distribution generator.
  • the normal distributor may use the box-muller algorithm to generate normally distributed random numbers subject to any mean and variance.
  • the uniform distributor is equivalent to transmitting the random number output by the data type conversion module 40 .
  • the distribution generator 51 is configured to convert the random numbers output by the data type conversion module 40 into random numbers obeying a preset distribution.
  • the random numbers converted by the multiple distribution generators 51 obey different preset distribution types.
  • the distribution generator 51 may be, for example, the above-mentioned bitmask distributor, normal distributor, or uniform distributor.
  • the fifth selector 52 is configured to select and output the random number generated by one of the plurality of distribution generators 51 according to the third parameter.
  • since the first random number generated by the first generator 10 is a random number of a fixed distribution type, it is only suitable for a network training device that trains with random numbers of a certain distribution type, and the scope of application of the random number generating device 100 is relatively limited.
  • by disposing the distribution conversion module 50 in the random number generating apparatus 100, the distribution type of the first random number generated by the first generator 10 can be converted so as to be applicable to different network training apparatuses, and the scope of application of the random number generating apparatus 100 can be widened.
  • the interrupt request includes a random number generation complete normal interrupt and a random number generation incomplete abnormal interrupt.
  • the interrupt management module transmits the interrupt request to the first flash memory; the first flash memory transmits the normal interrupt request to the processor of the neural network system (such as the system scheduler, CPU, GPU or NPU) through the first transmission line ioc, and transmits the abnormal interrupt request to the processor of the neural network system through the second transmission line ioe.
  • the processor outputs corresponding control instructions according to the received interrupt request type.
  • the random number generation device 100 in the embodiment of the present application is provided with a flash memory configuration interface, the hardened transfer module, CPU, GPU or NPU configures the first flash memory, and transmits the second parameter and the third parameter respectively through the first flash memory to the fourth selector 42 and the fifth selector 52 .
  • the software and hardware interfaces of the random number generating device 100 are flexible, and the working mode and parameters can be flexibly set.
  • the random number generation process does not require the intervention of the system scheduler, CPU, GPU or NPU, which reduces the operating pressure on the system scheduler, CPU, GPU or NPU.
  • the first generator 10 includes multiple parallel first pipelines 11 , the multiple parallel first pipelines 11 can simultaneously form multiple first random numbers, and the number of parallel first pipelines 11 can be expanded as required.
  • the parallelism of the first generator 10 in the random number generating apparatus 100 provided by the embodiment of the present application is good, and the generation efficiency of the random number can be significantly improved.
  • the generation of the first random number is completed by the hardware structure of the first pipeline11, and the performance of the random number generation device 100 can meet the requirements of the hardware pipeline structure in the large-scale neural network training scenario.
  • Example 2: the difference between Example 2 and Example 1 is that the random number generating apparatus 100 includes a second generator 20.
  • This process of generating initial values is repeated to obtain a rotation chain including a plurality of initial values.
  • the third XOR is used for receiving the output of the first right shifter, and performing an XOR operation on the output of the first right shifter and the fourth data.
  • the third AND gate device is used to receive the seventh set value, and perform AND operation on the output of the first modulo extractor and the seventh set value.
  • the second selector 2214 is configured to select and output the output of the odd number generation module 2212 or the even number generation module 2213 according to the output of the parity selection module 2211, so as to rotate the rotation chain.
  • the sixth XOR: in the drawings of the embodiments of the present application, the sixth "xor" represents the sixth XOR
  • the first left shifter: in the drawings of the embodiments of the present application, the first "<<" represents the first left shifter
  • the fourth AND gate: in the drawings of the embodiments of the present application, the fourth "&" represents the fourth AND gate
  • the seventh XOR: in the drawings of the embodiments of the present application, the seventh "xor" represents the seventh XOR
  • the second left shifter: in the drawings of the embodiments of the present application, the second "<<" represents the second left shifter
  • the fifth AND gate: in the drawings of the embodiments of the present application, the fifth "&" represents the fifth AND gate
  • the eighth XOR: in the drawings of the embodiments of the present application, the eighth "x…
  • the third right shifter is used for receiving the output of the state rotation sub-circuit, and right-shifting the output of the state rotation sub-circuit by c bits.
  • the sixth XOR is used to receive the output of the state rotation subcircuit and the output of the third right shifter, and to perform an XOR operation on the output of the state rotation subcircuit and the output of the third right shifter.
  • the fourth AND gate is used to receive the ninth set value and the output of the first left shifter, and perform AND operation on the ninth set value and the output of the first left shifter.
  • the fifth AND gate is used to receive the tenth set value and the output of the second left shifter, and perform AND operation on the tenth set value and the output of the second left shifter.
  • the eighth XOR is used for receiving the output of the fifth AND gate and the output of the seventh XOR, and performing XOR operation on the output of the fifth AND gate and the output of the seventh XOR.
  • the fourth right shifter is used for receiving the output of the eighth XOR, and right-shifting the output of the eighth XOR by f bits.
  • the ninth XOR is used to receive the output of the eighth XOR and the output of the fourth right shifter, perform an XOR operation on the output of the eighth XOR and the output of the fourth right shifter, and output the XOR result as the second random number.
  • the pre-rotation rotation chain output by the seed initialization generating sub-circuit 21 and the rotated rotation chain output by the state rotation sub-circuit 22 are respectively transmitted to the third selector 24 .
  • the interworking register 25 is used to receive the rotation chain output by the third selector 24 .
  • the intercommunication register 25 receives the rotation chain before rotation.
  • the control signal output by the selection control terminal control is to control the third selector 24 to output the updated value output by the state rotation sub-circuit 22
  • the intercommunication register 25 receives the rotated rotation chain.
  • each time the intercommunication register 25 receives an updated value, it needs to store the received updated value.
  • the intercommunication register 25 is used to transmit the received rotary chain to the first memory 26 , and the first memory 26 receives and stores the rotary chain output by the intercommunication register 25 .
  • the first memory 26 may be, for example, a static random-access memory (SRAM).
  • under the control of the control signal output by the selection control terminal control, the third selector 24 first transmits the rotation chain, which includes the multiple initial values generated by the seed initialization generating sub-circuit 21, to the interworking register 25, from which it is stored in the first memory 26.
  • the output of the state rotation subcircuit 22 is transmitted to the third selector 24 , and the third selector 24 transmits the updated value output by the state rotation subcircuit 22 to the interconnection register 25 under the control of the control signal output by the selection control terminal control.
  • the intercommunication register 25 transmits the updated value output by the state rotation sub-circuit 22 to the first memory 26 to update the rotation chain in the first memory 26. This cycle repeats, continuously updating the rotation chain.
  • the rotation chain is stored in the first memory 26, and when the state rotation subcircuit 22 rotates the rotation chain, the intercommunication register 25 transmits the rotation chain to each second pipeline 221 in the state rotation subcircuit 22 as the fifth data, sixth data and seventh data.
  • the intercommunication register 25 retrieves data in the rotation chain in batches from the first memory 26 , which means that the state rotation subcircuit 22 cycles once, and the intercommunication register 25 retrieves a batch of data in the rotation chain from the first memory 26 .
  • the received data has a different index in the rotation chain, that is, the received data is different data in the rotation chain.
  • the state rotation subcircuit 22 performs rotation processing on the initial value of the first round, and outputs the updated value of the first round.
  • the output subcircuit 23 performs deformation processing on the updated value of the first round, and outputs a second random number.
  • the state rotation subcircuit 22 performs rotation processing on the initial value of the second round, and outputs the updated value of the second round.
  • the process of rotating the rotation chain and the process of processing the update value are performed synchronously, which can improve the generation efficiency of the second random number.
  • the signal transmission between the seed initialization generation sub-circuit 21 and the state rotation sub-circuit 22, and the signal transmission between the state rotation sub-circuit 22 and the output sub-circuit 23, are both realized through the intercommunication register 25; by sharing the same intercommunication register 25, the number of intercommunication registers 25 can be reduced.
  • the second generator 20 is configured to synchronously generate multiple second random numbers from a random number seed based on the Mersenne twister (MT) algorithm.
  • the implementation of the Mersenne twister algorithm may be the MT19937 algorithm.
  • the second generator 20 includes a seed initialization generation subcircuit 21 , a state rotation subcircuit 22 , an output subcircuit 23 , a third selector 24 , an interworking register 25 and a first memory 26 .
  • the third XOR performs XOR operation on the output of the first right shifter and the random number seed key.
  • the second adder adds the output of the second multiplier to s.
  • the first initial value has a bit flag of 0.
  • the state rotation sub-circuit 22 includes 16 parallel second pipelines 221, and each second pipeline 221 receives the data in the rotation chain retrieved by the interworking register 25 as the fifth data, the sixth data and the seventh data.
  • the fifth data is the data with the bit label i in the rotation chain
  • the sixth data is the data with the bit label i+1 in the rotation chain
  • the seventh data is the data in the rotation chain whose bit label is the remainder of (i+397)/624.
  • the values of i in the fifth data, the sixth data and the seventh data received by the plurality of parallel second pipelines 221 are different.
  • the fifth data, sixth data and seventh data received by one second pipeline 221 are the data marked 0, 1, and 397 in the rotation chain in sequence.
  • the fifth data, the sixth data and the seventh data received by the second pipeline 221 are the data marked 1, 2, and 398 in the rotation chain in sequence.
  • the fifth data, the sixth data and the seventh data received by the second pipeline 221 are the data marked 2, 3, and 399 in the rotation chain in sequence.
  • the fifth data, the sixth data and the seventh data received by the second pipeline 221 are the data marked 3, 4, and 400 in the rotation chain in sequence.
  • the third modulo takes the modulo of the data in the rotation chain marked with the remainder of (i+397)/624.
  • the fifth XOR performs an XOR operation on the output of the third modulo and the output of the second right shifter, and outputs the XOR result to the second selector 2214 .
  • the fifth data, sixth data and seventh data of the fifth second pipeline221 correspond to the data marked 4, 5, and 401 in the rotation chain bit in turn.
  • the fifth data, sixth data and seventh data of the sixth second pipeline 221 correspond to the data marked 5, 6, and 402 in the rotation chain bit in turn.
  • the fifth data, sixth data, and seventh data of the ninth second pipeline221 correspond to the data with the bit labels 8, 9, and 405 respectively.
  • the eighth XOR performs an XOR operation on the output of the fifth AND gate and the output of the seventh XOR.
  • the ninth XOR performs an XOR operation on the output of the fourth right shifter and the output of the eighth XOR, and outputs the XOR result as a second random number.
  • the structures of the data type conversion module 40, the distribution conversion module 50, the output control module 60, and the interrupt management module in the random number generation device 100 may be the same as those in Example 1; reference can be made to the relevant description in Example 1, which will not be repeated here.
  • the structure of the second generator 20 may be the same as the structure of the second generator 20 in the second example, and reference may be made to the relevant description in the second example, which will not be repeated here.
  • the data type conversion module 40 is configured to perform data type conversion on the first random number.
  • the data type conversion module 40 is configured to perform data type conversion on the second random number.
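
The data deformation module described in the definitions above (multiply, split the product into lower and upper bits, XOR, add) can be sketched in software as follows. This is a minimal illustration, not the hardware of the application: the function names are hypothetical, the two default set values are borrowed from the publicly known Philox4x32 generator as an assumption (the source does not give their values), and the cascade is simplified to a single chain that ignores the cross-connections between parallel modules.

```python
MASK32 = 0xFFFFFFFF

# Assumed values (public Philox4x32 constants); the source does not state them.
FIRST_SET_VALUE = 0xD2511F53   # multiplier applied to the first data
SECOND_SET_VALUE = 0x9E3779B9  # increment added to the third data


def data_deformation(first, second, third,
                     first_set=FIRST_SET_VALUE, second_set=SECOND_SET_VALUE):
    """One data deformation step on 32-bit words.

    Returns (first_output, second_output, third_output).
    """
    product = (first & MASK32) * (first_set & MASK32)  # up to 64 bits wide
    low = product & MASK32            # lower 32 bits of the product
    high = (product >> 32) & MASK32   # upper 32 bits of the product
    first_output = low
    second_output = ((second ^ third) ^ high) & MASK32
    third_output = (third + second_set) & MASK32
    return first_output, second_output, third_output


def cascade(first, second, third, stages):
    """Simplified cascade: a single chain of data deformation steps."""
    for _ in range(stages):
        out1, out2, out3 = data_deformation(first, second, third)
        # Outputs 1, 2 and 3 of the previous stage feed the second, first
        # and third data of the next stage.
        first, second, third = out2, out1, out3
    return first, second, third
```

Because each step depends only on its inputs and fixed set values, several such chains with different starting counters can run side by side, which is consistent with the parallel first pipelines described above.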
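
The second generator is described above as based on the Mersenne twister algorithm, possibly MT19937, with each pipeline reading the data at bit labels i, i+1 and (i+397) mod 624. As a rough software sketch under that assumption, the following uses the constants of the public MT19937 reference algorithm. The source does not state its set values or the shift amounts b to f, so equating them with these constants is an assumption, as is reading the "modulo" steps as the usual upper-bit and lower-bit masks. The rotation is written sequentially rather than as 16 parallel pipelines, and all function names are hypothetical.

```python
N, M = 624, 397                  # chain length and offset of the seventh data
UPPER_MASK = 0x80000000          # most significant bit of the fifth data
LOWER_MASK = 0x7FFFFFFF          # lower 31 bits of the sixth data
MATRIX_A = 0x9908B0DF            # XORed in when the combined word is odd
MASK32 = 0xFFFFFFFF


def seed_chain(seed):
    """Seed initialization: build a rotation chain of 624 initial values."""
    chain = [seed & MASK32]
    for i in range(1, N):
        prev = chain[i - 1]
        chain.append((1812433253 * (prev ^ (prev >> 30)) + i) & MASK32)
    return chain


def rotate(chain):
    """State rotation: update every entry of the chain in place."""
    for i in range(N):
        y = (chain[i] & UPPER_MASK) | (chain[(i + 1) % N] & LOWER_MASK)
        value = chain[(i + M) % N] ^ (y >> 1)
        if y & 1:                 # parity selection: odd branch
            value ^= MATRIX_A
        chain[i] = value
    return chain


def temper(y):
    """Output stage: shifts, masks and XORs applied to an updated value."""
    y ^= y >> 11
    y ^= (y << 7) & 0x9D2C5680
    y ^= (y << 15) & 0xEFC60000
    y ^= y >> 18
    return y & MASK32
```

With the reference seed 5489, `temper(rotate(seed_chain(5489))[0])` gives 3499211612, the well-known first output of MT19937, which is a quick sanity check of the sketch.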

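The uint32-to-fp32 conversion and the box-muller step described in the definitions above can be sketched together as follows. This is an illustrative sketch only: the function names are hypothetical, Python floats are double precision whereas the source outputs fp32, and using 1 - x for the first input (to keep the logarithm argument away from zero) is a choice made here, not something the source specifies.

```python
import math
import struct


def uint32_to_fp32(u):
    """Map a uint32 to a float in [0, 1) using the mantissa-fill trick.

    The lower 23 bits are kept as the mantissa, the upper 9 bits are set
    to 0b001111111 (sign 0, exponent 127), giving a value in [1, 2),
    and 1 is then subtracted.
    """
    bits = (u & 0x007FFFFF) | 0x3F800000
    (value,) = struct.unpack("<f", struct.pack("<I", bits))
    return value - 1.0


def box_muller(u1, u2):
    """Turn two uniform uint32 values into two standard normal values."""
    x1 = 1.0 - uint32_to_fp32(u1)   # in (0, 1], so log(x1) is finite
    x2 = uint32_to_fp32(u2)         # in [0, 1)
    radius = math.sqrt(-2.0 * math.log(x1))
    angle = 2.0 * math.pi * x2
    return radius * math.cos(angle), radius * math.sin(angle)
```

Each call consumes two uniform inputs and yields two normal outputs, which fits the statement above that box-muller needs fewer uniformly distributed random numbers than irwin-hall for the same amount of output.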
Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Complex Calculations (AREA)

Abstract

Embodiments of the present application relate to the technical field of neural networks, and provide a random number generation apparatus and method, a random number generation system, and a chip, used for solving the problem of low efficiency of random number generation due to the use of a software program. The random number generation apparatus comprises at least one generator. The at least one generator contains a generator for synchronously generating a plurality of random numbers according to random number seeds. When the at least one generator is a plurality of generators, the random number generation apparatus further comprises a first selector for selecting, according to a first parameter, a random number generated by one of the plurality of generators.

Description

Random number generation apparatus and generation method, random number generation system, and chip
Technical field
The present application relates to the technical field of neural networks, and in particular to a random number generation apparatus and generation method, a random number generation system, and a chip.
背景技术Background technique
人工智能(artificial intelligence,AI)技术已经在社会生活和生产方面有着广泛的应用,同时也是未来技术和产品的发展趋势。各种各样的AI技术目前广泛的应用在机器视觉、图像识别、人脸识别、对象侦测、智能驾驶、语音识别、自然语言处理、机器翻译、语音生成和文本转换语音等领域。Artificial intelligence (AI) technology has been widely used in social life and production, and it is also the development trend of future technologies and products. Various AI technologies are currently widely used in machine vision, image recognition, face recognition, object detection, intelligent driving, speech recognition, natural language processing, machine translation, speech generation, and text-to-speech.
AI技术的核心是神经网络系统,而神经网络系统正常运行的关键之一是随机性,无论是神经网络推理还是神经网络训练,都需要高效的随机数生成,因而随机数生成是神经网络系统中的重要环节。The core of AI technology is the neural network system, and one of the keys to the normal operation of the neural network system is randomness. Whether it is neural network inference or neural network training, efficient random number generation is required. important part.
In the related art, random numbers are usually generated by a central processing unit (CPU), a graphics processing unit (GPU), or a neural network processing unit (NPU) controlling a software program (instruction fetch or operation). When random numbers are generated in this way, the next random number can be generated, in another loop iteration, only after the previous random number has been generated, which is inefficient. As neural network systems grow ever larger, the amount of random number data they require is also huge. Generating random numbers under the control of a CPU, GPU, or NPU can no longer meet the needs of current neural network systems and has become one of the bottlenecks of neural network training performance.
Summary of the invention
Embodiments of the present application provide a random number generation apparatus and method, a random number generation system, and a chip, which are used to solve the problem that generating random numbers by having a CPU, GPU, or NPU control a software program is inefficient.
To achieve the above objective, the present application adopts the following technical solutions:
A first aspect of the embodiments of the present application provides a random number generation apparatus, including at least one generator, where the at least one generator contains a generator for synchronously generating multiple random numbers according to a random number seed. When the at least one generator is multiple generators, the random number generation apparatus further includes a first selector; the first selector is used to select, according to a first parameter, the random numbers generated by one of the multiple generators for output.
The random number generation apparatus provided by the embodiments of the present application includes a generator that can synchronously generate multiple random numbers at the same moment according to a random number seed. That is, there is no need to wait until the previous random number has been generated before starting to generate the next one. Compared with the related-art approach of having a CPU, GPU, or NPU control a software program to generate random numbers (one random number at a time, with the next generated only after the previous one), the apparatus has good parallelism (multiple random numbers can be produced synchronously), which can significantly improve random number generation efficiency and increase the throughput of the apparatus. In addition, the random numbers are generated by hardware structures such as the generator. After the CPU, GPU, or NPU provides parameters such as the random number seed, no further intervention (instruction fetch or operation) by the CPU, GPU, or NPU is needed, which reduces the load on the CPU, GPU, or NPU. Furthermore, the number of random numbers that the generator produces synchronously can be adjusted, for example increased, so the generator scales well.
Optionally, the at least one generator includes a first generator, and the first generator includes multiple parallel first pipelines; the multiple parallel first pipelines are used to synchronously generate multiple random numbers according to the random number seed. Because the first generator includes multiple parallel first pipelines, these pipelines can synchronously produce multiple first random numbers, and the number of parallel first pipelines can be expanded as needed. In addition, during generation of the first random numbers, generation of the next first random number can start without waiting for generation of the previous one to finish. The first generator therefore has good parallelism and can significantly improve random number generation efficiency. Moreover, because the first random numbers are generated by the hardware structure of the first pipeline, the performance of the random number generation apparatus can meet the demand for a hardware pipeline structure in large-scale neural network training scenarios.
Optionally, the first pipeline includes multiple cascaded operation subcircuits, and each operation subcircuit includes multiple parallel data deformation modules. The data deformation module is used to receive first data, second data, third data, a first set value, and a second set value; to take the low bits of the product of the first data and the first set value as first output data; and to XOR the result of XORing the second data and the third data with the high bits of the product of the first data and the first set value, taking the result as second output data. The data deformation module is further used to take the sum of the third data and the second set value as third output data; alternatively, the data deformation module in every stage except the last stage is further used to take the sum of the third data and the second set value as third output data. The first output data, second output data, and third output data of a data deformation module in one stage serve, respectively, as the second data, first data, and third data of the data deformation module in the next stage. At least one bit of the random number seed serves as the third data of the first-stage operation subcircuit, and the first output data and second output data of the last-stage operation subcircuit serve as the random number generated by the first pipeline. This structure is simple and easy to implement.
Optionally, the multiple data deformation modules in the first-stage operation subcircuit differ in at least one of the first data, second data, third data, first set value, and second set value that they receive. In this way, the multiple data deformation modules in the same operation subcircuit differ in at least one of the first output data, second output data, and third output data that they produce, which can improve the randomness of the first random numbers generated by the first pipeline.
Optionally, two connected data deformation modules in adjacent-stage operation subcircuits differ in at least one of the first set value and the second set value that they receive. This can improve the randomness of the first random numbers generated by the first pipeline.
Optionally, the first set value and the second set value received by the same data deformation module are the same.
Optionally, the data deformation module includes a first multiplier, a first XOR gate, a second XOR gate, and a first adder. The first multiplier is used to receive the first data and the first set value, multiply the first data by the first set value, and output the low bits of the resulting product as the first output data. The first XOR gate is used to receive the second data and the third data and XOR them. The second XOR gate is used to receive the output of the first XOR gate and the high bits of the product from the first multiplier, XOR them, and output the result as the second output data. The first adder is used to receive the third data and the second set value, add them, and output the sum as the third output data.
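The data deformation module and the cascade of stages described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the 32-bit data width, the function names, and the way per-stage set values are passed in are all hypothetical, since the text does not fix them.

```python
MASK32 = 0xFFFFFFFF  # assumption: all data and set values are 32 bits wide


def data_deformation(d1, d2, d3, set1, set2):
    """One data deformation module; returns (out1, out2, out3)."""
    product = d1 * set1               # first multiplier (up to 64-bit product)
    out1 = product & MASK32           # low bits of the product
    high = (product >> 32) & MASK32   # high bits of the product
    out2 = (d2 ^ d3) ^ high           # first XOR gate, then second XOR gate
    out3 = (d3 + set2) & MASK32       # first adder, wrapped to 32 bits
    return out1, out2, out3


def first_pipeline(d1, d2, seed, stage_sets):
    """Cascade one module per stage; stage_sets holds (set1, set2) per stage."""
    d1, d2, d3 = d1 & MASK32, d2 & MASK32, seed & MASK32  # seed bits feed the third data
    out1 = out2 = None
    for set1, set2 in stage_sets:
        out1, out2, out3 = data_deformation(d1, d2, d3, set1, set2)
        # outputs 1, 2, 3 become the second, first, and third data of the next stage
        d1, d2, d3 = out2, out1, out3
    return out1, out2                 # last-stage outputs form the random number
```

For example, `first_pipeline(0, 1, 12345, [(0xD256D193, 0x9E3779B9)] * 10)` runs ten stages with the same pair of set values; the constants and the stage count are placeholders chosen for illustration, not values taken from the application. Several such pipelines fed with different first or second data would model the parallel first pipelines.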
Optionally, the at least one generator further includes a second generator; the second generator includes a seed initialization generation subcircuit, a state rotation subcircuit, and an output subcircuit. The seed initialization generation subcircuit is used to perform initialization according to the random number seed and generate a rotation chain including multiple initial values. The state rotation subcircuit includes multiple parallel second pipelines, which are used to rotate the rotation chain. The output subcircuit includes at least one output line, which is used to deform the output of the state rotation subcircuit to generate multiple random numbers. Because the second generator includes multiple parallel second pipelines, the multiple updated values that these pipelines output at the same moment rotate multiple data items in the rotation chain synchronously, which improves the efficiency of rotating and updating the rotation chain and thus the generation efficiency of the second random numbers. When the second generator further includes multiple parallel output lines, these output lines deform and output multiple updated values at the same moment, which can further improve the generation efficiency of the second random numbers.
Optionally, the seed initialization generation subcircuit is used to receive fourth data, a third set value, a fourth set value, and a fifth set value; to XOR the fourth data shifted right by a bits with the fourth data; to multiply the XOR result by the third set value; to add the fourth set value to the product; to AND the sum with the fifth set value; and to output the result of the AND operation as an initial value. The fourth data is the random number seed or an initial value.
Optionally, the seed initialization generation subcircuit includes a first right shifter, a third XOR gate, a second multiplier, a second adder, and a first AND gate. The first right shifter is used to receive the fourth data and shift it right by a bits. The third XOR gate is used to receive the output of the first right shifter and XOR it with the fourth data. The second multiplier is used to receive the output of the third XOR gate and the third set value and multiply them. The second adder is used to receive the output of the second multiplier and the fourth set value and add them. The first AND gate is used to receive the output of the second adder and the fifth set value, AND them, and output the result as an initial value.
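The initialization step above can be written as a short Python sketch. The function names are hypothetical, and the default constants (shift of 30, multiplier 1812433253, a 32-bit mask, and the chain position used as the fourth set value) are borrowed from the widely used MT19937 initialization purely as an assumption; the text here does not state the values of a or of the set values.

```python
def seed_init_step(d4, a, set3, set4, set5):
    """One pass through the seed initialization generation subcircuit."""
    shifted = d4 >> a          # first right shifter
    mixed = shifted ^ d4       # third XOR gate
    product = mixed * set3     # second multiplier
    total = product + set4     # second adder
    return total & set5        # first AND gate: the initial value


def build_rotation_chain(seed, length, a=30, set3=1812433253, set5=0xFFFFFFFF):
    """Build a rotation chain by feeding each initial value back in as the fourth data.

    Assumptions: the first element is the masked seed itself, and the fourth
    set value is the position of the element being generated.
    """
    chain = [seed & set5]
    for i in range(1, length):
        chain.append(seed_init_step(chain[-1], a, set3, i, set5))
    return chain
```

Calling `build_rotation_chain(5489, 624)` would produce a 624-entry chain under these assumed constants; a hardware implementation would choose the chain length and set values to match its own generator.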
Optionally, the second pipeline includes a parity selection module, an odd-number generation module, an even-number generation module, and a second selector. The parity selection module is used to receive fifth data, sixth data, a sixth set value, and a seventh set value; to OR the result of ANDing the fifth data with the sixth set value with the result of ANDing the modulo of the sixth data with the seventh set value; to take the modulo of the OR result; and to output the modulo result to the second selector and the even-number generation module. The odd-number generation module is used to receive the fifth data and an eighth set value, XOR the fifth data with the eighth set value, and output the result to the second selector. The even-number generation module is used to receive seventh data and the output of the parity selection module, shift the output of the parity selection module right by b bits, XOR it with the modulo result of the seventh data, and output the result to the second selector. The second selector is used to select, according to the output of the parity selection module, the output of the odd-number generation module or of the even-number generation module for output, so as to rotate the rotation chain. The fifth data, sixth data, and seventh data are different data items in the rotation chain.
Optionally, the parity selection module includes a second AND gate, a first modulo unit, a third AND gate, a first OR gate, and a second modulo unit. The second AND gate is used to receive the fifth data and the sixth set value and AND them. The first modulo unit is used to receive the sixth data and take its modulo. The third AND gate is used to receive the output of the first modulo unit and the seventh set value and AND them. The first OR gate is used to receive the output of the second AND gate and the output of the third AND gate and OR them. The second modulo unit is used to receive the output of the first OR gate, take its modulo, and output the modulo result to the second selector and the even-number generation module.
Optionally, the odd-number generation module includes a fourth XOR gate; the fourth XOR gate is used to receive the fifth data and the eighth set value, XOR them, and output the XOR result to the second selector.
Optionally, the even-number generation module includes a third modulo unit, a second right shifter, and a fifth XOR gate. The third modulo unit is used to receive the seventh data and take its modulo. The second right shifter is used to receive the output of the parity selection module and shift it right by b bits. The fifth XOR gate is used to receive the output of the third modulo unit and the output of the second right shifter, XOR them, and output the XOR result to the second selector.
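The second pipeline can be sketched literally from the description above. Two points are assumptions rather than statements from the text: the modulus used by the modulo units is not given, so this sketch treats it as reduction to a 32-bit word (it could equally refer to wrapping a position within the rotation chain), and the second selector is assumed to pick the odd-number path when the lowest bit of the parity selection output is 1. All function names are hypothetical.

```python
MOD = 1 << 32  # assumption: the modulo units reduce values to 32 bits


def parity_select(d5, d6, set6, set7):
    """Parity selection module: two AND gates, an OR gate, two modulo units."""
    upper = d5 & set6              # second AND gate
    lower = (d6 % MOD) & set7      # first modulo unit, then third AND gate
    return (upper | lower) % MOD   # first OR gate, then second modulo unit


def odd_generate(d5, set8):
    """Odd-number generation module: fourth XOR gate."""
    return d5 ^ set8


def even_generate(d7, parity_out, b):
    """Even-number generation module: shift, modulo, fifth XOR gate."""
    return (parity_out >> b) ^ (d7 % MOD)


def second_pipeline(d5, d6, d7, set6, set7, set8, b):
    """Produce one updated value for the rotation chain."""
    y = parity_select(d5, d6, set6, set7)
    odd = odd_generate(d5, set8)
    even = even_generate(d7, y, b)
    # second selector; choosing on the lowest bit of y is an assumption
    return odd if (y & 1) else even
```

Running several calls of `second_pipeline` on different triples taken from the rotation chain corresponds to the multiple parallel second pipelines, each yielding one updated value at the same moment.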
Optionally, the output line is used to receive the output of the state rotation subcircuit, a ninth set value, and a tenth set value, and to perform the following steps in order: XOR the output of the state rotation subcircuit shifted right by c bits with the output of the state rotation subcircuit; XOR that result with the AND of the ninth set value and that result shifted left by d bits; XOR the new result with the AND of the tenth set value and the new result shifted left by e bits; and XOR the latest result with itself shifted right by f bits, outputting the final result as a random number.
Optionally, the output line includes a third right shifter, a sixth XOR gate, a first left shifter, a fourth AND gate, a seventh XOR gate, a second left shifter, a fifth AND gate, an eighth XOR gate, a fourth right shifter, and a ninth XOR gate. The third right shifter is used to receive the output of the state rotation subcircuit and shift it right by c bits. The sixth XOR gate is used to receive the output of the state rotation subcircuit and the output of the third right shifter and XOR them. The first left shifter is used to receive the output of the sixth XOR gate and shift it left by d bits. The fourth AND gate is used to receive the ninth set value and the output of the first left shifter and AND them. The seventh XOR gate is used to receive the output of the sixth XOR gate and the output of the fourth AND gate and XOR them. The second left shifter is used to receive the output of the seventh XOR gate and shift it left by e bits. The fifth AND gate is used to receive the tenth set value and the output of the second left shifter and AND them. The eighth XOR gate is used to receive the output of the fifth AND gate and the output of the seventh XOR gate and XOR them. The fourth right shifter is used to receive the output of the eighth XOR gate and shift it right by f bits. The ninth XOR gate is used to receive the output of the eighth XOR gate and the output of the fourth right shifter, XOR them, and output the XOR result as a random number.
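The chain of shifters, AND gates, and XOR gates in the output line maps onto a few lines of Python. The default shift amounts and set values below are the standard MT19937 tempering parameters, used here only as an assumption; the text does not give values for c, d, e, f or for the ninth and tenth set values. The function name is hypothetical.

```python
def output_line(x, c=11, d=7, e=15, f=18, set9=0x9D2C5680, set10=0xEFC60000):
    """Deform one 32-bit output of the state rotation subcircuit into a random number."""
    y = x ^ (x >> c)           # third right shifter, sixth XOR gate
    y ^= (y << d) & set9       # first left shifter, fourth AND gate, seventh XOR gate
    y ^= (y << e) & set10      # second left shifter, fifth AND gate, eighth XOR gate
    y ^= y >> f                # fourth right shifter, ninth XOR gate
    return y & 0xFFFFFFFF      # keep the result within 32 bits
```

Because each call depends only on its own input, applying `output_line` to several updated values at once models multiple parallel output lines.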
Optionally, the second generator further includes a third selector, an interworking register, and a first memory. The third selector is used to receive the pre-rotation rotation chain output by the seed initialization generation subcircuit and the post-rotation rotation chain output by the state rotation subcircuit and, under the control of a selection control terminal, to transmit the pre-rotation or post-rotation rotation chain to the interworking register. The interworking register is used to receive the rotation chain output by the third selector, transmit it to the first memory, and fetch the data in the rotation chain from the first memory in batches for transmission to the second pipelines as the fifth data, sixth data, and seventh data; it is also used to transmit the output of the state rotation subcircuit to the output subcircuit. The first memory is used to receive and store the rotation chain output by the interworking register. Signal transmission between the seed initialization generation subcircuit and the state rotation subcircuit, and between the state rotation subcircuit and the output subcircuit, is carried out through the interworking register; sharing the same interworking register reduces the number of interworking registers required.
Optionally, the random number generation apparatus further includes a data type conversion module, used to perform data type conversion on the random numbers generated by the at least one generator. The random numbers generated by a generator have a fixed data type, so they suit only network training apparatuses that train with random numbers of that particular data type, which limits the applicability of the random number generation apparatus. Providing a data type conversion module in the apparatus allows the data type of the generated random numbers to be converted to suit different network training apparatuses, which broadens the applicability of the random number generation apparatus.
Optionally, the data type conversion module includes at least one data type converter; the data type converter is used to convert the random numbers generated by the at least one generator into random numbers of a preset data type. When the data type conversion module includes multiple data type converters, it further includes a fourth selector, which is used to select, according to a second parameter, the result of one of the multiple data type converters for output. The multiple data type converters produce random numbers of different preset data types.
Optionally, the random number generation apparatus further includes a distribution conversion module, used to convert the distribution type of the output of the data type conversion module. The random numbers generated by a generator follow a fixed distribution type, so they suit only network training apparatuses that train with random numbers of that particular distribution type, which limits the applicability of the random number generation apparatus. Providing a distribution conversion module in the apparatus allows the distribution type of the generated random numbers to be converted to suit different network training apparatuses, which broadens the applicability of the random number generation apparatus.
Optionally, the distribution conversion module includes at least one distribution generator; the distribution generator is used to convert the random numbers output by the data type conversion module into random numbers that follow a preset distribution. When the distribution conversion module includes multiple distribution generators, the random number generation apparatus further includes a fifth selector, which is used to select, according to a third parameter, the result of one of the multiple distribution generators for output. The multiple distribution generators produce random numbers that follow different preset distribution types.
Optionally, the at least one distribution generator includes a normal distributor, which uses the Box-Muller algorithm to convert the random numbers output by the data type conversion module into random numbers that follow a normal distribution. The normal distributor uses a hardened Box-Muller structure. The Box-Muller algorithm is the normal distribution conversion algorithm used by most current mainstream deep learning frameworks, and the normality of the random numbers it produces is better than that of the approximately normal random numbers produced by the Irwin-Hall algorithm. In addition, taking the fp32 data type as an example, the Box-Muller algorithm takes two uniformly distributed uint32 random numbers as input and outputs two fp32 random numbers that follow the standard normal distribution. The Irwin-Hall algorithm adds multiple uniformly distributed input random numbers to produce one normally distributed random number, so it requires a larger number of uniformly distributed random numbers. Compared with the Irwin-Hall algorithm, for the same amount of output data, the Box-Muller algorithm places lower performance demands on the generator.
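The Box-Muller conversion from two uniform uint32 values to two fp32 values can be sketched as follows. This is a software illustration only, not the hardened structure itself; the function names and the mapping of a uint32 onto the interval (0, 1] (chosen so that the logarithm never receives zero) are assumptions.

```python
import math
import struct


def uint32_to_unit(u):
    """Map a uint32 to (0, 1]; an assumed mapping that keeps log() away from 0."""
    return (u + 1) / 4294967296.0


def to_fp32(value):
    """Round a Python float to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def box_muller(u1, u2):
    """Turn two uniform uint32 inputs into two standard normal fp32 outputs."""
    r1 = uint32_to_unit(u1)
    r2 = uint32_to_unit(u2)
    radius = math.sqrt(-2.0 * math.log(r1))   # distance from the origin
    angle = 2.0 * math.pi * r2                # uniformly distributed angle
    return to_fp32(radius * math.cos(angle)), to_fp32(radius * math.sin(angle))
```

Each call consumes two uniform inputs and yields two normal outputs, which is the one-to-one ratio the paragraph above contrasts with the Irwin-Hall approach of summing many uniform inputs per output.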
A second aspect of the embodiments of the present application provides a random number generation method, including: a generator synchronously generates multiple random numbers according to a random number seed. When multiple generators generate multiple random numbers, the method further includes: a first selector selects, according to a first parameter, the multiple random numbers generated by one of the multiple generators for output. The method has the same beneficial effects as the random number generation apparatus provided in the first aspect, which are not repeated here.
Optionally, the generator synchronously generating multiple random numbers according to the random number seed includes: multiple parallel first pipelines in a first generator synchronously generate multiple random numbers according to the random number seed.
Optionally, the generator synchronously generating multiple random numbers according to the random number seed includes: a seed initialization generation subcircuit performs initialization according to the random number seed and generates a rotation chain including multiple initial values; multiple parallel second pipelines in a state rotation subcircuit rotate the rotation chain; and multiple parallel output lines in an output subcircuit deform the output of the state rotation subcircuit to synchronously generate multiple random numbers.
Optionally, the random number generation method further includes: a data type conversion module performs data type conversion on the multiple random numbers generated by the generator.
Optionally, the random number generation method further includes: a distribution conversion module converts the distribution type of the output of the data type conversion module.
A third aspect of the embodiments of the present application provides a neural network system, including a second memory and the random number generation apparatus of any implementation of the first aspect; the second memory is used to store the random numbers generated by the random number generation apparatus.
The neural network system provided by the embodiments of the present application includes the random number generation apparatus provided in the first aspect and has the same beneficial effects as that apparatus, which are not repeated here.
Optionally, the second memory is further used to store the position index of the last random number generated by the random number generation apparatus. In this way, when the random number seed is unchanged and the random number generation apparatus executes the next task, it can read back from the second memory the cut-off position indices of the random number chain and the Mersenne rotation chain produced by the previous task, which ensures that tasks can be issued continuously and random numbers generated continuously.
A fourth aspect of the embodiments of this application provides a chip, including a substrate and the random number generation apparatus according to any implementation of the first aspect, where the random number generation apparatus is disposed on the substrate.
The chip provided in the embodiments of this application includes the random number generation apparatus provided in the first aspect, and its beneficial effects are the same as those of the random number generation apparatus. Details are not repeated here.
A fourth aspect of the embodiments of this application provides a random number generation apparatus, including a second generator. The second generator includes a seed initialization generation subcircuit, a state rotation subcircuit, and an output subcircuit. The seed initialization generation subcircuit is configured to perform initialization according to a random number seed and generate a rotation chain including multiple initial values. The state rotation subcircuit includes multiple parallel second pipelines, which are configured to rotate the rotation chain. The output subcircuit includes an output line, which is configured to transform the output of the state rotation subcircuit to generate a random number.
Description of Drawings
FIG. 1 is a block diagram of a neural network system according to an embodiment of this application;
FIG. 2a is a block diagram of a random number generation apparatus according to an embodiment of this application;
FIG. 2b is a block diagram of another random number generation apparatus according to an embodiment of this application;
FIG. 2c is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 2d is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 2e is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 3a is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 3b is a usage scenario diagram of a random number generation apparatus according to an embodiment of this application;
FIG. 4a is a schematic block diagram of a first pipeline according to an embodiment of this application;
FIG. 4b is a schematic block diagram of another first pipeline according to an embodiment of this application;
FIG. 4c is a schematic structural diagram of a first pipeline according to an embodiment of this application;
FIG. 4d is a schematic structural diagram of another first pipeline according to an embodiment of this application;
FIG. 5a is a usage scenario diagram of a first pipeline according to an embodiment of this application;
FIG. 5b is a schematic diagram of a random number chain according to an embodiment of this application;
FIG. 6a is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 6b is a usage scenario diagram of another random number generation apparatus according to an embodiment of this application;
FIG. 6c is a schematic diagram of a Box-Muller algorithm according to an embodiment of this application;
FIG. 7 is a usage scenario diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 8a is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 8b is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 8c is a usage scenario diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 9a is a schematic structural diagram of a seed initialization generation subcircuit according to an embodiment of this application;
FIG. 9b is a schematic structural diagram of a second pipeline according to an embodiment of this application;
FIG. 9c is a schematic structural diagram of an output line according to an embodiment of this application;
FIG. 10a is a schematic block diagram of a second generator according to an embodiment of this application;
FIG. 10b is a schematic structural diagram of a second generator according to an embodiment of this application;
FIG. 11 is a schematic structural diagram of another second generator according to an embodiment of this application;
FIG. 12a is a schematic diagram of a rotation chain according to an embodiment of this application;
FIG. 12b is a schematic diagram of the internal partitioning of a first memory according to an embodiment of this application;
FIG. 13 is a block diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 14a is a usage scenario diagram of yet another random number generation apparatus according to an embodiment of this application;
FIG. 14b is a usage scenario diagram of yet another random number generation apparatus according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below with reference to the drawings in the embodiments of this application. Clearly, the described embodiments are only some, rather than all, of the embodiments of this application.
In the following, the terms "first", "second", and the like are used only for convenience of description and shall not be understood as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Accordingly, a feature qualified by "first", "second", or the like may explicitly or implicitly include one or more of that feature. In the description of this application, unless otherwise stated, "multiple" means two or more.
In addition, in the embodiments of this application, directional terms such as "upper", "lower", "left", and "right" may be defined with respect to, but are not limited to, the orientation in which components are schematically placed in the drawings. It should be understood that these directional terms are relative concepts used for relative description and clarification, and they may change correspondingly as the orientation in which the components are placed in the drawings changes.
In this application, unless otherwise expressly specified and limited, the term "connection" shall be understood broadly. For example, a "connection" may be a fixed connection, a detachable connection, or an integral connection, and it may be a direct connection or an indirect connection through an intermediate medium. In addition, the term "electrical connection" may be a direct electrical connection or an indirect electrical connection through an intermediate medium.
Artificial intelligence (AI) technology is already widely applied in social life and production, and it is also a development trend for future technologies and products. A variety of AI technologies are currently in wide use in fields such as machine vision, image recognition, face recognition, object detection, intelligent driving, speech recognition, natural language processing, machine translation, speech generation, and text-to-speech.
The core of AI technology is the neural network system. As shown in FIG. 1, a neural network system usually includes a random number generation apparatus, a first memory, and a network training apparatus. The random number generation apparatus is configured to generate random numbers. The network training apparatus is configured to perform network training according to the random numbers generated by the random number generation apparatus and to generate training results. The first memory is configured to store the random numbers generated by the random number generation apparatus and the training results generated by the network training apparatus.
To make it easier for the network training apparatus to fetch the random numbers stored in the memory, the memory may be a high bandwidth memory (HBM), for example, a double data rate synchronous dynamic random access memory (DDR SDRAM, DDR for short).
An embodiment of this application provides a random number generation apparatus, including at least one generator, where the at least one generator includes a generator configured to synchronously generate multiple random numbers according to a random number seed.
Here, synchronously generating multiple random numbers means that multiple random numbers are produced at the same moment, rather than being produced one after another over several consecutive steps.
In addition, when the random number generation apparatus provided in the embodiments of this application includes one generator, that generator is configured to synchronously generate multiple random numbers according to the random number seed.
When the random number generation apparatus provided in the embodiments of this application includes multiple generators, at least one of them is configured to synchronously generate multiple random numbers according to the random number seed. The other generators may either synchronously generate multiple random numbers according to the random number seed or sequentially generate multiple random numbers according to the random number seed.
Based on this, in some embodiments of this application, as shown in FIG. 2a, the random number generation apparatus 100 includes a first generator 10, and the first generator 10 is configured to synchronously generate multiple random numbers according to the random number seed.
In other embodiments of this application, as shown in FIG. 2b, the random number generation apparatus 100 includes a first generator 10, a second generator 20, and a first selector 30.
The first generator 10 is configured to synchronously generate multiple random numbers according to the random number seed, and the second generator 20 is configured to sequentially generate multiple random numbers according to the random number seed. The first selector 30 is configured to select, according to a first parameter, the random numbers generated by the first generator 10 or by the second generator 20 for output.
That is, when the random number generation apparatus 100 includes multiple generators, only the random numbers generated by one of the generators are selected for output at any given moment. In other words, when multiple generators generate random numbers, the first selector selects, according to the first parameter, the random numbers generated by one of the generators for output.
Based on this, if the random numbers to be output are those generated by the first generator 10, then, for example, the first generator 10 synchronously generates multiple random numbers according to the random number seed, the second generator 20 sequentially generates multiple random numbers according to the random number seed, and the first selector 30 selects, according to the first parameter, the random numbers generated by the first generator 10 for output.
Alternatively, if the random numbers to be output are those generated by the second generator 20, then, for example, the first generator 10 synchronously generates multiple random numbers according to the random number seed, the second generator 20 sequentially generates multiple random numbers according to the random number seed, and the first selector 30 selects, according to the first parameter, the random numbers generated by the second generator 20 for output.
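The selection behavior of the first selector 30 can be sketched as a two-input multiplexer. This is a minimal Python model under stated assumptions: the function name `first_selector` is hypothetical, and the encoding of the first parameter (0 for the first generator, 1 for the second) is assumed, since the text above does not specify it.

```python
from typing import List


def first_selector(first_parameter: int,
                   first_generator_output: List[int],
                   second_generator_output: List[int]) -> List[int]:
    """Model of a 2-to-1 selector.

    Both generators may have produced values; only one set is output.
    The 0/1 encoding of `first_parameter` is an assumption for illustration.
    """
    if first_parameter == 0:
        return first_generator_output
    if first_parameter == 1:
        return second_generator_output
    raise ValueError("first_parameter must be 0 or 1 in this sketch")


# Placeholder outputs standing in for the two generators.
from_first = [11, 12, 13, 14]  # e.g. produced synchronously
from_second = [21, 22]         # e.g. produced sequentially
selected = first_selector(0, from_first, from_second)
```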
In other embodiments of this application, as shown in FIG. 2c, the random number generation apparatus 100 includes a first generator 10, a second generator 20, and a first selector 30.
The first generator 10 is configured to synchronously generate multiple random numbers according to the random number seed, and the second generator 20 is also configured to synchronously generate multiple random numbers according to the random number seed. The first selector 30 is configured to select, according to the first parameter, the random numbers generated by the first generator 10 or by the second generator 20 for output.
The principle by which the first generator 10 synchronously generates multiple random numbers according to the random number seed may be the same as the principle by which the second generator 20 does so. For example, the first generator 10 and the second generator 20 are identical generators.
In this way, if one of the first generator 10 and the second generator 20 fails to work normally, the other can serve as a backup, which reduces how often the neural network system or chip that integrates the random number generation apparatus 100 needs to be replaced.
Of course, the principle by which the first generator 10 synchronously generates multiple random numbers according to the random number seed may also differ from the principle by which the second generator 20 does so. For example, the first generator 10 and the second generator 20 are different generators.
In this way, although the two generators receive the same random number seed, they generate random numbers on different principles, so the random numbers they finally produce also differ. This can meet the random number requirements of different training scenarios and broadens the scope of application of the random number generation apparatus 100.
It can be understood that the random number generation apparatus 100 may include multiple generators that can each synchronously generate multiple random numbers. The principles by which these generators generate random numbers may be entirely the same, entirely different, or partly the same and partly different. The description above merely uses the case in which the random number generation apparatus 100 includes the first generator 10 and the second generator 20 as an example.
The random number generation apparatus 100 provided in the embodiments of this application includes the first generator 10, which can synchronously generate multiple random numbers at the same moment according to the random number seed. That is, there is no need to wait for the previous random number to be generated before starting to generate the next one. In the related art, a CPU, GPU, or NPU controls a software program to generate random numbers one at a time, so the next random number can be generated only after the previous one is complete. By comparison, the random number generation apparatus 100 provided in the embodiments of this application has a high degree of parallelism (multiple random numbers can be produced synchronously), which can significantly improve the efficiency of random number generation and increase the throughput of the random number generation apparatus 100.
In addition, the random numbers are generated by hardware structures, namely the generators (for example, the first generator 10 and the second generator 20). After the CPU, GPU, or NPU provides parameters such as the random number seed, no further intervention (instruction fetch or computation) by the CPU, GPU, or NPU is needed, which can reduce the load on the CPU, GPU, or NPU.
Furthermore, the generator can synchronously generate multiple random numbers, and the number generated synchronously is adjustable. For example, it can be set to a relatively large value, so the generator scales well.
In some embodiments of this application, as shown in FIG. 2d, the random number generation apparatus 100 further includes a data type conversion module 40. The data type conversion module 40 is configured to perform data type conversion on the random numbers generated by the generators in the random number generation apparatus 100 (for example, the first generator 10 and the second generator 20 described above).
That is, in the process in which the random number generation apparatus 100 generates random numbers, after a generator generates the random numbers, the data type conversion module 40 converts their data type before they are output.
In some application scenarios, the data type of the random numbers generated by a generator may not suit certain network training apparatuses. With the data type conversion module 40, the data type of the random numbers generated by the generator can be changed to suit different network training apparatuses, which broadens the scope of application of the random number generation apparatus 100.
In some embodiments of this application, as shown in FIG. 2e, the random number generation apparatus 100 further includes a distribution conversion module 50. The distribution conversion module 50 is configured to perform distribution type conversion on the random numbers output by the data type conversion module 40.
That is, in the process in which the random number generation apparatus 100 generates random numbers, after the data type conversion module 40 converts the data type of the random numbers generated by the generator, the distribution conversion module 50 converts the distribution type of the random numbers output by the data type conversion module 40 before they are output.
With the distribution conversion module 50 in the random number generation apparatus 100, the distribution type of the random numbers generated by the generator can be converted, providing rich random number generation functions that suit different network training apparatuses.
Based on this, the random number generation apparatus 100 provided in the embodiments of this application mainly includes two parts: an engine and a distribution. The engine (the generator) is used to produce a pseudo-random number sequence. The distribution (the data type conversion module 40 and the distribution conversion module 50) maps these values to a mathematical distribution within a fixed range (for example, a normal distribution, a uniform distribution, or a bitmask distribution). With both parts, the apparatus is functionally complete and suits the different random number scenarios that arise in neural network training, which broadens the scope of application of the random number generation apparatus 100.
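The split between engine and distribution can be illustrated with a short Python sketch. It assumes, purely for illustration, that the engine emits 32-bit unsigned integers, that data type conversion maps each integer to a floating-point value in (0, 1], and that distribution conversion uses the Box-Muller transform (which FIG. 6c refers to) to obtain normally distributed values. The function names are hypothetical and the internal design of modules 40 and 50 is not taken from this passage.

```python
import math
from typing import List, Tuple


def to_unit_float(u32: int) -> float:
    """Assumed data type conversion: 32-bit unsigned integer -> float in (0, 1].

    Adding 1 before scaling keeps the result strictly positive, so the
    logarithm in the Box-Muller step is always defined.
    """
    return (u32 + 1) / 2.0 ** 32


def box_muller(u1: float, u2: float) -> Tuple[float, float]:
    """Map two uniform values (u1 in (0, 1]) to two standard normal values."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def to_normal(raw: List[int]) -> List[float]:
    """Engine output -> uniform floats -> normal values, consumed in pairs.

    A trailing unpaired value is dropped in this sketch.
    """
    uniform = [to_unit_float(value) for value in raw]
    result: List[float] = []
    for i in range(0, len(uniform) - 1, 2):
        result.extend(box_muller(uniform[i], uniform[i + 1]))
    return result


# Placeholder engine output (any 32-bit unsigned integers would do).
normal_values = to_normal([123456789, 987654321, 42, 4000000000])
```

Keeping the two stages separate mirrors the description above: the same engine output can feed a different distribution simply by swapping the second stage.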
The random number generation apparatus 100 provided in the embodiments of this application is described below using several detailed examples. For ease of distinction in this description, the random numbers generated by the first generator 10 are called first random numbers, and the random numbers generated by the second generator 20 are called second random numbers.
Example 1
As shown in FIG. 3a, the random number generation apparatus 100 includes a first generator 10, a data type conversion module 40, and a distribution conversion module 50.
The first generator 10 includes multiple parallel first pipelines 11, which are configured to synchronously generate multiple first random numbers according to the random number seed.
As shown in FIG. 3b, the random number seed may, for example, be transmitted by a CPU, GPU, NPU, or system scheduler over an advanced peripheral bus (APB) to a first flash memory, and the first flash memory provides the seed to the first generator 10.
The first flash memory may be, for example, a first register.
The multiple parallel first pipelines 11 are configured to synchronously generate multiple first random numbers according to the random number seed. That is, while the first generator 10 is working, each first pipeline 11 generates at least one first random number according to the random number seed. The first pipelines 11 are arranged in parallel and synchronously generate multiple first random numbers at the same moment. The first pipelines 11 generate multiple first random numbers in one pass and can cycle multiple times to produce a random number chain containing a large number of first random numbers. Each of the first random numbers generated in each pass has its own position index in the random number chain.
For example, if each first pipeline 11 generates y first random numbers at a time and there are z parallel first pipelines 11, one cycle of the parallel first pipelines 11 generates y*z first random numbers. Cycling the parallel first pipelines 11 w times generates a random number chain of y*z*w first random numbers. Each first random number has its own position index in the random number chain.
That is, one cycle of the parallel first pipelines 11 generates multiple (for example, y*z) first random numbers, and the parallel first pipelines 11 can cycle multiple times (for example, w times) according to the target number of first random numbers, so as to generate that target number of first random numbers.
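The counting argument above can be checked with a few lines of Python. The ordering used to assign position indexes (cycle first, then pipeline, then the value within a pipeline) and rounding the cycle count up are assumptions made for illustration; the passage only states that every first random number has its own position index in the chain. The function names are hypothetical.

```python
def chain_length(y: int, z: int, w: int) -> int:
    """Total first random numbers after w cycles of z pipelines, y values each."""
    return y * z * w


def chain_index(cycle: int, pipeline: int, k: int, y: int, z: int) -> int:
    """Assumed position index of the k-th value of one pipeline in one cycle."""
    return (cycle * z + pipeline) * y + k


def cycles_needed(target: int, y: int, z: int) -> int:
    """Smallest number of cycles w such that y*z*w >= target (ceiling division)."""
    per_cycle = y * z
    return -(-target // per_cycle)


# Example: y = 2 values per pipeline, z = 4 pipelines, w = 3 cycles.
y, z, w = 2, 4, 3
indexes = sorted(
    chain_index(cycle, pipeline, k, y, z)
    for cycle in range(w)
    for pipeline in range(z)
    for k in range(y)
)
```

With these values, `indexes` covers every position from 0 to `chain_length(y, z, w) - 1` exactly once, so no two generated values share a position index.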
Regarding the structure of the first pipeline 11, in some possible embodiments, as shown in FIG. 4a, the first pipeline 11 includes multiple cascaded operation subcircuits 111.
One stage of operation subcircuit 111 represents one iteration of the operation in the operation subcircuit 111. FIG. 4a uses a first pipeline 11 with 10 cascaded operation subcircuits 111 as an example, meaning that the operation of the operation subcircuit 111 is iterated 10 times. The 10 cascaded operation subcircuits 111 in FIG. 4a are only an illustration, and the embodiments of this application place no limit on the number of operation subcircuits 111 included in the first pipeline 11. For example, the first pipeline 11 may also include 5, 7, 8, 11, or 13 operation subcircuits 111.
Regarding the structure of the operation subcircuit 111, as shown in FIG. 4a, the operation subcircuit 111 includes multiple parallel data transformation modules 112 (FIG. 4a uses two parallel data transformation modules 112 as an example).
As shown in FIG. 4a and FIG. 4b, the cascading of multiple operation subcircuits 111 is implemented by cascading the data transformation modules 112 in the operation subcircuits 111: the data transformation modules 112 in one stage of operation subcircuit 111 are connected one-to-one to the data transformation modules 112 in the next stage of operation subcircuit 111.
The embodiments of this application do not limit which data transformation modules 112 in two adjacent stages of operation subcircuits 111 are connected to each other. FIG. 4a and FIG. 4b are only an illustration.
Regarding the data deformation module 112, in some embodiments of the present application, as shown in FIG. 4c, the data deformation module 112 is configured to receive first data, second data, third data, a first set value, and a second set value. It uses the low-order bits of the product of the first data and the first set value as first output data. It XORs the second data with the third data, XORs that result with the high-order bits of the product of the first data and the first set value, and uses the outcome as second output data. It uses the sum of the third data and the second set value as third output data. The data deformation modules 112 in every stage have the same function.
Regarding the structure by which the data deformation module 112 implements the above functions, in some embodiments of the present application, as shown in FIG. 4c, the data deformation module 112 includes a first multiplier (denoted by the first "x" in the drawings of the embodiments of the present application), a first XOR gate (denoted by the first "xor" in the drawings), a second XOR gate (denoted by the second "xor" in the drawings), and a first adder (denoted by the first "+" in the drawings).
The first multiplier is configured to receive the first data and the first set value, multiply the first data by the first set value, and output the low-order bits of the resulting product as the first output data.
The first XOR gate is configured to receive the second data and the third data and to XOR the second data with the third data.
The second XOR gate is configured to receive the output of the first XOR gate and the high-order bits of the product from the first multiplier, XOR the two, and output the result as the second output data.
The first adder is configured to receive the third data and the second set value, add the third data to the second set value, and output the sum as the third output data.
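The four components described above can be sketched in software as follows. This is a minimal illustration, not the circuit itself: the 32-bit data width, the wrap-around behaviour of the adder, and the function name `deform_module` are assumptions made for the example.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit data width


def deform_module(first, second, third, set1, set2):
    """Model one data deformation module; returns (out1, out2, out3)."""
    product = (first & MASK32) * (set1 & MASK32)  # first multiplier (up to 64 bits)
    low = product & MASK32                        # low-order bits of the product
    high = (product >> 32) & MASK32               # high-order bits of the product
    xor1 = (second ^ third) & MASK32              # first XOR gate
    out1 = low                                    # first output data
    out2 = xor1 ^ high                            # second XOR gate -> second output data
    out3 = (third + set2) & MASK32                # first adder (assumed to wrap at 32 bits)
    return out1, out2, out3
```

For instance, `deform_module(2, 0, 0, 0x80000000, 1)` produces a product of 2^32, so the low word is 0 and the high word is 1.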
In other embodiments of the present application, as shown in FIG. 4d, the data deformation module 112 is configured to receive first data, second data, third data, a first set value, and a second set value. It uses the low-order bits of the product of the first data and the first set value as first output data. It XORs the second data with the third data, XORs that result with the high-order bits of the product of the first data and the first set value, and uses the outcome as second output data.
Except for the data deformation module 112 in the last-stage operation subcircuit 111, the data deformation modules 112 in the other stages of operation subcircuits 111 are further configured to use the sum of the third data and the second set value as third output data.
In other words, the data deformation module 112 in the last-stage operation subcircuit 111 differs in structure from the data deformation modules 112 in the other stages of operation subcircuits 111.
Regarding the structure by which the data deformation module 112 implements the above functions, as shown in FIG. 4d, the data deformation modules 112 in all stages other than the last are the same as the data deformation module 112 shown in FIG. 4c. The data deformation module 112 in the last-stage operation subcircuit 111 includes the first multiplier, the first XOR gate, and the second XOR gate, but no longer includes the first adder.
For ease of description, the following takes as an example the case in which the data deformation modules 112 in every stage of operation subcircuit 111 have the same structure.
In some embodiments of the present application, for the multiple data deformation modules 112 included in the same operation subcircuit 111, at least one of the received first data, second data, third data, first set value, and second set value differs from module to module.
As a result, at least one of the first output data, second output data, and third output data obtained by these data deformation modules 112 differs, which can improve the randomness of the first random numbers generated by the first pipeline 11.
In addition, the first set value and the second set value received by the same data deformation module 112 may be the same or different. Likewise, the first set value and the second set value received by two connected data deformation modules 112 in adjacent stages may be the same or different.
As shown in FIG. 4c, after multiple data deformation modules 112 are cascaded, the first output data, second output data, and third output data of a data deformation module 112 in the previous-stage operation subcircuit 111 serve, respectively, as the second data, first data, and third data of a data deformation module 112 in the next-stage operation subcircuit 111.
Among the multiple stages of operation subcircuits 111, the first output data and the second output data of the last-stage operation subcircuit 111 serve as the first random numbers generated by the first pipeline 11.
The first pipeline 11 synchronously generates multiple first random numbers from a random number seed. In some embodiments of the present application, at least one bit of the random number seed serves as the third data of the first-stage operation subcircuit 111.
In other words, the random number seed is a string of multiple bits. When the multiple data deformation modules 112 included in the first-stage operation subcircuit 111 receive different third data, the string that forms the random number seed can be split into several segments, which serve as the third data of the respective parallel data deformation modules 112.
For example, the random number seed is a 64-bit string and the operation subcircuit 111 includes two parallel data deformation modules 112. The high 32 bits of the random number seed serve as the third data of one data deformation module 112, and the low 32 bits serve as the third data of the other data deformation module 112.
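The 64-bit example above amounts to a shift and a mask. The following sketch illustrates it; the function name `split_seed` is hypothetical.

```python
def split_seed(seed64):
    """Split a 64-bit seed into (high 32 bits, low 32 bits)."""
    high = (seed64 >> 32) & 0xFFFFFFFF  # third data of one data deformation module
    low = seed64 & 0xFFFFFFFF           # third data of the other data deformation module
    return high, low
```

With the seed `0x0123456789ABCDEF`, the two segments are `0x01234567` and `0x89ABCDEF`.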
Based on the structure of the first pipeline 11 provided in this example, in some embodiments of the present application, the multiple parallel first pipelines 11 are configured to synchronously generate multiple first random numbers from the random number seed based on the Philox algorithm.
In some embodiments of the present application, the Philox algorithm may be implemented as philox4_32_10. That is, the Philox algorithm generates four first random numbers of 32-bit unsigned integer (uint) type at a time, and 10 denotes the number of iterations of the operation subcircuit 111 in the first pipeline 11 (which can also be understood as the number of cascaded stages).
The above is only an illustration. The Philox algorithm may also generate two first random numbers at a time, and it may also generate first random numbers of 64-bit unsigned integer (uint) type. The number of iterations is not limited to 10 and may also be 5, 7, 8, 11, and so on.
For example, as shown in FIG. 5a, each stage of operation subcircuit 111 in the first pipeline 11 includes a first data deformation module 112' and a second data deformation module 112".
The first data deformation module 112' in the first-stage operation subcircuit 111 receives first data counter[1]', second data counter[2]', and third data key', and outputs first output data result[1]', second output data result[2]', and third output data result[3]'.
The second data deformation module 112" in the first-stage operation subcircuit 111 receives first data counter[1]", second data counter[2]", and third data key", and outputs first output data result[1]", second output data result[2]", and third output data result[3]".
The first output data result[1]' and second output data result[2]' of the first data deformation module 112' in the previous-stage operation subcircuit 111 serve as the second data counter[2]" and first data counter[1]" of the second data deformation module 112" in the next stage. The first output data result[1]" and second output data result[2]" of the second data deformation module 112" in the previous-stage operation subcircuit 111 serve as the second data counter[2]' and first data counter[1]' of the first data deformation module 112' in the next stage.
The third output data result[3]' of the first data deformation module 112' in the previous-stage operation subcircuit 111 serves as the third data key' of the first data deformation module 112' in the next stage. The third output data result[3]" of the second data deformation module 112" in the previous-stage operation subcircuit 111 serves as the third data key" of the second data deformation module 112" in the next stage.
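The cross-wiring between two adjacent stages can be sketched as below. The sketch assumes 32-bit data and a wrapping adder, and the names `deform_module` and `stage` are hypothetical; each module state is the tuple (counter[1], counter[2], key).

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit data width


def deform_module(first, second, third, set1, set2):
    """One data deformation module: returns (result1, result2, result3)."""
    product = first * set1
    result1 = product & MASK32                               # low bits of the product
    result2 = (second ^ third ^ (product >> 32)) & MASK32    # two chained XORs
    result3 = (third + set2) & MASK32                        # key update
    return result1, result2, result3


def stage(state_a, state_b, consts_a, consts_b):
    """Run one operation subcircuit and return the inputs of the next stage.

    state_a / state_b: (counter1, counter2, key) of modules 112' and 112".
    consts_a / consts_b: (first set value, second set value) of each module.
    """
    r1a, r2a, r3a = deform_module(*state_a, *consts_a)
    r1b, r2b, r3b = deform_module(*state_b, *consts_b)
    # Counters cross over between the two modules; each key stays in its own lane.
    next_a = (r2b, r1b, r3a)  # counter[1]' <- result[2]", counter[2]' <- result[1]"
    next_b = (r2a, r1a, r3b)  # counter[1]" <- result[2]', counter[2]" <- result[1]'
    return next_a, next_b
```

Keeping the key in its own lane while swapping the counter outputs is what mixes the two halves of the state from one stage to the next.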
For clarity, the following does not repeat which module receives which input. References to the first data counter[1]', second data counter[2]', and third data key' denote data received by the first data deformation module 112'. References to the first data counter[1]", second data counter[2]", and third data key" denote data received by the second data deformation module 112".
Likewise, references to the first output data result[1]', second output data result[2]', and third output data result[3]' denote data output by the first data deformation module 112'. References to the first output data result[1]", second output data result[2]", and third output data result[3]" denote data output by the second data deformation module 112".
As shown in FIG. 5a, the first output data result[1]', the second output data result[2]', the first output data result[1]", and the second output data result[2]" of the last-stage operation subcircuit 111 of each first pipeline 11 serve as the first random numbers generated by that first pipeline 11.
The first data counter[1]' and second data counter[2]' received by the first data deformation module 112' in the first-stage operation subcircuit 111, and the first data counter[1]" and second data counter[2]" received by the second data deformation module 112", are respectively the indexes (each a 32-bit value) of the four first random numbers to be generated by this first pipeline 11 within the random number chain to be generated by the first generator 10.
For example, as shown in FIG. 5a, the first data counter[1]', second data counter[2]', first data counter[1]", and second data counter[2]" of the first-stage operation subcircuit 111 may be provided by a second flash memory.
The first flash memory outputs the indexes, within the random number chain, of the four first random numbers to be generated by this first pipeline 11 (128-bit counter_start data). The second flash memory receives the 128-bit counter_start data output by the first flash memory and outputs it to the first-stage operation subcircuit 111. The four indexes (32-bit values) in the 128-bit counter_start data serve as the first data counter[1]', second data counter[2]', first data counter[1]", and second data counter[2]", respectively.
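Unpacking the 128-bit counter_start data into four 32-bit indexes might look like the following sketch. The text does not state which bits map to which input, so the most-significant-word-first order used here is an assumption, as is the function name `split_counter_start`.

```python
def split_counter_start(counter_start):
    """Split a 128-bit value into four 32-bit indexes.

    Returns (counter[1]', counter[2]', counter[1]", counter[2]") under the
    assumed ordering: most significant 32-bit word first.
    """
    return tuple((counter_start >> shift) & 0xFFFFFFFF for shift in (96, 64, 32, 0))
```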
The third data key' received by the first data deformation module 112' in the first-stage operation subcircuit 111 and the third data key" received by the second data deformation module 112" are, respectively, the high 32 bits and the low 32 bits of the random number seed key (64-bit data).
For example, as shown in FIG. 5a, the third data key' and the third data key" of the first-stage operation subcircuit 111 may be provided by a third flash memory.
The first flash memory outputs the random number seed key of the first generator 10. The third flash memory receives the random number seed key output by the first flash memory and outputs it to the first-stage operation subcircuit 111. The high 32 bits and the low 32 bits of the random number seed key serve as the third data key' and the third data key", respectively.
The second flash memory may be, for example, a second register, and the third flash memory may be, for example, a third register.
As shown in FIG. 5b, one random number seed key corresponds to one random number chain. One generation of first random numbers by a first pipeline 11 is called a cycle. Within the same cycle, the multiple parallel first pipelines 11 synchronously receive different bits of the random number seed. That is, within the same cycle, the third data key' and the third data key" synchronously received by the multiple parallel first pipelines 11 are different.
Across different cycles, the same first pipeline 11 receives the same bits of the random number seed. That is, across different cycles, the third data key' or the third data key" received by the same first pipeline 11 is the same. In other words, a first pipeline 11 that receives the third data key' receives the same third data key' in every cycle, and a first pipeline 11 that receives the third data key" receives the same third data key" in every cycle.
One counter_start datum corresponds to the generation of four first random numbers. While first random numbers are being generated, the counter_start data is input continuously, and the output of each first pipeline 11 is also continuous. While a first pipeline 11 continuously generates first random numbers, the counter_start data it receives differs from cycle to cycle.
For example, as shown in FIG. 5a, when the operation on the first counter_start datum (counter_start[0]) enters the second-stage operation subcircuit 111, the first-stage operation subcircuit 111 starts operating on the second counter_start datum (counter_start[16]).
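The overlap between stages can be illustrated with a small occupancy model. This is only a scheduling sketch under the assumption that every stage takes one clock cycle; it does no arithmetic, and the function name `occupancy` is hypothetical.

```python
def occupancy(num_stages, inputs):
    """Return, for each clock cycle, which input every stage is working on.

    Each inner list has one entry per stage (index 0 = first stage);
    None marks an idle stage.
    """
    feed = list(inputs)
    stages = [None] * num_stages
    timeline = []
    for cycle in range(len(feed) + num_stages - 1):
        new_item = feed[cycle] if cycle < len(feed) else None
        # Every item advances one stage; the first stage accepts the next input.
        stages = [new_item] + stages[:-1]
        timeline.append(list(stages))
    return timeline
```

For a three-stage pipeline fed with `counter_start[0]` and then `counter_start[16]`, the second entry of the timeline shows the first stage holding `counter_start[16]` while the second stage holds `counter_start[0]`.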
For example, as shown in FIG. 3b, a pipeline control module (pipe_ctrl) may control the first flash memory to output the corresponding counter_start data at different times.
Regarding the internal structure of the operation subcircuit 111, as shown in FIG. 5a, the first data deformation module 112' includes a first multiplier, a first XOR gate, a second XOR gate, and a first adder.
The first multiplier is configured to receive the first data counter[1]' and the first set value, multiply the first data counter[1]' by the first set value, and output the low 32 bits of the resulting product as the first output data result[1]'.
As described above, for example, the first data counter[1]' received by the first data deformation module 112' in the first stage is the index (a 32-bit value) in the random number chain of the first of the four first random numbers to be generated by this first pipeline 11. The first data counter[1]' received by the first data deformation module 112' in any other stage is the second output data result[2]" of the second data deformation module 112" in the previous stage.
The first XOR gate is configured to receive the second data counter[2]' and the third data key', and to XOR the second data counter[2]' with the third data key'.
As described above, for example, the second data counter[2]' received by the first data deformation module 112' in the first stage is the index (a 32-bit value) in the random number chain of the second of the four first random numbers to be generated by this first pipeline 11. The second data counter[2]' received by the first data deformation module 112' in any other stage is the first output data result[1]" of the second data deformation module 112" in the previous stage.
The third data key' received by the first data deformation module 112' in the first stage is the high 32 bits of the random number seed key. The third data key' received by the first data deformation module 112' in any other stage is the third output data result[3]' of the first data deformation module 112' in the previous stage.
The second XOR gate is configured to receive the output of the first XOR gate and the high 32 bits of the product from the first multiplier, XOR the two, and output the result as the second output data result[2]'.
The first adder is configured to receive the third data key' and the second set value, add the third data key' to the second set value, and output the sum as the third output data result[3]'.
The second data deformation module 112" includes a first multiplier, a first XOR gate, a second XOR gate, and a first adder.
The first multiplier is configured to receive the first data counter[1]" and the first set value, multiply the first data counter[1]" by the first set value, and output the low 32 bits of the resulting product as the first output data result[1]".
As described above, for example, the first data counter[1]" received by the second data deformation module 112" in the first stage is the index (a 32-bit value) in the random number chain of the third of the four first random numbers to be generated by this first pipeline 11. The first data counter[1]" received by the second data deformation module 112" in any other stage is the second output data result[2]' of the first data deformation module 112' in the previous stage.
The first XOR gate is configured to receive the second data counter[2]" and the third data key", and to XOR the second data counter[2]" with the third data key".
As described above, for example, the second data counter[2]" received by the second data deformation module 112" in the first stage is the index (a 32-bit value) in the random number chain of the fourth of the four first random numbers to be generated by this first pipeline 11. The second data counter[2]" received by the second data deformation module 112" in any other stage is the first output data result[1]' of the first data deformation module 112' in the previous stage.
The third data key" received by the second data deformation module 112" in the first stage is the low 32 bits of the random number seed key. The third data key" received by the second data deformation module 112" in any other stage is the third output data result[3]" of the second data deformation module 112" in the previous stage.
The second XOR gate is configured to receive the output of the first XOR gate and the high 32 bits of the product from the first multiplier, XOR the two, and output the result as the second output data result[2]".
The first adder is configured to receive the third data key" and the second set value, add the third data key" to the second set value, and output the sum as the third output data result[3]".
The first set value and the second set value received by the first data deformation module 112' are the same, and the first set value and the second set value received by the second data deformation module 112" are the same. The first set value and second set value received by the first data deformation module 112' differ from the first set value and second set value received by the second data deformation module 112".
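Putting the pieces together, a ten-stage first pipeline with two modules per stage could be modelled as below. This is a sketch under several assumptions: the set values are not given in the text, so the constants here are borrowed from the publicly documented Philox4x32 parameters purely for illustration; the paragraph above is read as meaning that each module keeps its own pair of set values in every stage while the two modules use different pairs; and the wiring follows the description above, so the output is not claimed to match any reference Philox implementation bit for bit. The names `deform_module` and `first_pipeline` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Illustrative set values (assumption: taken from the public Philox4x32 parameters).
SET1_A, SET2_A = 0xD2511F53, 0x9E3779B9  # module 112'
SET1_B, SET2_B = 0xCD9E8D57, 0xBB67AE85  # module 112"


def deform_module(first, second, third, set1, set2):
    """One data deformation module: returns (result1, result2, result3)."""
    product = first * set1
    return (product & MASK32,
            (second ^ third ^ (product >> 32)) & MASK32,
            (third + set2) & MASK32)


def first_pipeline(counters, seed64, stages=10):
    """Generate four 32-bit first random numbers from four indexes and a seed."""
    c1a, c2a, c1b, c2b = counters                            # counter[1]', [2]', [1]", [2]"
    key_a, key_b = (seed64 >> 32) & MASK32, seed64 & MASK32  # key', key"
    for _ in range(stages):
        r1a, r2a, key_a = deform_module(c1a, c2a, key_a, SET1_A, SET2_A)
        r1b, r2b, key_b = deform_module(c1b, c2b, key_b, SET1_B, SET2_B)
        c1a, c2a = r2b, r1b  # outputs of 112" feed 112' in the next stage
        c1b, c2b = r2a, r1a  # outputs of 112' feed 112" in the next stage
    # Last-stage outputs: result[1]', result[2]', result[1]", result[2]"
    return r1a, r2a, r1b, r2b
```

Because the output depends only on the indexes and the seed, the same inputs always reproduce the same four values, which is what allows several pipelines to work on different parts of one random number chain in parallel.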
The data type conversion module 40 in the random number generation apparatus 100 is configured to perform data type conversion on the first random numbers generated by the first generator 10 in the random number generation apparatus 100.
In some embodiments of the present application, as shown in FIG. 6a, the data type conversion module 40 includes one data type converter 41, which is configured to convert the first random numbers generated by the first generator 10 into random numbers of a preset data type.
Regarding the structure of the data type converter 41, in some embodiments of the present application, the data type converter 41 is a truncated 16-bit floating-point (bfp16) converter. The bfp16 converter is configured to convert the first random numbers generated by the first generator 10 into random numbers of bfp16 type.
For example, based on the Philox algorithm, the first random numbers generated by the first generator 10 have the data type uint32. Following the IEEE 754 standard, the bfp16 converter retains the low 7 bits of the uint32 value and prepends 9 high-order bits (001111111) to produce a floating-point number in the range 1 to 2, and then subtracts 1, thereby converting the data type of the first random number from uint32 to bfp16.
In other embodiments of the present application, the data type converter 41 is a 16-bit floating-point (fp16) converter. The fp16 converter is configured to convert the first random numbers into random numbers of fp16 type.
For example, the first random numbers have the data type uint32. Following the IEEE 754 standard, the fp16 converter retains the low 10 bits of the uint32 value and prepends 6 high-order bits (001111) to produce a floating-point number in the range 1 to 2, and then subtracts 1, thereby converting the data type of the first random number from uint32 to fp16.
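The two 16-bit conversions can be sketched as follows. The sketch assumes the standard layouts: IEEE 754 binary16 with 1 sign bit, 5 exponent bits, and 10 fraction bits, and the truncated (bfloat16-style) format with 1 sign bit, 8 exponent bits, and 7 fraction bits. The function names are hypothetical, and the final subtraction is done in Python floats rather than in the 16-bit format.

```python
import struct


def uint32_to_fp16_unit(x):
    """Low 10 bits become the fraction; prefix 0b001111 gives a value in [1, 2)."""
    bits = 0x3C00 | (x & 0x3FF)                               # sign 0, exponent 01111
    value = struct.unpack("<e", struct.pack("<H", bits))[0]
    return value - 1.0                                        # shift to [0, 1)


def uint32_to_bf16_unit(x):
    """Low 7 bits become the fraction; prefix 0b001111111 gives a value in [1, 2)."""
    bits = 0x3F80 | (x & 0x7F)                                # sign 0, exponent 01111111
    # A truncated 16-bit float is the upper half of a 32-bit float.
    value = struct.unpack("<f", struct.pack("<I", bits << 16))[0]
    return value - 1.0                                        # shift to [0, 1)
```

Fixing the exponent field so that the value lies between 1 and 2 lets the random bits fill the fraction directly, which avoids any division.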
In other embodiments of the present application, the data type converter 41 is a 32-bit floating-point (fp32) converter. The fp32 converter is configured to convert the first random numbers into random numbers of fp32 type.
For example, the first random numbers have the data type uint32. Following the IEEE 754 standard, the fp32 converter retains the low 23 bits of the uint32 value and prepends 9 high-order bits (001111111) to produce a floating-point number in the range 1 to 2, and then subtracts 1, thereby converting the data type of the first random number from uint32 to fp32.
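The fp32 case can be sketched in a few lines. The 9-bit prefix 001111111 followed by 23 zero bits is the bit pattern `0x3F800000`, which encodes 1.0 in IEEE 754 binary32; the function name `uint32_to_unit_float` is hypothetical.

```python
import struct


def uint32_to_unit_float(x):
    """Keep the low 23 bits as the fraction, force the value into [1, 2), subtract 1."""
    bits = 0x3F800000 | (x & 0x7FFFFF)                        # sign 0, exponent 01111111
    value = struct.unpack("<f", struct.pack("<I", bits))[0]   # reinterpret as fp32
    return value - 1.0                                        # result lies in [0, 1)
```

For example, an input whose bit 22 is the only retained bit set (`0x400000`) maps to 1.5 before the subtraction and to 0.5 afterwards.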
In other embodiments of the present application, the data type converter 41 is a 32-bit unsigned integer (uint32) converter. The uint32 converter is configured to convert the first random numbers into random numbers of 32-bit unsigned integer type.
It can be understood that, when the first random numbers already have the data type uint32, the uint32 converter simply passes them through.
In other embodiments of the present application, the data type converter 41 is a 64-bit unsigned integer (uint64) converter. The uint64 converter is configured to convert the first random numbers into random numbers of 64-bit unsigned integer type.
In other embodiments of the present application, the data type converter 41 is a 32-bit signed integer (int32) converter. The int32 converter is configured to convert the first random numbers into random numbers of 32-bit signed integer type.
In other embodiments of the present application, the data type converter 41 is a 64-bit signed integer (int64) converter. The int64 converter is configured to convert the first random numbers into random numbers of 64-bit signed integer type.
As the above description shows, the data type of the random numbers produced by the data type converter 41 depends on which data type converter 41 is chosen. Whichever type of data type converter 41 the data type conversion module 40 includes, the converted random numbers have the data type corresponding to that converter. The preset data type here can therefore be understood as the data type corresponding to the data type converter 41.
In other embodiments of the present application, as shown in FIG. 6b, the data type conversion module 40 includes multiple data type converters 41 and a fourth selector 42.
Each data type converter 41 is configured to convert the first random numbers generated by the first generator 10 into random numbers of a preset data type.
The preset data types produced by the multiple data type converters 41 differ from one another. A data type converter 41 may be, for example, the bfp16 converter, fp16 converter, fp32 converter, uint32 converter, uint64 converter, int32 converter, or int64 converter described above.
The fourth selector 42 is configured to select, according to a second parameter, the random numbers generated by one of the multiple data type converters 41.
The second parameter may, for example, be transmitted to the first flash memory by a system scheduler, CPU, GPU, or NPU, and the first flash memory then transmits it to the fourth selector 42.
Because the first random numbers generated by the first generator 10 have a fixed data type, they suit only network training apparatuses that train on random numbers of that specific data type, which limits the applicability of the random number generation apparatus 100. Providing the data type conversion module 40 in the random number generation apparatus 100 allows the data type of the first random numbers generated by the first generator 10 to be converted to suit different network training apparatuses, which broadens the applicability of the random number generation apparatus 100.
随机数生成装置100中的分布转换模块50,用于对数据类型转换模块40输出的随机数进行分布类型的转换。The distribution conversion module 50 in the random number generation device 100 is configured to perform distribution type conversion on the random numbers output by the data type conversion module 40 .
在本申请的一些实施例中,如图6a所示,分布转换模块50包括一个分布生成器51,分布生成器51用于将数据类型转换模块40输出的随机数转化为服从预设分布的随机数。In some embodiments of the present application, as shown in FIG. 6a, the distribution conversion module 50 includes a distribution generator 51, and the distribution generator 51 is used to convert the random numbers output by the data type conversion module 40 into random numbers that obey a preset distribution. number.
在本申请的一些实施例中,分布生成器51为位掩码分布器(bitmask gen)。位掩码分布器用于将数据类型转换模块40输出的随机数转化为服从掩码分布的随机数。In some embodiments of the present application, the distribution generator 51 is a bitmask gen. The bit mask distributor is used to convert the random numbers output by the data type conversion module 40 into random numbers that obey the mask distribution.
以数据类型转换模块40输出的随机数服从均匀分布为例,位掩码分布器将数据类型转换模块40输出的数,与设定参数(例如软件配置的失活比例(dropout))做比较,小于失活比例则输出0,否则输出1,以达到掩码效果。Taking the random number output by the data type conversion module 40 obeying a uniform distribution as an example, the bit mask distributor compares the number output by the data type conversion module 40 with a set parameter (such as a software-configured dropout ratio), If it is less than the inactivation ratio, output 0, otherwise output 1 to achieve the mask effect.
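The comparison described above can be sketched in software as follows. This is only an illustration, not the hardware implementation: the function name `bitmask` and the assumption that the inputs are floating-point values in the range 0 to 1 are introduced here for the example.

```python
def bitmask(uniform_values, dropout):
    """Return one mask bit per input value.

    Assumption for this sketch: inputs are uniform floats in [0, 1) and
    `dropout` is the software-configured dropout ratio in the same range.
    """
    # A value below the dropout ratio yields 0; any other value yields 1.
    return [0 if u < dropout else 1 for u in uniform_values]


mask = bitmask([0.1, 0.5, 0.9], dropout=0.5)
```

With uniformly distributed inputs, roughly a fraction `dropout` of the mask bits would be 0.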
In other embodiments of the present application, the distribution generator 51 is a normal distributor (normal gen), which is configured to convert the random numbers output by the data type conversion module 40 into random numbers that follow a normal distribution.
The normal distributor may be, for example, a generator of normal distributions with arbitrary mean and variance, or a truncated normal distribution generator.
For example, the normal distributor may use the Box-Muller algorithm to generate normally distributed random numbers with arbitrary mean and variance.
As shown in FIG. 6c, take as an example the use of the Box-Muller algorithm to convert two uniformly distributed uint32 values into two fp32 values that follow the standard normal distribution:
(1) First, convert the uint32 data x0 and x1 output by the first generator 10 into floating-point numbers u1 and v1 in the range 0 to 1.
(2) Compare u1 with 1.0e-7f. If u1<1.0e-7f, set u1=1.0e-7f; otherwise leave u1 unchanged.
(3) u2=sqrt(-2.0*ln(u1)), the standard Box-Muller radius term (shown in the original as formula image PCTCN2021083344-appb-000001); v2=2.0*π*v1.
(4) f0_tmp=sin(v2); f1_tmp=cos(v2).
(5) f0=u2*f0_tmp; f1=u2*f1_tmp.
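Steps (1) to (5) can be sketched in software as follows. This is a minimal illustration under stated assumptions: the scaling of a uint32 value to the range 0 to 1 by dividing by 2^32 is assumed here (the text does not specify the mapping), the radius term uses the standard Box-Muller form sqrt(-2*ln(u1)), and the function names are hypothetical.

```python
import math


def uint32_to_unit_float(x):
    # Assumed mapping: scale a uint32 into [0, 1). The hardware mapping
    # is not specified in the text.
    return x / 2**32


def box_muller(x0, x1):
    """Convert two uniform uint32 values into two standard normal values."""
    # Step (1): convert to floating-point numbers in the range 0 to 1.
    u1 = uint32_to_unit_float(x0)
    v1 = uint32_to_unit_float(x1)
    # Step (2): clamp u1 from below so that the logarithm stays finite.
    u1 = max(u1, 1.0e-7)
    # Step (3): radius (standard Box-Muller form) and angle.
    u2 = math.sqrt(-2.0 * math.log(u1))
    v2 = 2.0 * math.pi * v1
    # Steps (4) and (5): project the radius onto sine and cosine.
    f0 = u2 * math.sin(v2)
    f1 = u2 * math.cos(v2)
    return f0, f1


f0, f1 = box_muller(2**31, 2**30)
```

The clamp in step (2) also bounds the magnitude of the outputs, since the radius cannot exceed sqrt(-2*ln(1.0e-7)).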
The Box-Muller algorithm is the normal-distribution transformation algorithm used by most mainstream deep learning frameworks today. The normally distributed random numbers that the normal distributor produces with the Box-Muller algorithm have better normal-distribution properties than the approximately normal random numbers produced with the Irwin-Hall algorithm.
In addition, taking the fp32 data type as an example, the Box-Muller algorithm takes two uniformly distributed uint32 random numbers as input and outputs two fp32 random numbers that follow the standard normal distribution. The Irwin-Hall algorithm instead sums multiple uniformly distributed random numbers to produce one normally distributed random number, so it consumes a larger number of uniformly distributed random numbers. For the same volume of output data, the Box-Muller algorithm therefore places lower performance demands on the generator than the Irwin-Hall algorithm.
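The difference in input consumption can be illustrated with the following sketch. The choice of 12 uniform inputs per Irwin-Hall sample is an assumption made here for the example (the text says only "multiple"), and both function names are hypothetical.

```python
def irwin_hall_normal(uniforms):
    """Approximate one standard normal sample from 12 uniforms in [0, 1)."""
    if len(uniforms) != 12:
        raise ValueError("this sketch assumes exactly 12 uniform inputs")
    # The sum of 12 independent U(0, 1) values has mean 6 and variance 1.
    return sum(uniforms) - 6.0


def uniforms_needed(num_normals, algorithm):
    """Count the uniform inputs consumed to produce num_normals outputs."""
    # Box-Muller: 2 uniforms in, 2 normals out (1 per output).
    # Irwin-Hall (assumed n = 12): 12 uniforms in, 1 normal out.
    per_normal = {"box-muller": 1, "irwin-hall": 12}
    return num_normals * per_normal[algorithm]


sample = irwin_hall_normal([0.5] * 12)
```

Under these assumptions, the upstream generator would have to supply twelve times as many uniform values for Irwin-Hall as for Box-Muller to sustain the same output rate.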
As a further example, the absolute value of each data item output by the data type conversion module 40 may be compared with 2. Items whose absolute value is less than 2 are retained and items whose absolute value is greater than or equal to 2 are discarded, which generates random numbers that follow a truncated normal distribution.
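The truncation rule above amounts to a simple filter. The sketch below assumes that the inputs are already normally distributed values; the function name and the `bound` parameter are introduced only for illustration.

```python
def truncate_normal(samples, bound=2.0):
    """Keep samples whose absolute value is below the bound; drop the rest."""
    return [x for x in samples if abs(x) < bound]


kept = truncate_normal([0.5, -1.9, 2.0, -2.5, 1.99])
```

Because samples are discarded rather than clipped, the number of outputs is generally smaller than the number of inputs.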
In other embodiments of the present application, the distribution generator 51 is a uniform distributor (uniform gen), which is configured to convert the random numbers output by the data type conversion module 40 into random numbers that follow a uniform distribution.
It can be understood that if the output of the data type conversion module 40 is already a uniformly distributed random number of the uint32 type, the uniform distributor in effect simply passes through the random number output by the data type conversion module 40.
As the above description shows, the distribution that the converted random numbers follow depends on which distribution generator 51 is chosen: whichever type of distribution generator 51 the distribution conversion module 50 includes, the converted random numbers follow the distribution type corresponding to that distribution generator 51. The preset distribution here can therefore be understood as the distribution type corresponding to the distribution generator 51.
In other embodiments of the present application, as shown in FIG. 6b, the distribution conversion module 50 includes a plurality of distribution generators 51 and a fifth selector 52.
Each distribution generator 51 is configured to convert the random numbers output by the data type conversion module 40 into random numbers that follow a preset distribution.
The preset distribution types produced by the plurality of distribution generators 51 differ from one another. A distribution generator 51 may be, for example, the bitmask distributor, normal distributor, or uniform distributor described above.
The fifth selector 52 is configured to select for output, according to the third parameter, the random number generated by one of the plurality of distribution generators 51.
The third parameter may, for example, be transmitted by the system scheduler, CPU, GPU, or NPU to the first flash memory, which forwards it to the fifth selector 52.
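The selection among several distribution generators can be pictured as a multiplexer. The sketch below is illustrative only: the string encoding of the third parameter, the two generators shown, and the function names are assumptions, not part of the described hardware.

```python
def fifth_selector(outputs, third_parameter):
    """Pick one generator output, as a multiplexer would."""
    return outputs[third_parameter]


def distribution_conversion(u, third_parameter, dropout=0.5):
    """Run every generator on the input and forward the selected output.

    Assumption for this sketch: `u` is a uniform float in [0, 1).
    """
    outputs = {
        "bitmask": 0 if u < dropout else 1,  # bitmask distributor
        "uniform": u,                        # uniform distributor (pass-through)
    }
    return fifth_selector(outputs, third_parameter)


selected = distribution_conversion(0.25, "bitmask")
```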
Because the first random number generated by the first generator 10 follows a fixed distribution type, it suits only a network training apparatus that trains on random numbers of that particular distribution type, which limits the applicability of the random number generating apparatus 100. Providing the distribution conversion module 50 in the random number generating apparatus 100 allows the distribution type of the first random number generated by the first generator 10 to be converted to suit different network training apparatuses, which broadens the applicability of the random number generating apparatus 100.
In some embodiments of the present application, as shown in FIG. 7, the random number generating apparatus 100 further includes an output control module 60. The output control module 60 is configured to buffer the first random numbers generated by the first generator 10 and to output them.
It can be understood that when the random number generating apparatus 100 also includes the data type conversion module 40 and the distribution conversion module 50, the output control module 60 buffers and outputs the random numbers generated by the distribution conversion module 50.
When the random number generating apparatus 100 is applied in the neural network system described above, the first random numbers it generates are stored in the second memory of the neural network system, from which the network training apparatus can fetch them. The output control module 60 and the second memory may interact, for example, through the AXI (advanced extensible interface) protocol. The second memory may be, for example, DDR SDRAM.
In some embodiments, the second memory is further used to store the index of the last first random number generated by the random number generating apparatus 100.
That is, the second memory stores the end index of the random number chain generated by the random number generating apparatus 100. The end index can be understood as the index of the last first random number in the random number chain.
For example, given a random number seed, suppose that by the time the current task completes the first generator 10 has generated a random number chain of length 100000 (that is, 100000 first random numbers). The index of the 100000th first random number is 99999, and 99999 is stored in the second memory. When the next task is issued, the indices of the generated random number chain start from 100000 rather than from 0. In other words, although the random number seed is the same, the first counter_start data received by the current task differs from the first counter_start data received by the next task, so each task generates different first random numbers.
In this way, because the index of the last first random number generated by the random number generating apparatus 100 is stored in the second memory, when the apparatus executes the next task with an unchanged random number seed it can read back from the second memory the index of the last first random number generated by the previous task, instead of starting again from 0. Tasks can thus be issued back to back while each task generates different first random numbers, and intermediate state is preserved across the random number generation process.
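The index bookkeeping described above can be sketched as follows. The class `SecondMemory` and the function `run_task` are hypothetical stand-ins introduced for this illustration; only the numbers 100000 and 99999 come from the example in the text.

```python
class SecondMemory:
    """Hypothetical stand-in for the second memory's stored end index."""

    def __init__(self):
        self.last_index = None  # nothing generated yet


def run_task(memory, length):
    """Return the (first, last) index of the chain produced by one task."""
    # Resume one past the stored end index, or start at 0 on the first task.
    start = 0 if memory.last_index is None else memory.last_index + 1
    end = start + length - 1
    memory.last_index = end  # persist the end index for the next task
    return start, end


memory = SecondMemory()
first_task = run_task(memory, 100000)
second_task = run_task(memory, 100000)
```

Here the first task covers indices 0 to 99999 and the second task resumes at 100000, matching the example in the text.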
In some embodiments of the present application, as shown in FIG. 7, the random number generating apparatus 100 further includes an interrupt management module. The interrupt management module is configured to output normal interrupt requests and abnormal interrupt requests.
That is, the interrupt requests include a normal interrupt, raised when random number generation completes, and an abnormal interrupt, raised when random number generation does not complete. The interrupt management module transmits the interrupt request to the first flash memory. The first flash memory transmits a normal interrupt request over the first transmission line ioc, and an abnormal interrupt request over the second transmission line ioe, to the processor of the neural network system (for example, the system scheduler, CPU, GPU, or NPU). The processor outputs the corresponding control instruction according to the type of interrupt request it receives.
For example, when the processor receives a normal interrupt request, which indicates that generation of the random numbers for the current seed has completed, it may output the parameters of the next task to the random number generating apparatus 100. When the processor receives an abnormal interrupt request, which indicates that generation of the random numbers for the current seed has not completed, it may output the parameters of the current task to the random number generating apparatus 100 again.
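The processor-side reaction to the two interrupt types can be sketched as follows. The string labels "normal" and "abnormal" and the function name are assumptions made for this illustration; the actual signalling uses the ioc and ioe lines described above.

```python
def parameters_to_send(interrupt, current_params, next_params):
    """Choose which task parameters the processor outputs after an interrupt."""
    if interrupt == "normal":
        # Generation for the current seed completed: move on to the next task.
        return next_params
    if interrupt == "abnormal":
        # Generation did not complete: re-issue the current task.
        return current_params
    raise ValueError("unknown interrupt type: " + repr(interrupt))


chosen = parameters_to_send("abnormal", {"seed": 1}, {"seed": 2})
```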
As the above description shows, the random number generating apparatus 100 of the embodiments of the present application is provided with a flash memory configuration interface. The hardened scheduling module, CPU, GPU, or NPU configures the first flash memory, and the first flash memory passes the second parameter and the third parameter to the fourth selector 42 and the fifth selector 52, respectively. The software/hardware interface of the random number generating apparatus 100 is flexible, so working modes and parameters can be set flexibly. Random number generation proceeds without intervention from the system scheduler, CPU, GPU, or NPU, which reduces their load.
In addition, the random number generating apparatus 100 starts working after the system scheduler, CPU, GPU, or NPU issues a task. As shown in FIG. 7, the system scheduler, CPU, GPU, or NPU may also issue the task to the first flash memory, which sends an enable signal to the first generator 10. No intervention from the system scheduler, CPU, GPU, or NPU is needed afterwards.
In some embodiments, the first flash memory includes a parameter register, which stores parameters, and a start register, which stores task instructions.
The random number generating apparatus 100 provided by the embodiments of the present application receives software-configured parameters such as the random number seed, generates a large number of first random numbers through its hardware structure, and stores them in the configured second memory.
In this example, the first generator 10 includes multiple parallel first pipelines 11, which can produce multiple first random numbers simultaneously, and the number of parallel first pipelines 11 can be scaled as needed. In addition, generation of the next first random number can begin without waiting for generation of the previous first random number to finish. The first generator 10 in the random number generating apparatus 100 provided by the embodiments of the present application therefore has a high degree of parallelism and can markedly improve random number generation efficiency. Moreover, because the first random numbers are generated by the hardware structure of the first pipelines 11, the performance of the random number generating apparatus 100 can meet the demands that large-scale neural network training scenarios place on a hardware pipeline structure.
In addition, when the first pipeline 11 uses the philox algorithm (for example, philox4_32_10) to generate the first random numbers, the output can pass the TestU01 test. When 32 parallel first pipelines 11 are used to generate the first random numbers, bitmask generation can reach 100 Gb/s, which is close to the throughput of 100 CPUs using the ARS-2 (two rounds of advanced cyclic encryption) algorithm, so the first generator 10 can achieve high throughput.
Example 2
Example 2 differs from Example 1 in that the random number generating apparatus 100 includes a second generator 20.
As shown in FIG. 8a and FIG. 8b, the random number generating apparatus 100 includes the second generator 20, a data type conversion module 40, a distribution conversion module 50, and an output control module 60.
The second generator 20 includes a seed initialization generation subcircuit 21, a state rotation subcircuit 22, and an output subcircuit 23.
The seed initialization generation subcircuit 21 is configured to perform initialization according to the random number seed and to generate a rotation chain that includes a plurality of initial values.
As shown in FIG. 8c, the random number seed may, for example, be transmitted by the CPU, GPU, NPU, or system scheduler over APB to the first flash memory, which provides it to the second generator 20.
The first flash memory may be, for example, a first register.
Regarding the manner in which the seed initialization generation subcircuit 21 generates the plurality of initial values, in some embodiments of the present application the seed initialization generation subcircuit 21 is configured to receive fourth data, a third set value, a fourth set value, and a fifth set value, and to proceed as follows. The fourth data is shifted right by a bits and XORed with the fourth data. The XOR result is multiplied by the third set value. The product is added to the fourth set value. The sum is ANDed with the fifth set value. The result of the AND operation is output as the initial value.
This process of generating an initial value is repeated to obtain a rotation chain that includes a plurality of initial values.
The seed initialization generation subcircuit 21 performs initialization according to the random number seed. For example, the fourth data received when the first initial value is generated is the random number seed. In the subsequent iterations that generate further initial values, each generated initial value serves as the fourth data. In other words, the previous initial value serves as the fourth data for the next initial value to be generated.
Regarding the structure in the seed initialization generation subcircuit 21 that implements the above functions, in some embodiments of the present application, as shown in FIG. 9a, the seed initialization generation subcircuit 21 includes a first right shifter (denoted by the first ">>" in the drawings of the embodiments of the present application), a third XOR unit (denoted by the third "xor" in the drawings), a second multiplier (denoted by the second "x" in the drawings), a second adder (denoted by the second "+" in the drawings), and a first AND gate (denoted by the first "&" in the drawings).
The first right shifter is configured to receive the fourth data and shift it right by a bits.
The third XOR unit is configured to receive the output of the first right shifter and to XOR it with the fourth data.
The second multiplier is configured to receive the output of the third XOR unit and the third set value, and to multiply the output of the third XOR unit by the third set value.
The second adder is configured to receive the output of the second multiplier and the fourth set value, and to add the output of the second multiplier to the fourth set value.
The first AND gate is configured to receive the output of the second adder and the fifth set value, and to AND the output of the second adder with the fifth set value.
The output of the first AND gate serves as the output of the seed initialization generation subcircuit 21; that is, the output of the first AND gate is output as the initial value.
As described above, the fourth data is either the random number seed or an initial value. Accordingly, the first right shifter receives not only the random number seed transmitted by the first flash memory but also the initial values output by the first AND gate.
For example, when the seed initialization generation subcircuit 21 generates the first initial value, the first right shifter receives the random number seed. When it generates the second initial value, the first right shifter receives the first initial value output by the first AND gate. By analogy, when the seed initialization generation subcircuit 21 generates the s-th initial value, the first right shifter receives the (s-1)-th initial value. The number of initial values in the rotation chain can be adjusted by adjusting the number of iterations of the seed initialization generation subcircuit 21.
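The shift, XOR, multiply, add, and AND sequence and its iteration can be sketched as follows. The function names and the small numeric values in the usage line are placeholders chosen for this illustration; the text does not give values for the shift amount a or for the third, fourth, and fifth set values.

```python
def seed_init_step(x, a, set3, set4, set5):
    """One pass through the seed initialization data path."""
    shifted = x >> a          # first right shifter
    mixed = shifted ^ x       # third XOR unit
    product = mixed * set3    # second multiplier
    total = product + set4    # second adder
    return total & set5       # first AND gate


def build_rotation_chain(seed, length, a, set3, set4, set5):
    """Feed each initial value back as the next fourth data."""
    chain = []
    x = seed  # the first pass receives the random number seed
    for _ in range(length):
        x = seed_init_step(x, a, set3, set4, set5)
        chain.append(x)
    return chain


# Placeholder parameters, not values from the text.
chain = build_rotation_chain(seed=1, length=3, a=1, set3=3, set4=5, set5=0xFF)
```

In this sketch the loop count `length` plays the role of the number of iterations that determines how many initial values the rotation chain contains.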
As shown in FIG. 8a and FIG. 8b, the state rotation subcircuit 22 includes multiple parallel second pipelines 221, which are configured to rotate the rotation chain.
Because the state rotation subcircuit 22 includes multiple parallel second pipelines 221, multiple values in the rotation chain are rotated simultaneously at any given moment. Rotating the rotation chain can be understood as updating the data in the rotation chain.
It can also be understood that, because the state rotation subcircuit 22 rotates the rotation chain (that is, updates the data in it), the data in the rotation chain changes dynamically. The rotation performed by the state rotation subcircuit 22 therefore covers both initial values and updated values. Put differently, the state rotation subcircuit 22 rotates both a rotation chain that contains multiple initial values and a rotation chain that contains multiple updated values produced by earlier rotations.
In the embodiments of the present application, the rotation chain of multiple initial values generated by the seed initialization generation subcircuit 21 is called the rotation chain before rotation. As the state rotation subcircuit 22 keeps updating the values in the rotation chain, the resulting rotation chain of multiple updated values is called the rotation chain after rotation. The multiple parallel second pipelines 221 can iterate repeatedly, continually updating the rotation chain.
Regarding the manner in which a second pipeline 221 rotates a value, in some embodiments of the present application, as shown in FIG. 9b, the second pipeline 221 includes a parity selection module 2211, an odd number generation module 2212, an even number generation module 2213, and a second selector 2214.
The parity selection module 2211 is configured to receive fifth data, sixth data, a sixth set value, and a seventh set value. It ORs together two intermediate results: the fifth data ANDed with the sixth set value, and the modulo of the sixth data ANDed with the seventh set value. It then takes the modulo of the OR result and outputs it to the second selector 2214 and the even number generation module 2213.
Regarding the structure of the parity selection module 2211 that implements the above functions, in some embodiments of the present application, as shown in FIG. 9b, the parity selection module 2211 includes a second AND gate (denoted by the second "&" in the drawings of the embodiments of the present application), a first modulo unit (denoted by the first "mod" in the drawings), a third AND gate (denoted by the third "&" in the drawings), a first OR gate (denoted by the first "or" in the drawings), and a second modulo unit (denoted by the second "mod" in the drawings).
The second AND gate is configured to receive the fifth data and the sixth set value, and to AND them.
The first modulo unit is configured to receive the sixth data and take its modulo.
The third AND gate is configured to receive the seventh set value, and to AND the output of the first modulo unit with the seventh set value.
The first OR gate is configured to receive the output of the second AND gate and the output of the third AND gate, and to OR them.
The second modulo unit is configured to receive the output of the first OR gate and take its modulo.
奇数生成模块2212,用于接收第五数据和第八设定值,将第五数据与第八设定值进行异或,将异或的结果输出至第二选择器2214。The odd number generating module 2212 is configured to receive the fifth data and the eighth set value, perform an exclusive OR on the fifth data and the eighth set value, and output the result of the exclusive OR to the second selector 2214 .
关于奇数生成模块2212用于实现上述功能的结构,在本申请的一些实施例中,如图9b所示,奇数生成模块2212包括第四异或器(在本申请实施例的附图中用第四“xor”表示第四异或器)。Regarding the structure of the odd number generation module 2212 for implementing the above functions, in some embodiments of the present application, as shown in FIG. Four "xor" means the fourth XOR).
第四异或器用于接收第五数据和第八设定值,对第五数据和第八设定值进行异或运算,将异或的结果输出至第二选择器2214。The fourth XOR is used for receiving the fifth data and the eighth setting value, performing an XOR operation on the fifth data and the eighth setting value, and outputting the XOR result to the second selector 2214 .
The even number generation module 2213 is configured to receive the seventh data and the output of the parity selection module 2211, right-shift the output of the parity selection module 2211 by b bits, XOR the shifted value with the modulo result of the seventh data, and output the XOR result to the second selector 2214.
Regarding the structure by which the even number generation module 2213 implements the above function, in some embodiments of the present application, as shown in FIG. 9b, the even number generation module 2213 includes a third modulo unit (denoted by the third "mod" in the drawings of the embodiments of the present application), a second right shifter (denoted by the second ">>" in the drawings), and a fifth XOR gate (denoted by the fifth "xor" in the drawings).
The third modulo unit is configured to receive the seventh data and take the modulo of the seventh data.
The second right shifter is configured to receive the output of the parity selection module 2211 and right-shift that output by b bits.
The fifth XOR gate is configured to receive the output of the third modulo unit and the output of the second right shifter, XOR the two outputs, and output the XOR result to the second selector 2214.
The second selector 2214 is configured to select, according to the output of the parity selection module 2211, the output of either the odd number generation module 2212 or the even number generation module 2213 as its own output, so as to rotate the rotation chain.
The output of the second modulo unit in the parity selection module 2211 is either odd or even. The second selector 2214 selects the output of the odd number generation module 2212 or of the even number generation module 2213 according to the parity of the result output by the second modulo unit, and the output of the second selector 2214 is an update value in the rotation chain.
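As a rough software sketch of the even path and the parity-based selection described above: the code assumes 32-bit data words and reads "modulo" as reduction modulo 2^32. Both are assumptions, since the text does not state the modulus. The function names are hypothetical and are not taken from the application.

```python
WORD_MASK = 0xFFFFFFFF  # assumption: 32-bit data words


def even_generate(parity_out: int, seventh: int, b: int) -> int:
    """Sketch of even number generation module 2213."""
    shifted = parity_out >> b       # second right shifter
    reduced = seventh & WORD_MASK   # third modulo unit (assumed modulus 2**32)
    return (shifted ^ reduced) & WORD_MASK  # fifth XOR gate


def second_selector(parity_out: int, odd_out: int, even_out: int) -> int:
    """Sketch of second selector 2214: choose by parity of the parity-module output."""
    return odd_out if parity_out & 1 else even_out
```

The selector only inspects the least significant bit of the parity-module output, which is how the parity of an unsigned integer is tested in software.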
The second pipeline 221 receives data from the rotation chain. In some embodiments of the present application, the fifth data, the sixth data, and the seventh data may be, for example, different data in the rotation chain.
Regarding the output sub-circuit 23, in some embodiments of the present application, as shown in FIG. 8a, the output sub-circuit 23 includes one output line 231. The output line 231 is configured to deform, one after another in a loop, the multiple update values output by the state rotation sub-circuit 22 and to output the results, so that the output sub-circuit 23 deforms and outputs the output of the state rotation sub-circuit 22 and generates multiple second random numbers.
The second generator 20 provided in this example includes multiple parallel second pipelines 221. The multiple update values output by the parallel second pipelines 221 at the same time allow multiple data in the rotation chain to be rotated synchronously, which improves the rotation update efficiency of the rotation chain and thereby the generation efficiency of the second random numbers.
In other embodiments of the present application, as shown in FIG. 8b, the output sub-circuit 23 includes multiple parallel output lines 231. The parallel output lines 231 are configured to deform the multiple update values output by the state rotation sub-circuit 22 synchronously and to output the results, so that the output sub-circuit 23 deforms and outputs the output of the state rotation sub-circuit 22 and generates multiple second random numbers synchronously.
In addition to the multiple parallel second pipelines 221, the second generator 20 provided in this example includes multiple parallel output lines 231. The multiple update values output by the parallel second pipelines 221 at the same time allow multiple data in the rotation chain to be rotated synchronously, which improves the rotation update efficiency of the rotation chain; the parallel output lines 231 deform and output multiple update values at the same time, which further improves the generation efficiency of the second random numbers.
In some embodiments, the number of output lines 231 included in the output sub-circuit 23 is the same as the number of second pipelines 221 included in the state rotation sub-circuit 22.
Regardless of how many output lines 231 the output sub-circuit 23 includes, each output line 231 deforms an update value in the same way.
In some embodiments of the present application, the output line 231 is configured to receive the output of the state rotation sub-circuit 22, the ninth set value, and the tenth set value. It first right-shifts the output of the state rotation sub-circuit 22 by c bits and XORs the shifted value with the output of the state rotation sub-circuit 22. It then left-shifts this XOR result by d bits, ANDs the shifted value with the ninth set value, and XORs the AND result with the unshifted XOR result. It then left-shifts the new XOR result by e bits, ANDs the shifted value with the tenth set value, and XORs the AND result with the unshifted XOR result. Finally, it right-shifts the latest XOR result by f bits, XORs the shifted value with the unshifted XOR result, and outputs the final XOR result as the second random number.
Regarding the structure by which the output line 231 implements the above function, in some embodiments of the present application, as shown in FIG. 9c, the output line 231 includes a third right shifter (denoted by the third ">>" in the drawings of the embodiments of the present application), a sixth XOR gate (the sixth "xor"), a first left shifter (the first "<<"), a fourth AND gate (the fourth "&"), a seventh XOR gate (the seventh "xor"), a second left shifter (the second "<<"), a fifth AND gate (the fifth "&"), an eighth XOR gate (the eighth "xor"), a fourth right shifter (the fourth ">>"), and a ninth XOR gate (the ninth "xor").
The third right shifter is configured to receive the output of the state rotation sub-circuit and right-shift it by c bits.
The sixth XOR gate is configured to receive the output of the state rotation sub-circuit and the output of the third right shifter, and to XOR the two.
The first left shifter is configured to receive the output of the sixth XOR gate and left-shift it by d bits.
The fourth AND gate is configured to receive the ninth set value and the output of the first left shifter, and to AND the two.
The seventh XOR gate is configured to receive the output of the sixth XOR gate and the output of the fourth AND gate, and to XOR the two.
The second left shifter is configured to receive the output of the seventh XOR gate and left-shift it by e bits.
The fifth AND gate is configured to receive the tenth set value and the output of the second left shifter, and to AND the two.
The eighth XOR gate is configured to receive the output of the fifth AND gate and the output of the seventh XOR gate, and to XOR the two.
The fourth right shifter is configured to receive the output of the eighth XOR gate and right-shift it by f bits.
The ninth XOR gate is configured to receive the output of the eighth XOR gate and the output of the fourth right shifter, XOR the two, and output the XOR result as the second random number.
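The chain of blocks in the output line can be summarized compactly. The notation here is introduced for illustration only and does not appear in the application: x is the output of the state rotation sub-circuit 22, V_9 and V_10 are the ninth and tenth set values, and y_1 to y_4 are the outputs of the sixth to ninth XOR gates.

```latex
\begin{aligned}
y_1 &= x \oplus (x \gg c) \\
y_2 &= y_1 \oplus \bigl((y_1 \ll d) \mathbin{\&} V_9\bigr) \\
y_3 &= y_2 \oplus \bigl((y_2 \ll e) \mathbin{\&} V_{10}\bigr) \\
y_4 &= y_3 \oplus (y_3 \gg f)
\end{aligned}
```

In this notation, y_4 is the second random number output by the ninth XOR gate.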
Regarding how signals are passed between the seed initialization generation sub-circuit 21 and the state rotation sub-circuit 22, and between the state rotation sub-circuit 22 and the output sub-circuit 23, as shown in FIG. 10a and FIG. 10b, the second generator 20 further includes a third selector 24, an interworking register 25, and a first memory 26.
The pre-rotation rotation chain output by the seed initialization generation sub-circuit 21 and the post-rotation rotation chain output by the state rotation sub-circuit 22 are each transmitted to the third selector 24.
The third selector 24 is configured to receive the pre-rotation rotation chain output by the seed initialization generation sub-circuit 21 and the post-rotation rotation chain output by the state rotation sub-circuit 22 and, under the control of the selection control terminal control, to transmit either the pre-rotation or the post-rotation rotation chain to the interworking register 25.
The interworking register 25 is configured to receive the rotation chain output by the third selector 24. When the control signal output by the selection control terminal control causes the third selector 24 to output the initial values output by the seed initialization generation sub-circuit 21, the interworking register 25 receives the pre-rotation rotation chain. When the control signal causes the third selector 24 to output the update values output by the state rotation sub-circuit 22, the interworking register 25 receives the post-rotation rotation chain.
While the seed initialization generation sub-circuit 21 generates the initial values, each loop iteration generates one initial value, and a rotation chain including multiple initial values is eventually formed. During the loop, the interworking register 25 receives the initial values in a time-division manner, so the received initial values need to be stored.
Similarly, while the state rotation sub-circuit 22 generates the update values, each loop iteration generates multiple update values. During the loop, the interworking register 25 receives the update values in a time-division manner, so the received update values need to be stored.
Based on this, the interworking register 25 is configured to transmit the received rotation chain to the first memory 26, and the first memory 26 receives and stores the rotation chain output by the interworking register 25.
Here, the first memory 26 may be, for example, a static random-access memory (SRAM).
As can be seen from the above description, in the process of generating the second random numbers, the third selector 24, under the control of the control signal output by the selection control terminal control, first transmits the rotation chain of multiple initial values generated by the seed initialization generation sub-circuit 21 to the interworking register 25, and the rotation chain is stored in the first memory 26. The output of the state rotation sub-circuit 22 is then transmitted to the third selector 24, which, under the control of the control signal output by the selection control terminal control, transmits the update values output by the state rotation sub-circuit 22 to the interworking register 25. The interworking register 25 transmits these update values to the first memory 26 to update the rotation chain stored there. This loop repeats, and the rotation chain is updated continuously.
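The two-phase data flow above (load the initial values, then overwrite with update values) can be mimicked with a toy model. This is only an illustrative sketch: the control encoding (0 selects initial values, 1 selects update values), the four-word memory, and the function names are all assumptions made for the example.

```python
def third_selector(control: int, initial_value, update_value):
    """2-to-1 multiplexer; control == 0 selects the initial value (assumed encoding)."""
    return initial_value if control == 0 else update_value


def run_demo():
    memory = [None] * 4  # toy stand-in for first memory 26 (4 words instead of 624)

    # Phase 1: the selector passes initial values; the register forwards them to memory.
    for s, init in enumerate([10, 11, 12, 13]):
        register = third_selector(0, init, None)  # stand-in for interworking register 25
        memory[s] = register

    # Phase 2: the selector passes update values, which overwrite the stored chain.
    for s, upd in enumerate([20, 21]):
        register = third_selector(1, None, upd)
        memory[s] = register

    return memory
```

After both phases the first two words hold update values and the remaining words still hold initial values, which mirrors a rotation chain that is partway through an update pass.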
In some embodiments of the present application, the rotation chain is stored in the first memory 26. When the state rotation sub-circuit 22 rotates the rotation chain, the interworking register 25 transmits data from the rotation chain to each second pipeline 221 in the state rotation sub-circuit 22 as the fifth data, the sixth data, and the seventh data.
Based on this, the interworking register 25 is further configured to retrieve data of the rotation chain from the first memory 26 in batches and transmit the data to the second pipelines 221.
Retrieving data in batches means that, for each loop iteration of the state rotation sub-circuit 22, the interworking register 25 retrieves one batch of rotation chain data from the first memory 26. In each loop iteration, the data received by the state rotation sub-circuit 22 has different indices in the rotation chain; that is, the received data are different data in the rotation chain.
After the state rotation sub-circuit 22 rotates the rotation chain and outputs the update values, the output sub-circuit 23 needs to receive those update values in order to deform the output of the state rotation sub-circuit 22 and output the second random numbers.
Based on this, the interworking register 25 is further configured to transmit the output (the update values) of the state rotation sub-circuit 22 to the output sub-circuit 23.
In the process of generating the second random numbers, the state rotation sub-circuit 22 rotates the data in the rotation chain (initial values or update values) while the output sub-circuit 23 deforms the update values output by the state rotation sub-circuit 22; the two operations can proceed concurrently.
For example, the state rotation sub-circuit 22 rotates the initial values of round 1 and outputs the round-1 update values. The output sub-circuit 23 then deforms the round-1 update values and outputs second random numbers. At the same time, the state rotation sub-circuit 22 rotates the initial values of round 2 and outputs the round-2 update values. Because rotating the rotation chain and processing the update values proceed concurrently, the generation efficiency of the second random numbers is improved.
Based on this, signal transmission between the seed initialization generation sub-circuit 21 and the state rotation sub-circuit 22, and between the state rotation sub-circuit 22 and the output sub-circuit 23, is implemented through the interworking register 25. Sharing the same interworking register 25 reduces the number of interworking registers 25 required.
Based on the structure of the second generator 20 provided in this example, in some embodiments of the present application, the second generator 20 is configured to synchronously generate multiple second random numbers from a random number seed based on the Mersenne twister (MT) algorithm.
In some embodiments of the present application, the Mersenne twister algorithm may be implemented as the MT19937 algorithm.
For example, as shown in FIG. 11, the second generator 20 includes a seed initialization generation sub-circuit 21, a state rotation sub-circuit 22, an output sub-circuit 23, a third selector 24, an interworking register 25, and a first memory 26.
The seed initialization generation sub-circuit 21 receives the random number seed key, and the first right shifter right-shifts the random number seed key by 30 bits (that is, a = 30). The third XOR gate XORs the output of the first right shifter with the random number seed key. The second multiplier multiplies the output of the third XOR gate by 69069 (that is, the third set value = 69069). The second adder adds s to the output of the second multiplier. The first AND gate ANDs the output of the second adder with 0xffffffff (that is, the fifth set value = 0xffffffff) and outputs the AND result as the first initial value. The index of the first initial value is 0.
Here, s is the index in the rotation chain of the initial value to be formed. When the index of the initial value to be formed is 0, s = 0.
Then, the seed initialization generation sub-circuit 21 receives the previously formed initial value, and the first right shifter right-shifts that initial value by 30 bits. The third XOR gate XORs the output of the first right shifter with the initial value. The second multiplier multiplies the output of the third XOR gate by 69069. The second adder adds s, the index of the initial value being formed, to the output of the second multiplier. The first AND gate ANDs the output of the second adder with 0xffffffff and outputs the AND result as the next initial value.
The above process is repeated to obtain a rotation chain including 624 initial values.
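The initialization loop described above can be sketched in software as follows. The constants (shift 30, multiplier 69069, mask 0xffffffff, 624 entries) are the ones given in the text; the function name and the choice to feed the seed key in as the "previous value" for index 0 are a reading of the description, not a confirmed implementation.

```python
def init_rotation_chain(key: int, length: int = 624) -> list[int]:
    """Build the rotation chain of initial values from a random number seed."""
    chain = []
    prev = key  # the first iteration operates on the seed key itself
    for s in range(length):              # s: index of the initial value being formed
        shifted = prev >> 30             # first right shifter (a = 30)
        mixed = shifted ^ prev           # third XOR gate
        product = mixed * 69069          # second multiplier (third set value)
        total = product + s              # second adder
        value = total & 0xFFFFFFFF       # first AND gate (fifth set value)
        chain.append(value)
        prev = value                     # the next iteration uses this initial value
    return chain
```

For example, `init_rotation_chain(0)` starts with 0, 1, 69071: with a zero seed the first value is 0, the second is 69069 * 0 + 1, and the third is 69069 * 1 + 2.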
Under the control of the selection control terminal control, the third selector 24 transmits the output of the seed initialization generation sub-circuit 21 to the interworking register 25.
The interworking register 25 receives the output of the third selector 24 and transmits the received initial values to the first memory 26. The first memory 26 receives and stores the initial values output by the interworking register 25 to form the rotation chain.
The interworking register 25 retrieves data of the rotation chain from the first memory 26 in batches and transmits the data to the state rotation sub-circuit 22.
The state rotation sub-circuit 22 includes 16 parallel second pipelines 221. Each second pipeline 221 receives the rotation chain data retrieved by the interworking register 25 as the fifth data, the sixth data, and the seventh data. The fifth data is the data at index i in the rotation chain, the sixth data is the data at index i+1, and the seventh data is the data whose index is the remainder of (i+397)/624. The value of i differs among the parallel second pipelines 221.
For example, as shown in FIG. 12a, among four parallel second pipelines 221, one second pipeline 221 receives as its fifth, sixth, and seventh data the data at indices 0, 1, and 397 in the rotation chain, in that order. Another receives the data at indices 1, 2, and 398. Another receives the data at indices 2, 3, and 399. Another receives the data at indices 3, 4, and 400.
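The index rule for the three operands can be written as a one-line helper. This is a minimal sketch with a hypothetical function name. Wrapping i+1 back to 0 at the end of the chain is an assumption, because the text only states the remainder rule for the seventh data.

```python
CHAIN_LENGTH = 624
OFFSET = 397


def operand_indices(i: int) -> tuple[int, int, int]:
    """Indices of the fifth, sixth, and seventh data for the pipeline handling index i."""
    fifth = i
    sixth = (i + 1) % CHAIN_LENGTH        # assumption: wraps around at the chain end
    seventh = (i + OFFSET) % CHAIN_LENGTH  # remainder of (i + 397) / 624
    return fifth, sixth, seventh
```

Calling `operand_indices` for i = 0, 1, 2, 3 reproduces the four triples of FIG. 12a listed above, and for i = 227 the seventh index wraps to 0 because 227 + 397 = 624.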
Each second pipeline 221 includes a parity selection module 2211, an odd number generation module 2212, an even number generation module 2213, and a second selector 2214.
The parity selection module 2211 receives the data at index i in the rotation chain (the fifth data) and the data at index i+1 in the rotation chain (the sixth data). The second AND gate ANDs the data at index i with 0x7fffffff (that is, the sixth set value = 0x7fffffff). The first modulo unit takes the modulo of the data at index i+1. The third AND gate ANDs the output of the first modulo unit with 0x80000000 (that is, the seventh set value = 0x80000000). The first OR gate ORs the output of the second AND gate with the output of the third AND gate. The second modulo unit takes the modulo of the output of the first OR gate and outputs the result to the second selector 2214 and the even number generation module 2213.
The odd number generation module 2212 receives the data at index i in the rotation chain. The fourth XOR gate XORs the data at index i with the eighth set value and outputs the XOR result to the second selector 2214.
The even number generation module 2213 receives the data whose index in the rotation chain is the remainder of (i+397)/624 (the seventh data). The second right shifter right-shifts the output of the parity selection module 2211 by 1 bit (that is, b = 1). The third modulo unit takes the modulo of the data whose index is the remainder of (i+397)/624. The fifth XOR gate XORs the output of the third modulo unit with the output of the second right shifter and outputs the XOR result to the second selector 2214.
As shown in FIG. 12a, the 16 parallel second pipelines 221 receive data from the rotation chain (initial values or update values) synchronously, and the indices of the fifth data of two adjacent second pipelines 221 differ by 1.
The 16 parallel second pipelines 221 of the state rotation sub-circuit 22 loop 39 times to complete the rotation of all 624 data items (16 × 39 = 624).
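The count of 39 loops follows from 624 / 16 = 39. A small sketch of the resulting schedule is shown below. It assumes that round r handles the 16 consecutive indices starting at 16 × r, which matches the first round described in the text but is an assumption for later rounds; the names are hypothetical.

```python
CHAIN_LENGTH = 624
PIPELINES = 16
ROUNDS = CHAIN_LENGTH // PIPELINES  # 39 loop iterations


def round_indices(r: int) -> list[int]:
    """Fifth-data indices handled by the 16 pipelines in round r (0-based)."""
    start = r * PIPELINES
    return list(range(start, start + PIPELINES))


def all_rounds() -> list[int]:
    """Concatenate every round to check that each index is handled exactly once."""
    covered = []
    for r in range(ROUNDS):
        covered.extend(round_indices(r))
    return covered
```

Because 624 is an exact multiple of 16, the last round (r = 38) ends at index 623 and no pipeline is left idle.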
In this case, note that only data located in the same row of the first memory 26 can be read at a time, whereas the indices of the fifth data and the seventh data are far apart, so the two are not located in the same row.
As shown in FIG. 12b, regarding how the 624 data items of the rotation chain are stored in and retrieved from the first memory 26, the first memory 26 includes a first buffer register (buffer) and a second buffer register, each of which stores the 624 data items. When the interworking register 25 retrieves data from the first memory 26, the first buffer register and the second buffer register are read and written alternately, so that data can be supplied to the 16 parallel second pipelines 221.
For example, the data accumulated over two beats from row 39 and row 1 of the first buffer register supply the 17 data items at indices 0 and 1 to 16, and the data accumulated over two beats from row 25 and row 26 of the second buffer register supply the 16 data items at indices 397 to 412, so that the 16 parallel second pipelines 221 can compute in parallel.
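One row layout that is consistent with the row numbers quoted above is sketched below. It is an inference, not something the text states: rows are numbered from 1, each row holds 16 words, row r holds indices 16(r-1)+1 to 16r, and index 0 occupies the last slot of row 39. The function name is hypothetical.

```python
ROW_WIDTH = 16
CHAIN_LENGTH = 624


def row_of(index: int) -> int:
    """1-based row holding a chain index, under the assumed layout described above."""
    return ((index - 1) % CHAIN_LENGTH) // ROW_WIDTH + 1


def rows_needed(indices) -> set[int]:
    """Set of rows that must be read to collect the given chain indices."""
    return {row_of(k) for k in indices}
```

Under this assumed layout, indices 0 to 16 require rows 39 and 1, and indices 397 to 412 require rows 25 and 26, which is why each operand group takes two beats to accumulate.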
The fifth, sixth, and seventh data of the 1st second pipeline 221 correspond, in order, to the data at indices 0, 1, and 397 in the rotation chain.
Here, the remainder of (0+397)/624 is 397, so the seventh data corresponds to the data at index 397 in the rotation chain.
The fifth, sixth, and seventh data of the 2nd second pipeline 221 correspond, in order, to the data at indices 1, 2, and 398 in the rotation chain.
Here, the remainder of (1+397)/624 is 398, so the seventh data corresponds to the data at index 398 in the rotation chain.
The fifth, sixth, and seventh data of the 3rd second pipeline 221 correspond, in order, to the data at indices 2, 3, and 399 in the rotation chain.
The fifth, sixth, and seventh data of the 4th second pipeline 221 correspond, in order, to the data at indices 3, 4, and 400 in the rotation chain.
The fifth, sixth, and seventh data of the 5th second pipeline 221 correspond, in order, to the data at indices 4, 5, and 401 in the rotation chain.
The fifth, sixth, and seventh data of the 6th second pipeline 221 correspond, in order, to the data at indices 5, 6, and 402 in the rotation chain.
The fifth, sixth, and seventh data of the 7th second pipeline 221 correspond, in order, to the data at indices 6, 7, and 403 in the rotation chain.
The fifth, sixth, and seventh data of the 8th second pipeline 221 correspond, in order, to the data at indices 7, 8, and 404 in the rotation chain.
The fifth, sixth, and seventh data of the 9th second pipeline 221 correspond, in order, to the data at indices 8, 9, and 405 in the rotation chain.
The fifth, sixth, and seventh data of the 10th second pipeline 221 correspond, in order, to the data at indices 9, 10, and 406 in the rotation chain.
The fifth, sixth, and seventh data of the 11th second pipeline 221 correspond, in order, to the data at indices 10, 11, and 407 in the rotation chain.
The fifth, sixth, and seventh data of the 12th second pipeline 221 correspond, in order, to the data at indices 11, 12, and 408 in the rotation chain.
The fifth, sixth, and seventh data of the 13th second pipeline 221 correspond, in order, to the data at indices 12, 13, and 409 in the rotation chain.
The fifth, sixth, and seventh data of the 14th second pipeline 221 correspond, in order, to the data at indices 13, 14, and 410 in the rotation chain.
The fifth, sixth, and seventh data of the 15th second pipeline 221 correspond, in order, to the data at indices 14, 15, and 411 in the rotation chain.
The fifth, sixth, and seventh data of the 16th second pipeline 221 correspond, in order, to the data at indices 15, 16, and 412 in the rotation chain.
The 16 results produced by the 16 second pipelines 221 are, on the one hand, written to the output sub-circuit 23 through the interworking register 25 and, on the other hand, written through the interworking register 25 to the first row of the first buffer register and of the second buffer register of the first memory 26. The process repeats in turn, with the 16 second pipelines 221 updating 16 data items in the first buffer register and the second buffer register on every beat.
The output sub-circuit 23 includes 16 parallel output lines 231, and each output line 231 receives the output of one second pipeline 221 in the state rotation sub-circuit 22.
In the output line 231, the third right shifter right-shifts the output of the state rotation sub-circuit 22 by 11 bits (that is, c = 11). The sixth XOR gate XORs the output of the third right shifter with the output of the state rotation sub-circuit 22. The first left shifter left-shifts the output of the sixth XOR gate by 7 bits (that is, d = 7). The fourth AND gate ANDs the output of the first left shifter with 0x9d2c5680 (that is, the ninth set value = 0x9d2c5680). The seventh XOR gate XORs the output of the sixth XOR gate with the output of the fourth AND gate. The second left shifter left-shifts the output of the seventh XOR gate by 15 bits (that is, e = 15). The fifth AND gate ANDs the output of the second left shifter with 0xefc60000 (that is, the tenth set value = 0xefc60000). The eighth XOR gate XORs the output of the fifth AND gate with the output of the seventh XOR gate. The fourth right shifter right-shifts the output of the eighth XOR gate by 11 bits (that is, f = 11). The ninth XOR gate XORs the output of the fourth right shifter with the output of the eighth XOR gate and outputs the XOR result as the second random number.
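The output line with these concrete values can be sketched as a single function. The shift amounts and masks are those given in the text above (c = 11, d = 7, e = 15, f = 11); the function name and the keyword parameter are hypothetical. Note that the commonly published MT19937 reference code uses a final right shift of 18 rather than 11, so this sketch is not claimed to reproduce a reference generator's output.

```python
NINTH_SET_VALUE = 0x9D2C5680
TENTH_SET_VALUE = 0xEFC60000


def output_line(x: int, f: int = 11) -> int:
    """Deform one 32-bit update value into a second random number."""
    y = x ^ (x >> 11)                     # third right shifter + sixth XOR gate
    y ^= (y << 7) & NINTH_SET_VALUE       # first left shifter, fourth AND gate, seventh XOR gate
    y ^= (y << 15) & TENTH_SET_VALUE      # second left shifter, fifth AND gate, eighth XOR gate
    y ^= y >> f                           # fourth right shifter + ninth XOR gate
    return y & 0xFFFFFFFF
```

The masks already confine the left-shifted terms to 32 bits, so the final AND is only a safeguard for inputs wider than 32 bits. Exposing f as a parameter makes it easy to compare the value stated in the text with the reference value.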
As shown in FIG. 13, the structures of the data type conversion module 40, the distribution conversion module 50, the output control module 60, and the interrupt management module in the random number generation apparatus 100 may be the same as in Example 1; refer to the related description in Example 1, which is not repeated here.
Example 3
The random number generation apparatus 100 in Example 3 includes the first generator 10 of Example 1 and the second generator 20 of Example 2.
As shown in FIG. 14a and FIG. 14b, the random number generation apparatus 100 includes a first generator 10, a second generator 20, a first selector 30, a data type conversion module 40, a distribution conversion module 50, and an output control module 60.
The structures of the first generator 10, the data type conversion module 40, the distribution conversion module 50, and the output control module 60 may be the same as those of the corresponding components in Example 1 and are not described again here.
As shown in FIG. 14a, the second generator 20 may include multiple output lines 231. Alternatively, as shown in FIG. 14b, the second generator 20 may include a single output line 231.
The structure of the second generator 20 may be the same as that of the second generator 20 in Example 2; refer to the related description in Example 2, which is not repeated here.
As shown in FIG. 14a, the first random number output by the first generator 10 and the second random number output by the second generator 20 are both transmitted to the first selector 30. The first selector 30 is configured to select, according to a first parameter, either the first random number or the second random number for output.
The first parameter may, for example, be transmitted by a system scheduler, CPU, GPU, or NPU to a first flash memory, which then transmits it to the first selector 30.
The data type conversion module 40 is configured to perform data type conversion on the output of the first selector 30.
That is, when the first selector 30 outputs the first random number, the data type conversion module 40 performs data type conversion on the first random number; when the first selector 30 outputs the second random number, the data type conversion module 40 performs data type conversion on the second random number.
The random number generation apparatus 100 provided in this example includes the first generator 10 and the second generator 20, which generate the first random number and the second random number on different principles. Therefore, with both generators provided in the random number generation apparatus 100, random numbers can be generated by either the first generator 10 or the second generator 20 as required, to suit different application scenarios.
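The role of the first selector 30 can be pictured with a small software model. This is a sketch only: the encoding of the first parameter (0 for the first generator 10, 1 for the second generator 20) is an assumption, since the text does not specify it, and the function name `first_selector` is hypothetical.

```python
def first_selector(first_param, first_random, second_random):
    """Forward the output of one generator according to the first parameter.

    Assumed encoding (not given in the text): 0 selects the first random
    numbers, 1 selects the second random numbers.
    """
    if first_param == 0:
        return first_random
    if first_param == 1:
        return second_random
    raise ValueError("first parameter must be 0 or 1 in this sketch")


# Downstream modules (data type conversion, distribution conversion) only
# ever see the selected stream. The input values here are made up.
selected = first_selector(1, [11, 12, 13], [21, 22, 23])
```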
In some embodiments of this application, any of the random number generation apparatuses 100 described above may be integrated on the substrate of a chip.
For example, the random number generation apparatus 100 is laid out on the system on chip (SOC) side of an AI chip, so that the AI chip can be used in AI inference and training networks that require random numbers on a large scale. The random number seed, parameters, storage addresses, and so on received by the random number generation apparatus 100 are configured by software.
The foregoing are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (22)

  1. A random number generation apparatus, comprising:
    at least one generator, wherein the at least one generator includes a generator configured to synchronously generate a plurality of random numbers according to a random number seed;
    wherein, when the at least one generator is a plurality of generators, the random number generation apparatus further comprises a first selector configured to select, according to a first parameter, the random numbers generated by one of the plurality of generators for output.
  2. The random number generation apparatus according to claim 1, wherein the at least one generator comprises a first generator, the first generator comprises a plurality of parallel first pipelines, and the plurality of parallel first pipelines are configured to synchronously generate the plurality of random numbers according to the random number seed.
  3. The random number generation apparatus according to claim 2, wherein the first pipeline comprises a plurality of cascaded operation subcircuits, and each operation subcircuit comprises a plurality of parallel data deformation modules;
    the data deformation module is configured to receive first data, second data, third data, a first set value, and a second set value; to use the low-order bits of the product of the first data and the first set value as first output data; and to XOR the result of XORing the second data and the third data with the high-order bits of the product of the first data and the first set value, and use the result as second output data;
    the data deformation module is further configured to use the sum of the third data and the second set value as third output data; or, except for the data deformation modules in the last stage, the data deformation module in each stage is further configured to use the sum of the third data and the second set value as third output data;
    the first output data, the second output data, and the third output data of the data deformation module in a preceding stage serve, respectively, as the second data, the first data, and the third data of the data deformation module in the following stage;
    wherein at least one bit of the random number seed serves as the third data of the first-stage operation subcircuit, and the first output data and the second output data of the last-stage operation subcircuit serve as the random number generated by the first pipeline.
  4. The random number generation apparatus according to any one of claims 1 to 3, wherein the at least one generator further comprises a second generator;
    the second generator comprises a seed initialization generation subcircuit, a state rotation subcircuit, and an output subcircuit;
    the seed initialization generation subcircuit is configured to perform initialization according to the random number seed to generate a rotation chain comprising a plurality of initial values;
    the state rotation subcircuit comprises a plurality of parallel second pipelines configured to rotate the rotation chain;
    the output subcircuit comprises at least one output line configured to perform deformation processing on the output of the state rotation subcircuit to generate the plurality of random numbers.
  5. The random number generation apparatus according to claim 4, wherein the seed initialization generation subcircuit is configured to receive fourth data, a third set value, a fourth set value, and a fifth set value; to multiply the result of XORing the fourth data shifted right by a bits with the fourth data by the third set value; to add the fourth set value to the product; to AND the sum with the fifth set value; and to output the result of the AND operation as the initialization value;
    wherein the fourth data is the random number seed or the initialization value.
  6. The random number generation apparatus according to claim 4, wherein the second pipeline comprises a parity selection module, an odd number generation module, an even number generation module, and a second selector;
    the parity selection module is configured to receive fifth data, sixth data, a sixth set value, and a seventh set value; to OR the result of ANDing the fifth data with the sixth set value with the result of ANDing the sixth data, after a modulo operation, with the seventh set value; to take the modulo of the OR result; and to output the modulo result to the second selector and the even number generation module;
    the odd number generation module is configured to receive the fifth data and an eighth set value, XOR the fifth data with the eighth set value, and output the result to the second selector;
    the even number generation module is configured to receive seventh data and the output of the parity selection module, shift the output of the parity selection module right by b bits, XOR the shifted value with the modulo result of the seventh data, and output the result to the second selector;
    the second selector is configured to select, according to the output of the parity selection module, the output of the odd number generation module or of the even number generation module for output, so as to rotate the rotation chain;
    wherein the fifth data, the sixth data, and the seventh data are different data in the rotation chain.
  7. The random number generation apparatus according to claim 4, wherein the output line is configured to receive the output of the state rotation subcircuit, a ninth set value, and a tenth set value; to XOR the output of the state rotation subcircuit shifted right by c bits with the output of the state rotation subcircuit; then to XOR that result with the result of ANDing that result, shifted left by d bits, with the ninth set value; then to XOR the new result with the result of ANDing the new result, shifted left by e bits, with the tenth set value; and then to XOR the latest result with the latest result shifted right by f bits, and output the final result as a random number.
  8. The random number generation apparatus according to any one of claims 4 to 7, wherein the second generator further comprises a third selector, an interworking register, and a first memory;
    the third selector is configured to receive the rotation chain before rotation, output by the seed initialization generation subcircuit, and the rotation chain after rotation, output by the state rotation subcircuit, and, under control of a selection control terminal, to transmit the rotation chain before rotation or the rotation chain after rotation to the interworking register;
    the interworking register is configured to receive the rotation chain output by the third selector, transmit the rotation chain to the first memory, retrieve the data in the rotation chain from the first memory in batches, and transmit that data to the second pipelines as the fifth data, the sixth data, and the seventh data; and is further configured to transmit the output of the state rotation subcircuit to the output subcircuit;
    the first memory is configured to receive and store the rotation chain output by the interworking register.
  9. The random number generation apparatus according to any one of claims 1 to 8, further comprising a data type conversion module configured to perform data type conversion on the random numbers generated by the at least one generator.
  10. The random number generation apparatus according to claim 9, wherein the data type conversion module comprises at least one data type converter configured to convert the random numbers generated by the at least one generator into random numbers of a preset data type;
    when the data type conversion module comprises a plurality of data type converters, the data type conversion module further comprises a fourth selector configured to select, according to a second parameter, the result of one of the plurality of data type converters for output;
    wherein the random numbers produced by the plurality of data type converters have different preset data types.
  11. The random number generation apparatus according to claim 9, further comprising a distribution conversion module configured to perform distribution type conversion on the output of the data type conversion module.
  12. The random number generation apparatus according to claim 11, wherein the distribution conversion module comprises at least one distribution generator configured to convert the random numbers output by the data type conversion module into random numbers that follow a preset distribution;
    when the distribution conversion module comprises a plurality of distribution generators, the random number generation apparatus further comprises a fifth selector configured to select, according to a third parameter, the result of one of the plurality of distribution generators for output;
    wherein the random numbers produced by the plurality of distribution generators follow different preset distribution types.
  13. The random number generation apparatus according to claim 12, wherein the at least one distribution generator comprises a normal distributor that uses the Box-Muller algorithm to convert the random numbers output by the data type conversion module into normally distributed random numbers.
  14. A random number generation method, comprising:
    synchronously generating, by a generator, a plurality of random numbers according to a random number seed;
    wherein, when a plurality of generators generate the plurality of random numbers, the random number generation method further comprises selecting, by a first selector according to a first parameter, the plurality of random numbers generated by one of the plurality of generators for output.
  15. The random number generation method according to claim 14, wherein synchronously generating, by the generator, the plurality of random numbers according to the random number seed comprises:
    synchronously generating, by a plurality of parallel first pipelines in a first generator, the plurality of random numbers according to the random number seed.
  16. The random number generation method according to claim 14, wherein synchronously generating, by the generator, the plurality of random numbers according to the random number seed comprises:
    performing initialization, by a seed initialization generation subcircuit, according to the random number seed to generate a rotation chain comprising a plurality of initial values;
    rotating the rotation chain by a plurality of parallel second pipelines in a state rotation subcircuit;
    performing deformation processing, by a plurality of parallel output lines in an output subcircuit, on the output of the state rotation subcircuit to synchronously generate the plurality of random numbers.
  17. The random number generation method according to any one of claims 14 to 16, further comprising: performing, by a data type conversion module, data type conversion on the plurality of random numbers generated by the generator.
  18. The random number generation method according to claim 17, further comprising: performing, by a distribution conversion module, distribution type conversion on the output of the data type conversion module.
  19. A neural network system, comprising a second memory and the random number generation apparatus according to any one of claims 1 to 13, wherein the second memory is configured to store the random numbers generated by the random number generation apparatus.
  20. The neural network system according to claim 19, wherein the second memory is further configured to store the position index of the last random number generated by the random number generation apparatus.
  21. A chip, comprising a substrate and the random number generation apparatus according to any one of claims 1 to 13, wherein the random number generation apparatus is disposed on the substrate.
  22. A random number generation apparatus, comprising a second generator;
    the second generator comprises a seed initialization generation subcircuit, a state rotation subcircuit, and an output subcircuit;
    the seed initialization generation subcircuit is configured to perform initialization according to a random number seed to generate a rotation chain comprising a plurality of initial values;
    the state rotation subcircuit comprises a plurality of parallel second pipelines configured to rotate the rotation chain;
    the output subcircuit comprises an output line configured to perform deformation processing on the output of the state rotation subcircuit to generate a random number.
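
The arithmetic recited in claims 3, 5, and 13 can be illustrated with a short software model. This is a sketch under stated assumptions, not the claimed circuits: it assumes 32-bit data and wrap-around addition, models a single data deformation module per stage (the parallel modules of each operation subcircuit are omitted), and uses hypothetical function and parameter names. The claims do not fix the set values, the shift amount a, or the number of stages, so all of them are plain parameters here.

```python
import math

MASK32 = 0xFFFFFFFF  # assumption: 32-bit data width


def deform(first, second, third, set1, set2):
    """One data deformation module as described in claim 3."""
    product = (first & MASK32) * (set1 & MASK32)
    out1 = product & MASK32                              # low-order bits of the product
    out2 = (second ^ third ^ (product >> 32)) & MASK32   # XOR with high-order bits
    out3 = (third + set2) & MASK32                       # sum, wrapped to 32 bits (assumed)
    return out1, out2, out3


def first_pipeline(first, second, third, set1, set2, stages):
    """Cascade of stages: outputs 1, 2, 3 feed the next stage's second, first, third data."""
    if stages < 1:
        raise ValueError("at least one stage is required")
    for _ in range(stages):
        out1, out2, out3 = deform(first, second, third, set1, set2)
        second, first, third = out1, out2, out3
    return out1, out2  # first and second output data of the last stage


def init_value(fourth, a, set3, set4, set5):
    """Seed initialization step as described in claim 5."""
    return ((((fourth >> a) ^ fourth) * set3) + set4) & set5


def box_muller(u1, u2):
    """Box-Muller transform (claim 13); expects u1 in (0, 1] and u2 in [0, 1)."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)
```

For example, `init_value(1, 30, 1812433253, 2, 0xFFFFFFFF)` evaluates one initialization step with illustrative constants that are not recited in the claims; feeding each result back in as the fourth data builds the rotation chain value by value.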

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202180087448.XA CN116710890A (en) 2021-03-26 2021-03-26 Random number generation device and generation method, random number generation system, and chip
PCT/CN2021/083344 WO2022198652A1 (en) 2021-03-26 2021-03-26 Random number generation apparatus and method, random number generation system, and chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/083344 WO2022198652A1 (en) 2021-03-26 2021-03-26 Random number generation apparatus and method, random number generation system, and chip

Publications (1)

Publication Number Publication Date
WO2022198652A1 true WO2022198652A1 (en) 2022-09-29

Family

ID=83395092

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083344 WO2022198652A1 (en) 2021-03-26 2021-03-26 Random number generation apparatus and method, random number generation system, and chip

Country Status (2)

Country Link
CN (1) CN116710890A (en)
WO (1) WO2022198652A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117390610A (en) * 2023-12-13 2024-01-12 中国人民解放军国防科技大学 Identity identification generation method, system and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN204883682U (en) * 2015-08-12 2015-12-16 中国电子科技集团公司第四十一研究所 Multichannel pseudo -random signal generator
CN109615370A (en) * 2018-10-25 2019-04-12 阿里巴巴集团控股有限公司 Object select method and device, electronic equipment
CN110058843A (en) * 2019-03-27 2019-07-26 阿里巴巴集团控股有限公司 Generation method, device and the server of pseudo random number
CN112328206A (en) * 2020-11-03 2021-02-05 广州科泽云天智能科技有限公司 Parallel random number generation method for vectorization component

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JOHN K. SALMON ; MARK A. MORAES ; RON O. DROR ; DAVID E. SHAW: "Parallel random numbers: As easy as 1, 2, 3", HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2011 INTERNATIONAL CONFERENCE FOR, IEEE, 12 November 2011 (2011-11-12), pages 1 - 12, XP032081465, ISBN: 978-1-4503-0771-0, DOI: 10.1145/2063384.2063405 *
XIANG TIAN ; KHALED BENKRID: "Mersenne Twister Random Number Generation on FPGA, CPU and GPU", ADAPTIVE HARDWARE AND SYSTEMS, 2009. AHS 2009. NASA/ESA CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 29 July 2009 (2009-07-29), Piscataway, NJ, USA , pages 460 - 464, XP031563050, ISBN: 978-0-7695-3714-6 *

Also Published As

Publication number Publication date
CN116710890A (en) 2023-09-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21932273

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180087448.X

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21932273

Country of ref document: EP

Kind code of ref document: A1