CN115437603A

CN115437603A - Method for generating random numbers and related products

Info

Publication number: CN115437603A
Application number: CN202110622813.4A
Authority: CN
Inventors: Not disclosed (the inventor requested non-publication)
Original assignee: Cambricon Technologies Corp Ltd
Current assignee: Cambricon Technologies Corp Ltd
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2022-12-06
Anticipated expiration: 2041-06-04
Also published as: WO2022253287A1; CN115437603B

Abstract

The present disclosure provides an apparatus for generating random numbers, an integrated circuit chip, a board card, an electronic device, and a method for generating random numbers. The apparatus may be included in a combined processing apparatus, which may further include an interface apparatus and other processing apparatus. The apparatus interacts with the other processing apparatus to jointly complete computing operations specified by a user. The combined processing apparatus may further include a storage apparatus connected to the apparatus and to the other processing apparatus, respectively, for storing data of the apparatus and of the other processing apparatus. The disclosed scheme can improve the efficiency of random number generation and improve the pipelining performance of hardware execution.

Description

Method for generating random numbers and related products

Technical Field

The present disclosure relates generally to the field of random numbers. More particularly, the present disclosure relates to an apparatus, an integrated circuit chip, a board card, an electronic device, and a method for generating a random number.

Background

Random numbers are widely used in many scenarios, such as statistical applications and experimental tests. As the amount of data to be analyzed or tested increases, higher demands are placed on both the quantity of random numbers to be generated and the efficiency of generating them. Random number generation algorithms generally involve a large number of iterative data operations, which requires a hardware architecture that can adapt to such operations. However, in existing random number generation approaches, the hardware architecture does not support pipelined operation, and the random number generation efficiency is therefore relatively low.

Disclosure of Invention

To address at least the above-identified problems in the prior art, the present disclosure provides a scheme for generating random numbers in a pipelined manner. Aspects of the present disclosure may achieve technical advantages in a number of respects, including enhancing processing performance of hardware, reducing power consumption, improving execution efficiency of computing operations, and avoiding computational overhead.

In a first aspect, the present disclosure provides an apparatus for generating random numbers, comprising: an instruction decode unit configured to receive a random number instruction and decode the random number instruction; an arithmetic unit configured to perform a random number generation operation based on a single seed or a plurality of seeds in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is used to generate random numbers and the update operation is used to update state vectors; and a memory configured to set a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating the random number and to accept a state vector update to the state space from the arithmetic unit, wherein the state space is sized to: support, under a single seed, execution of the generation operation and the update operation in a pipelined manner; or support, under a plurality of seeds, execution of the random number generation operation of each seed in a pipelined manner.

In a second aspect, the present disclosure provides an integrated circuit chip comprising an apparatus for generating random numbers as described above and in a number of embodiments below.

In a third aspect, the present disclosure provides a board card comprising an integrated circuit chip as described above and in the following embodiments.

In a fourth aspect, the present disclosure provides an electronic device comprising an integrated circuit chip as described above and as will be described in a number of embodiments below.

In a fifth aspect, the present disclosure provides a method for generating random numbers, comprising: receiving a random number instruction and decoding the random number instruction; setting a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector used to generate the random number and to accept state vector updates to the state space; and performing a random number generation operation based on a single seed or a plurality of seeds in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is used to generate random numbers and the update operation is used to update a state vector, wherein the size of the state space is set to: support, under a single seed, execution of the generation operation and the update operation in a pipelined manner; or support, under a plurality of seeds, execution of the random number generation operation of each seed in a pipelined manner.

By utilizing the apparatus, the integrated circuit chip, the board card, the electronic device, and the method for generating random numbers of the present disclosure, the random number generation operation can be executed by hardware in a pipelined manner. Thus, the disclosed scheme may efficiently generate random numbers by means of pipelining, thereby improving the overall performance of the hardware and reducing computational overhead. Further, the disclosed scheme supports random number generation not only under a single seed but also under a plurality of seeds, making random number generation more flexible and more efficient. In particular, in random number generation under a plurality of seeds, the pipelining performance of the hardware is further improved.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. In the drawings, several embodiments of the disclosure are illustrated by way of example and not by way of limitation, and like or corresponding reference numerals indicate like or corresponding parts and in which:

FIG. 1 is a block diagram illustrating an apparatus for generating random numbers according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram illustrating an initialization process for generating random numbers according to one embodiment of the present disclosure;

FIG. 3 is a schematic diagram illustrating an initialization process for generating random numbers according to yet another embodiment of the present disclosure;

FIG. 4 is a diagram illustrating state space variations for generating random numbers according to one embodiment of the present disclosure;

FIG. 5a is a schematic diagram illustrating a partial state space variation for generating random numbers according to yet another embodiment of the present disclosure;

FIG. 5b is a schematic diagram illustrating another portion of the state space variation for generating random numbers according to yet another embodiment of the present disclosure;

FIG. 6 is a flow chart illustrating a method for generating random numbers in accordance with an embodiment of the present disclosure;

FIG. 7 is a flow chart illustrating a method of generating random numbers in a single seed mode according to an embodiment of the present disclosure;

FIG. 8 is a flow chart illustrating a method of generating random numbers in a dual seed mode according to an embodiment of the present disclosure;

FIG. 9 is a block diagram illustrating a combined processing device according to an embodiment of the present disclosure; and

FIG. 10 is a schematic diagram illustrating a structure of a board card according to an embodiment of the present disclosure.

Detailed Description

The present disclosure provides a scheme for efficiently generating random numbers in a parallel pipelined or ping-pong manner. To this end, in one embodiment, the present disclosure proposes to configure the state space, which stores the state vectors used to generate random numbers, so that it meets the hardware requirements of pipelined operation. In particular, for scenarios in which random numbers are generated using either a single seed or a plurality of seeds, the present disclosure proposes to expand the state space so that the generation operations and update operations involved in generating random numbers can be performed in a pipelined manner in both scenarios. In combination with this improvement of the state space, the present disclosure further proposes to perform the generation and update operations using a plurality of microinstructions obtained by parsing a random number instruction. By means of the hardware settings and random number instructions of the present disclosure, random numbers may be generated efficiently, thereby increasing the overall performance of the computing system and reducing the overhead of generating random numbers.

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings. It is to be understood that the embodiments described below are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the scope of protection of the present disclosure.

FIG. 1 is a block diagram illustrating an apparatus 100 for generating random numbers according to an embodiment of the present disclosure. As shown in FIG. 1, the apparatus 100 includes an instruction decode unit 101, an arithmetic unit 102, and a memory 103, which interact and cooperate with each other to generate random numbers in a pipelined manner. In one implementation scenario, the instruction decode unit may be configured to receive a random number instruction and decode it. In one embodiment, depending on the implementation scenario, the random number instruction may include one or more of the following: the number of random numbers to be generated, the size and address of the state space, information on one or more seeds for generating random numbers, a splicing parameter for splicing random numbers, the address of the output random numbers, and the like. In one embodiment, the random number instruction may be a single instruction that yields a plurality of microinstructions after being parsed by the instruction decode unit. In another embodiment, the seed information may include a specific seed (e.g., a 32-bit initial value) or information indicating whether the seed comes from a special function register. As for the aforementioned splicing parameter, when two seeds are used, it may include a first address for storing the random numbers associated with the first seed and an address offset (i.e., the span of storage addresses occupied once all random numbers have been generated using the first seed). The arithmetic unit may thereby determine the first address for storing the random numbers associated with the second seed, namely the first address of the first seed plus the address offset.
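The address arithmetic for the two-seed case can be sketched as follows. This is a minimal illustration only; the function name and the example addresses are hypothetical and are not part of the disclosure.

```python
def second_seed_base(first_addr: int, addr_offset: int) -> int:
    """First output address for the random numbers of the second seed.

    first_addr:  first address for the random numbers of the first seed
    addr_offset: span of storage addresses occupied by all random numbers
                 generated with the first seed
    """
    return first_addr + addr_offset


# Hypothetical values: 96 four-byte random numbers stored for the first seed.
base0 = 0x1000
base1 = second_seed_base(base0, 96 * 4)
```

With these example values, the output of the second seed starts immediately after the 384 bytes written for the first seed, so the two result blocks are spliced into one contiguous region.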

In one implementation scenario, the arithmetic unit may be configured to perform a single-seed or multiple-seed based random number generation operation in accordance with the decoded random number instruction. In particular, the random number generation operations of the present disclosure may include, for each seed, a generation operation and an update operation that are performed in a pipelined manner. Here, the generating operation may be used to generate a random number and the updating operation may be used to update the state vector. The arithmetic unit of the present disclosure may have different implementations according to different application scenarios. When applied to hardware architectures in the field of artificial intelligence, the arithmetic unit of the present disclosure may be implemented as an artificial intelligence processor or a computational core in a processor.

In one implementation scenario, the memory may be configured to set the state space according to the decoded random number instruction. In the context of the present disclosure, the state space may be configured to store state vectors for generating random numbers and to accept state vector updates from the arithmetic unit. Further, the state space may be sized to support, under a single seed, execution of the generation and update operations in a pipelined manner, or to support, under a plurality of seeds, execution of the random number generation operation of each seed in a pipelined manner. In one embodiment, the size of the state space may be configured by software. Configuring the size through software makes the setting more flexible and convenient, so that the state space can better support pipelined operation of the random number generation scheme in both the single seed mode and the multiple seed mode.

In generating the random number, the arithmetic unit may read at least one state vector from a state space (which may be regarded as one cyclically utilized storage space) to generate the random number, and may generate an updated state vector by an update operation based on the at least one state vector. Further, the state space may be updated with the updated state vector. The random number generation operation of the present disclosure will be described in detail below in conjunction with the aforementioned random number instructions.

As previously described, the random number instruction of the present disclosure may be implemented as a single instruction which, upon parsing by the instruction decode unit, yields a plurality of microinstructions for performing pipelined operations. According to the plurality of microinstructions, the arithmetic unit may be configured to perform, in a cyclically shifted manner along the state space, a generation operation that generates random numbers and an update operation that updates the state space. Specifically, according to a microinstruction, the arithmetic unit may perform the generation operation using the state vectors stored in one of a plurality of space segments into which the state space is divided, to generate a predetermined number of random numbers. Next, new state vectors may be generated from the state vectors stored in that space segment and stored in another one of the plurality of space segments. To achieve pipelining and reduce address dependency between data, the present disclosure proposes to place these two space segments sufficiently far apart and, immediately after the state vectors are updated, to perform the next random number generation operation using a segment different from both of them, thereby minimizing the latency of the arithmetic unit and improving its efficiency in generating random numbers.

In one embodiment, the state space for random number generation may be implemented as a cyclically reused memory space. The plurality of space segments described above thus serve alternately as read space segments or write space segments, depending on the generation operation and update operation to be performed. In other words, as the iterative loop of generation and update operations proceeds, a space segment previously used for reading becomes a write segment in a subsequent operation and receives new state vectors, which is how the state space of the present disclosure is updated. Further, so that the generation of random numbers and the update of state vectors can be performed in a pipelined manner, the read space segment and the write space segment may be set a predetermined address interval apart in the state space, to minimize address dependency between the state vectors in subsequent generation and update operations.

In one implementation scenario, to enable random number instructions to be invoked and executed continuously, the instruction decode unit of the present disclosure may be configured to output a state pointer to the state space in the memory. The state pointer may indicate the location in the state space of the first state vector used for the generation operation (e.g., the coordinates in the state space of Y[i+N], described later in connection with the figures) and the location in the state space of the first state vector used for the update operation (e.g., the coordinates in the state space of X[i], described later in connection with the figures). Based on the state pointer, the arithmetic unit knows where these two state vectors were located in the state space after the previous random number instruction was executed, so that the generation and update operations of the next random number instruction can start from those positions.

In one implementation scenario, the state space of the present disclosure may include N contiguously distributed read space segments and N contiguously distributed write space segments, and the N read space segments and the N write space segments form a correspondence in a pipelined operation, where N is a positive integer greater than or equal to 2. Based on the aforementioned preset conditions (e.g., configured by a random number instruction), during the pipelined operation, the arithmetic unit may be configured to perform the following operations in a loop according to the number of random numbers to be generated, where i = 1, …, N:
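The role of the state pointer across successive random number instructions can be sketched as below. This is a minimal sketch under stated assumptions: the class and function names are hypothetical, both positions are assumed to advance by the number of vectors consumed, and the state space is treated as a ring whose size is given in state vectors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatePointer:
    gen_pos: int  # position of the first state vector for generation, Y[i+N]
    upd_pos: int  # position of the first state vector for update, X[i]


def advance(ptr: StatePointer, count: int, space_size: int) -> StatePointer:
    """Move both positions forward by `count` vectors, wrapping around the ring."""
    return StatePointer(
        (ptr.gen_pos + count) % space_size,
        (ptr.upd_pos + count) % space_size,
    )


# Hypothetical example: N = 351, state space of 2N = 702 vectors,
# 96 random numbers produced by one instruction.
start = StatePointer(gen_pos=351, upd_pos=0)
after_first_call = advance(start, 96, 702)
```

Saving `after_first_call` and passing it to the next instruction lets generation resume where the previous instruction stopped instead of restarting from the initial positions.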

in the generation operation for generating random numbers, the arithmetic unit may read a plurality of state vectors from the ith read space section to generate a corresponding ith plurality of random numbers; and

in the update operation for updating the state space, the arithmetic unit may generate a plurality of state update vectors for updating the state space from the plurality of state vectors in the ith read space section and write the plurality of state update vectors to the ith write space section.
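The loop over read and write space segments described above can be sketched as a schedule of steps. The sketch assumes a ring of 2N segments in which read segment i is paired with the write segment N positions further along; that pairing and all names are illustrative assumptions, and the indices are zero-based rather than starting at 1.

```python
from typing import Optional

Step = tuple[str, int, Optional[int]]


def pipeline_schedule(n_segments: int) -> list[Step]:
    """List (operation, read_segment, write_segment) steps for one pass.

    Each read segment is first used to generate random numbers, then used
    again as the input of the update that fills its paired write segment.
    """
    total = 2 * n_segments
    steps: list[Step] = []
    for i in range(n_segments):
        write_seg = (i + n_segments) % total  # assumed pairing, N segments away
        steps.append(("generate", i, None))
        steps.append(("update", i, write_seg))
    return steps


schedule = pipeline_schedule(2)
```

Because the write segment of step i never coincides with the read segment of step i+1 in this layout, the next generation can begin without waiting for the preceding write to finish.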

As previously described, the arithmetic unit of the present disclosure may generate a predetermined number of random numbers according to a random number instruction. When the random number instruction is parsed by the instruction decode unit into a plurality of microinstructions (including one or more generation microinstructions and update microinstructions), the arithmetic unit may, according to different microinstruction combinations, generate the random numbers in several partial batches and update the state vectors in several partial batches until the desired number of random numbers is ultimately generated. Specifically, the arithmetic unit may generate a partial number of random numbers according to one generation microinstruction and update a partial number of state vectors according to one update microinstruction. Alternatively, the arithmetic unit may generate a partial number of random numbers according to one generation microinstruction and update partial numbers of state vectors according to a plurality of update microinstructions, respectively. Additionally, the arithmetic unit may generate partial numbers of random numbers according to a plurality of generation microinstructions, respectively, and update a partial number of state vectors according to one update microinstruction. The microinstruction combinations given here are merely exemplary and are described in greater detail below with reference to the drawings.

To minimize the impact of address overlap between the read operations (e.g., reading state vectors from the state space to generate random numbers) and write operations (e.g., writing new state vectors to the state space to update it) of the microinstructions, the present disclosure proposes to include in each microinstruction an address range entry that indicates the range of the state space accessed by its read operation or write operation. For two consecutive microinstructions whose address ranges overlap, the arithmetic unit is configured to perform the read or write operation of the later microinstruction only after the read or write operation of the earlier microinstruction has completed.
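The address-range check between two consecutive microinstructions can be sketched as an interval-overlap test. This is an illustrative sketch only: the `MicroOp` record, its field names, and the half-open index ranges are assumptions, and ranges that wrap around the end of the state space are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    kind: str   # "read" or "write"
    start: int  # first state-vector index accessed (inclusive)
    end: int    # one past the last state-vector index accessed (exclusive)


def overlaps(a: MicroOp, b: MicroOp) -> bool:
    """True if the two half-open address ranges share at least one index."""
    return a.start < b.end and b.start < a.end


def must_wait(prev: MicroOp, nxt: MicroOp) -> bool:
    """True if `nxt` may start only after `prev` has completed."""
    return overlaps(prev, nxt)


write_0_96 = MicroOp("write", 0, 96)
read_96_192 = MicroOp("read", 96, 192)
read_90_100 = MicroOp("read", 90, 100)
```

Here `read_96_192` can be issued while `write_0_96` is still in flight, whereas `read_90_100` touches indices 90 to 95 of the written range and therefore has to wait.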

In one embodiment, during initialization of the state space, the present disclosure proposes to divide the state space into a first state subspace and a second state subspace of equal size. In this case, the arithmetic unit may be configured to perform an update operation for the second state subspace using the state vectors stored in the first state subspace, so as to generate all state vectors of the second state subspace. When the number of state vectors in the first state subspace is denoted by the aforementioned N (in this case, N represents N consecutive segments in the first state subspace, each segment holding one state vector), only the update operation is performed: the N state vectors numbered 0 to N-1 in the first state subspace are used to generate N new state vectors, yielding a state space that contains 2N state vectors numbered 0 to 2N-1. In generating all state vectors of the second state subspace, the arithmetic unit may be further configured to sequentially execute a plurality of microinstructions to generate all state vectors of the second state subspace (i.e., the N state vectors numbered N to 2N-1 in the previous example), wherein execution of each microinstruction generates a corresponding number of state vectors.

As previously described, the present disclosure provides random number generation schemes for a single seed mode and a multiple seed mode. Under a single seed, the arithmetic unit of the present disclosure may be configured to perform the aforementioned generation operation to generate random numbers using the state vectors generated from the single seed, and to perform the aforementioned update operation to update the relevant state vectors of the state space. In particular, in the case of a single seed, the arithmetic unit may be configured to alternate continuously, according to the plurality of microinstructions, between the generation operation that generates random numbers and the update operation that updates the state space, until a predetermined number of random numbers is generated. In one implementation scenario, each generation operation generates a partial number of random numbers and each update operation updates a corresponding number of state vectors in the state space.

Alternatively or additionally, in the case of a plurality of seeds, each seed may have its own associated state space, i.e., a state space is set for each seed. In this case, the arithmetic unit may be configured to perform the generation operation associated with each seed, using the state vectors generated from that seed to generate the random numbers associated with it, and to perform the update operation associated with each seed to update the relevant state vectors of the state space associated with it. When the plurality of seeds includes a first seed and a second seed, the memory of the present disclosure is provided with a state space associated with each of the first seed and the second seed. On this basis, in performing a pipelined operation for generating random numbers, the arithmetic unit may be configured to loop through the following operations according to a random number instruction until a predetermined number of random numbers is generated. Specifically, the arithmetic unit may generate a partial number of random numbers using the state space associated with the first seed and update a corresponding number of state vectors. Next, the arithmetic unit may generate a partial number of random numbers using the state space associated with the second seed and update a corresponding number of state vectors. By repeatedly alternating between the generation and update operations associated with the first seed and those associated with the second seed, the scheme of the present disclosure markedly improves the speed and efficiency of generating the desired number of random numbers and realizes pipelined execution in hardware, thereby improving the overall performance of the hardware.
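The alternation between the two seeds can be sketched as a ping-pong schedule. The sketch assumes that the predetermined number applies to each seed and that each step handles a batch of up to 96 values (the batch size used in a later example of this description); the function name and the tuple layout are hypothetical.

```python
def dual_seed_schedule(total_per_seed: int, batch: int = 96) -> list[tuple[int, str, int]]:
    """Return (seed_id, operation, count) steps in ping-pong order.

    For every batch, seed 0 generates and updates, then seed 1 generates
    and updates, until `total_per_seed` random numbers exist for each seed.
    """
    steps: list[tuple[int, str, int]] = []
    done = 0
    while done < total_per_seed:
        count = min(batch, total_per_seed - done)
        for seed_id in (0, 1):
            steps.append((seed_id, "generate", count))
            steps.append((seed_id, "update", count))
        done += count
    return steps


plan = dual_seed_schedule(100)
```

Since the two seeds use separate state spaces, the steps of one seed have no address dependency on the steps of the other, which is what allows them to be interleaved back to back.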

FIG. 2 is a schematic diagram illustrating an initialization process for generating random numbers according to one embodiment of the present disclosure. For ease of understanding, the size of the state space provided in the memory of fig. 1 is represented by a straight line, and the state vector in the state space is represented by a dot. As previously described, the state vector is used for the generation of a certain number of random numbers and the updating of the state vector. Further, to enable pipelined execution of hardware, the present disclosure proposes to extend the state space, e.g., extending the original state space supporting N state vectors to a state space supporting 2N state vectors.

As shown in the upper part of FIG. 2, during initialization, the arithmetic unit may use a seed to generate a state space containing N state vectors according to the decoded random number instruction. For example, a state space of N = 351 state vectors (i.e., the 0th to the 350th) may be generated. When each state vector occupies 4 bytes, the initial state space of N = 351 occupies 1404 bytes. When using, for example, the "Mersenne Twister for Graphic Processors Dynamic Creator" ("MTGPDC") random number generation algorithm, FIG. 2 also shows the three state vectors X[i], X[i+1], and X[i+M] used to update the state space, from which the new state vector X[i+N] is generated as follows:

t = X[i+1] ^ (X[i] | mask);

t = t ^ (t << sh1);

u = t ^ (X[i+M] >> sh2);

X[i+N] = u ^ (R_table[u & 0xF]).
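The four update lines above can be transcribed into a small function. This is a sketch that follows the lines as printed; the values of `mask`, `sh1`, `sh2`, and the contents of `r_table` are placeholders chosen for illustration, and the 32-bit truncation after the left shift is an assumption based on the 4-byte state vectors mentioned earlier.

```python
MASK32 = 0xFFFFFFFF  # state vectors are assumed to be 32-bit words


def next_state(x_i: int, x_i1: int, x_im: int,
               mask: int, sh1: int, sh2: int, r_table: list[int]) -> int:
    """Compute X[i+N] from X[i], X[i+1], and X[i+M] as written in the text."""
    t = x_i1 ^ (x_i | mask)
    t = (t ^ (t << sh1)) & MASK32   # keep the result within 32 bits
    u = t ^ (x_im >> sh2)
    return u ^ r_table[u & 0xF]     # 16-entry lookup on the low 4 bits


# Placeholder parameters (not taken from any real parameter set).
zero_table = [0] * 16
example = next_state(1, 2, 4, mask=0, sh1=1, sh2=1, r_table=zero_table)
```

Each call reads three existing state vectors and produces one new one, so producing a batch of new vectors is a simple loop over i.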

By repeatedly performing the above operations, the state space of 0 to N-1 can be expanded to the state space of 0 to 2N-1 shown in the lower part of FIG. 2, thereby completing the initialization process of the present disclosure. In some embodiments, initialization of the state vectors 0 to N-1 may be performed by a general purpose processor ("CPU"), while initialization of the state vectors N to 2N-1 may be performed by an artificial intelligence processor. Still taking N = 351 as an example, after the 351 state vectors are generated, the expansion from N to 2N state vectors may be accomplished using up to four microinstructions. Specifically, 96 state vectors are updated with a first microinstruction; next, (N-M-96) state vectors are updated with a second microinstruction; next, min(96, M) state vectors are updated with a third microinstruction, wherein if M is less than or equal to 96, M state vectors are updated, thereby completing the initialization of state vectors N to 2N-1; conversely, when M > 96, a fourth microinstruction updates the remaining (M-96) state vectors, thereby completing the initialization of state vectors N to 2N-1.

It is to be understood that the initialization process described above applies to both the single seed mode and the multiple seed mode of the present disclosure. In particular, for the multiple seed mode, the disclosed solution proposes initializing N state vectors for each of the plurality of seeds and then expanding the N state vectors to 2N state vectors, for example in the manner described above, thereby expanding each state space containing N state vectors to a state space containing 2N state vectors. In one implementation scenario, the initialization process described above is applicable to a processor core with relatively little processing or computing capability (referred to as a "small core"). The initialization process for a processor core with relatively large processing or computing capability, i.e., one that performs relatively many computing operations per clock cycle (referred to as a "big core"), is described below in conjunction with FIG. 3.

FIG. 3 is a schematic diagram illustrating an initialization process for generating random numbers according to yet another embodiment of the present disclosure, namely the initialization process of the above-mentioned "big core" scheme. It should be noted that the initialization of state vectors 0 to N-1 is similar to the "small core" scheme described above; these state vectors may be generated, for example, by an off-chip system (e.g., a CPU). Next, starting from the state having state vectors 0 to N-1 shown in the upper part of FIG. 3, (N-M) state vectors are first generated using the update operation of a random number algorithm (such as the aforementioned "MTGPDC"), thereby updating (N-M) state vectors starting from N, as shown by the arrowed line segment from "N" to "2N-M-1" in the middle part of FIG. 3. Thereafter, the remaining M state vectors are updated, as indicated by the arrowed line segment from "2N-M-1" to "2N-1" at the bottom of FIG. 3. In one implementation scenario, the aforementioned initialization of the state vectors from "N" to "2N-M-1" and from "2N-M-1" to "2N-1" may be implemented by two microinstructions, eventually expanding the state space containing N state vectors to a state space containing 2N state vectors to support subsequent pipelined operations that generate random numbers.

FIG. 4 is a diagram illustrating state space variations for generating random numbers according to one embodiment of the present disclosure. To facilitate understanding of the generation and update operations of the present disclosure, states (1) through (7) of the state space are shown, in which the changing positions (or spatial addresses) in the state space of the state vectors X[i], X[i+1], and X[i+M] used to update the state vectors, and of the state vectors Y[i+M-1] and Y[i+N] used to generate the random numbers, are marked. Specifically, based on the aforementioned "MTGPDC" random number generation algorithm, the random number O[i] may be generated from the state vectors Y[i+(M-1)] and Y[i+N] shown in the figure as follows:
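The split of the N initialization updates across microinstructions can be checked with a few lines of arithmetic. The sketch below is illustrative: the function name is hypothetical, and the value M = 84 used in the example is an assumed parameter, since the text does not give M for N = 351.

```python
def init_microinstruction_counts(n: int, m: int, batch: int = 96) -> list[int]:
    """State vectors updated by each initialization microinstruction.

    Follows the four-step split in the text: batch, N-M-batch, min(batch, M),
    and, only when M > batch, the remaining M-batch.
    """
    if n - m - batch < 0:
        raise ValueError("this split requires N - M >= batch")
    counts = [batch, n - m - batch, min(batch, m)]
    if m > batch:
        counts.append(m - batch)
    return counts


three_steps = init_microinstruction_counts(351, 84)   # assumed M <= 96
four_steps = init_microinstruction_counts(351, 120)   # assumed M > 96
```

In both cases the counts add up to N = 351, confirming that the microinstructions together fill exactly the state vectors N to 2N-1.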
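One way to read the split at N-M is through the indices that each update reads. The check below is an inference from the recurrence X[i+N] = f(X[i], X[i+1], X[i+M]) rather than a statement of the text, and the value M = 84 is an assumed example parameter.

```python
def read_indices(i: int, m: int) -> tuple[int, int, int]:
    """Indices read when computing the new state vector X[i+N]."""
    return (i, i + 1, i + m)


def first_batch_reads_only_initial(n: int, m: int) -> bool:
    """True if the first N-M updates read only state vectors 0 to N-1."""
    return all(max(read_indices(i, m)) < n for i in range(n - m))


N, M = 351, 84  # M is a hypothetical value for illustration
ok = first_batch_reads_only_initial(N, M)
first_dependent_read = max(read_indices(N - M, M))
```

Under this reading, the first (N-M) updates depend only on vectors that already exist, while the update at i = N-M is the first to read index N, a vector produced by the first microinstruction, which is consistent with issuing the remaining M updates separately.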

t = Y[i+(M-1)] ^ (Y[i+(M-1)]);

t = t ^ (t >> 8);

O[i] = Y[i+N] ^ T_table[t & 0x4].
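The generation step can be sketched in code, with two loudly stated assumptions. As printed, the first line XORs a value with itself, which always gives zero, and the table index is masked with 0x4; the sketch instead assumes a right shift by 16 in the first line and a 4-bit index mask of 0xF, as in the publicly available MTGP reference code. Neither assumption is confirmed by this text, and the table contents below are placeholders.

```python
def temper(y_m1: int, y_n: int, t_table: list[int]) -> int:
    """Produce one output O[i] from Y[i+(M-1)] and Y[i+N].

    Assumed form (not confirmed by the text):
        t = y_m1 ^ (y_m1 >> 16); t ^= t >> 8; out = y_n ^ t_table[t & 0xF]
    """
    t = y_m1 ^ (y_m1 >> 16)     # assumed shift of 16
    t ^= t >> 8
    return y_n ^ t_table[t & 0xF]  # assumed 16-entry table


placeholder_table = list(range(16))  # illustrative contents only
sample = temper(0x00010000, 0, placeholder_table)
```

The function only reads state vectors and writes nothing back, which is why generation can run ahead of the update that later overwrites older positions.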

then, the 3 state vectors X [ i ], X [ i +1], and X [ i + M ] can be read from the state space based on the aforementioned "MTGPDC" random number generation algorithm to generate a state vector Y (i + N) for updating, where i, i +1, and i + M are the distances from the state vector to the first address of the state space (i.e., the address pointer of each state vector). It can be seen that in order to implement pipelined generation of random numbers, the present disclosure proposes to perform the generation operation of generating random numbers first, and then perform the update operation of updating the state vector. In one implementation scenario, considering the processing power of the arithmetic unit and the memory access performance to data, for a predetermined (or target) number of random numbers, the present disclosure proposes to alternately perform a generation operation and an update operation based on a plurality of microinstructions to generate a predetermined number of random numbers.

Referring specifically to FIG. 4 in conjunction with the above, in state (1), the state space has completed the initialization process described above, i.e., the state vectors 0 to N-1 have been expanded to generate the state vectors N to 2N-1, thereby obtaining a state space holding the state vectors 0 to 2N-1. Next, in state (2), the arithmetic unit may generate 96 random numbers using the state vectors Y[i+M-1] and Y[i+N]. For this purpose, the state vectors Y[i+M-1] and Y[i+N] of state (1) are continuously shifted to the right in the process of generating random numbers until the state vector Y[i+N] reaches the spatial position "N+96" in the state space. After generating 96 random numbers, the arithmetic unit may then generate 96 new state vectors using the state vectors X[i], X[i+1], and X[i+M] to update the state space, as shown by the segment from 0 to 96 indicated by the arrow in state (3) in the figure.

As previously described, the state space of the present disclosure may be divided into a plurality of read space segments and write space segments that are distributed in series. In view of this, for the example shown in FIG. 4, the segments through which the state vectors Y [ i + M-1] and Y [ i + N ] move during the state space from the (2) th state to the (3) th state can be considered as the aforementioned read space segments, while the segment shown in the (3) th state that updates 96 states is also the aforementioned write space segment.

Next, in state (4), the arithmetic unit may read Y[i+M-1] and Y[i+N] sequentially to the right along the state space to generate another 96 random numbers. Thus, Y[i+M-1] moves from spatial position "96+M-1" in state (3) to spatial position "192+M-1" in state (4). Accordingly, Y[i+N] moves from spatial position "N+96" in state (3) to spatial position "N+192" in state (4). Then, similarly to state (3), in state (5) the arithmetic unit may generate 96 new state vectors using X[i], X[i+1], and X[i+M] to update the state space, as indicated by the segment "96 to 192" marked by the arrow in state (5). As described above, the "96 to 192" segment is also a write space segment in the context of the present disclosure.

Similarly to states (4) and (5) above, in states (6) and (7) the arithmetic unit may generate 159 random numbers and update (N-M-192) state vectors, which are shown by the segment from "192" to "(N-M)" indicated by the arrow in state (7). Although not further shown in FIG. 4, it is understood that the subsequent update of the remaining state vectors may be implemented differently depending on how (M-1) compares with 96. Specifically, when (M-1) <= 96, the arithmetic unit may update (M-1) state vectors with one microinstruction (whereby X[i+1] loops back to position "0" of the state space), and then update 1 state vector with one microinstruction (whereby X[i] loops back to position "0" of the state space). In contrast, when (M-1) > 96, the update operation may be completed with three microinstructions, e.g., first updating 96 state vectors, then updating (M-1-96) state vectors (whereby X[i+1] loops back to position "0" of the state space), and finally updating 1 state vector (whereby X[i] loops back to position "0" of the state space).
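The split of one full pass of N updates into microinstruction-sized pieces can be written out as simple arithmetic. The sketch below is an assumption-laden illustration: `update_schedule` is a hypothetical helper, the chunk size of 96 and the two leading full chunks follow the FIG. 4 walkthrough, N = 351 is the example value used in this description, and the values of M in the usage note are arbitrary.

```python
def update_schedule(n, m, chunk=96, full_chunks=2):
    """Return the update counts, one per microinstruction, for N state vectors."""
    middle = n - m - chunk * full_chunks
    if middle <= 0 or m < 2:
        raise ValueError("parameters do not fit the walkthrough in the text")
    counts = [chunk] * full_chunks  # e.g. 96 + 96
    counts.append(middle)           # (N - M - 192) in the walkthrough
    tail = m - 1
    if tail <= chunk:
        counts.append(tail)         # afterwards X[i+1] wraps to position 0
    else:
        counts += [chunk, tail - chunk]
    counts.append(1)                # afterwards X[i] wraps to position 0
    return counts
```

With N = 351 and an illustrative M = 84, this yields `[96, 96, 75, 83, 1]`; with M = 120 it yields `[96, 96, 39, 96, 23, 1]`. In both cases the counts sum to N.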

By utilizing the random number scheme of the present disclosure described above in conjunction with fig. 4, the idle latency (e.g., waiting a certain time to read after updating) during existing read and write operations to the memory can be overcome, and thus the time can be utilized to perform the operation of generating the random number. In particular, the present disclosure enables hardware pipelining and efficient generation of random numbers by expanding the state space and dividing it into different memory access segments that overcome address dependencies, and by performing the generation operation first and then the update operation.

Fig. 5a is a schematic diagram illustrating a partial state space variation for generating random numbers according to yet another embodiment of the present disclosure, and fig. 5b is a schematic diagram illustrating another partial state space variation for generating random numbers according to the previous embodiment. In one embodiment, the generate operation and the update operation illustrated in fig. 5a and 5b may be performed by the aforementioned "big core". In contrast, the generate operation and the update operation shown in FIG. 4 may be performed by the aforementioned "corelets".

As shown in fig. 5a, in the (1) th and (2) th states, the arithmetic unit may first generate 192 random numbers with two microinstructions and then update 192 state vectors in the state space, whose positions are shown as the segments indicated by the arrows in the (2) th state. Next, the arithmetic unit may sequentially execute four microinstructions, thereby generating 159 random numbers in state (3) and updating (N-M-192), (M-1), and 1 state vectors in states (4) through (6), respectively.

Next, as shown in fig. 5b, in states (7) and (8), the arithmetic unit may generate a further 192 random numbers and then update 192 state vectors. As described above, the arithmetic unit may perform these operations with two microinstructions. Then, from state (9) onward, the arithmetic unit may sequentially complete the operations of generating (160-M) random numbers, generating (M-1) random numbers, and updating 159 state vectors.

Since the address changes in the state space of the state vectors used for generating the random numbers and of the state vectors used for updating the state space have been described above with reference to fig. 4, the description is omitted here for the sake of brevity. In addition, although figs. 4, 5a and 5b describe the generation operation and the update operation of the present disclosure by taking N = 351 (i.e., 2N = 702) as an example, those skilled in the art can understand, based on the teaching of the present disclosure, that the value of N may be changed according to the processing capability of the hardware (such as the arithmetic unit and the memory). Further, the microinstructions for performing the generation operation and the update operation, and the number thereof, may also be adapted according to the hardware architecture.

Fig. 6 is a flow chart illustrating a method 600 for generating random numbers in accordance with an embodiment of the present disclosure. Based on the foregoing description of the present disclosure in conjunction with fig. 1-5, those skilled in the art can understand that the method shown in fig. 6 can be implemented by the apparatus 100 shown in fig. 1, and therefore the description of the operation of the apparatus 100 also applies to the step operation of the method 600, and the same contents will not be described again.

As shown in fig. 6, at step S602, a random number instruction is received and decoded. Here, the random number instruction includes various types of information regarding generation of a random number, and the decoding may involve parsing of the random number instruction so that a plurality of microinstructions may be obtained. Next, at step S604, a state space may be set according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating a random number and to accept a state vector update to the state space. Here, the state space may have the exemplary form described above in connection with fig. 4, 5a and 5 b. Finally, at step S606, a random number generation operation based on a single seed or a plurality of seeds may be performed in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is used to generate random numbers and the update operation is used to update the state vector. In one embodiment, the state space may be sized to support pipelined execution of the generate and update operations under a single or multiple seeds.

FIG. 7 is a flow chart illustrating a method 700 of generating random numbers in a single-seed mode according to an embodiment of the present disclosure. To facilitate understanding and discussion of this flow, the figure also schematically shows a state space under a single seed, which holds the state vectors 0 to 2N-1.

As shown in fig. 7, at step S702, N state vectors may first be generated using a seed, thereby initializing N states in the state space. Next, at step S704, the state may be updated using the update operation of the random number generation algorithm, thereby extending the state subspace 0 to N-1 with the state subspace N to 2N-1 and resulting in a state space having 2N state vectors. Next, the aforementioned generation operation and update operation may be performed at steps S706 and S708 to generate random numbers and update the state space. When the predetermined number of random numbers has not yet been generated, steps S710 and S712 may be performed next to generate further random numbers and update the state space. Similarly, the generation operation and the update operation may be repeated until the predetermined number of random numbers is generated.

FIG. 8 is a flow chart illustrating a method 800 of generating random numbers in a dual-seed mode according to an embodiment of the present disclosure. Based on the foregoing description, it can be understood that the method 800 shown in fig. 8 can be performed by the apparatus 100 shown in fig. 1, and thus the foregoing description of the apparatus 100 and of its generation and update operations also applies to the following description of the method 800. In addition, for convenience of description, the state spaces associated with the first seed and the second seed, respectively, are also shown in fig. 8, and each state space accommodates 2N state vectors, numbered 0 to 2N-1.

As shown in fig. 8, at steps S802 and S804, N state vectors in the state space of the first seed may be initialized, and then only an update operation is performed to expand the N state vectors to 2N state vectors. Similarly, at steps S806 and S808, the initialization operation is performed on the state space of the second seed to obtain a state space containing 2N initial state vectors.

Next, at steps S810 and S812, a generation operation for generating random numbers and an update operation for updating the state space are performed, respectively, with respect to the first seed. After the operations for the first seed have been executed, the operation of generating random numbers and the operation of updating the state space with respect to the second seed are performed at steps S814 and S816. Assuming that the predetermined number of random numbers associated with the first seed and the second seed has not been reached at this time, a partial number of random numbers may then be generated for the first seed, and the same number of state vectors updated accordingly, at steps S818 and S820. Likewise, at steps S822 and S824, the operations of generating random numbers and updating the state space with respect to the second seed are performed again. As long as the predetermined number of random numbers has not been reached, the operations of generating random numbers and updating the state space continue to be performed alternately for the first seed and the second seed until the predetermined number of random numbers is generated.
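The ordering of operations in the dual-seed flow can be sketched as a simple schedule. This is only an illustration of the alternation described above: `dual_seed_schedule`, the operation labels, and the fixed chunk size are assumptions introduced here, and the sketch lists the order of operations without performing any random number computation.

```python
def dual_seed_schedule(total_per_seed, chunk):
    """List (seed, operation, count) tuples in the order they would be issued."""
    steps = []
    for seed in (0, 1):
        steps.append((seed, "init", None))    # generate N state vectors from the seed
        steps.append((seed, "expand", None))  # update only: N -> 2N state vectors
    done = 0
    while done < total_per_seed:
        count = min(chunk, total_per_seed - done)
        for seed in (0, 1):                   # ping-pong between the two seeds
            steps.append((seed, "generate", count))
            steps.append((seed, "update", count))
        done += count
    return steps
```

For instance, `dual_seed_schedule(5, 2)` issues the four initialization steps followed by three rounds, each consisting of generate and update for the first seed and then generate and update for the second seed.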

In one implementation scenario, when the operation of generating random numbers is performed simultaneously for a first seed and a second seed, the storage address of the random number associated with the first seed may be set in the random number instruction of the present disclosure. In this way, the arithmetic unit can place the random number associated with the second seed after the random number associated with the first seed according to the aforementioned memory address when generating the random number associated with the second seed. In particular, since the arithmetic unit generates random numbers of the first seed and the second seed in a ping-pong pipeline manner, a head address for storing the random numbers generated by the second seed can be calculated in advance based on the storage address of the first seed random number and the generated number.
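The address calculation mentioned above amounts to a base-plus-offset computation. The following sketch assumes byte addressing, 4-byte random numbers, and that all random numbers of the first seed are stored contiguously before those of the second seed; `output_address` and its parameters are hypothetical names, not fields of the random number instruction.

```python
def output_address(first_seed_addr, total_per_seed, seed_index,
                   already_generated, bytes_per_number=4):
    """Address at which the next chunk of random numbers for a seed is stored."""
    # Head address of this seed's output region: the second seed starts
    # right after all random numbers reserved for the first seed.
    head = first_seed_addr + seed_index * total_per_seed * bytes_per_number
    # Within the region, each new chunk follows the numbers already produced.
    return head + already_generated * bytes_per_number
```

Because the head address of the second seed depends only on the first seed's storage address and the number of random numbers to be generated, it can be computed before any random number is produced.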

In one implementation scenario, the random number instruction of the present disclosure may also include a setting item for enabling or disabling a seed, for example for a seed that will no longer be used subsequently. Based on this, when the seed is enabled (e.g., the enable flag bit is set to 1), the state space of the seed may be initialized (e.g., expanded from N states to 2N states); in contrast, when the seed is disabled (e.g., the disable flag bit is set to 1), the settings and the state space associated with the seed are deleted after the predetermined number of random numbers has been generated using the seed. In another implementation scenario, the number of skipped states may also be set in the random number instruction. Based on this, when performing operations to generate random numbers based on the first seed and/or the second seed, some state vectors in the state space may be skipped or discarded; e.g., state vectors for updating are generated but not written to the state space until the state vector for the expected location is obtained, and the generation and update operations are then performed starting at that location.

Fig. 9 is a block diagram illustrating a combined processing device 900 according to an embodiment of the present disclosure. As shown in fig. 9, the combined processing device 900 includes a computing processing device 902, an interface device 904, other processing devices 906, and a storage device 908. Depending on the application scenario, one or more computing devices 910 may be included in the computing processing device, and may be configured to perform the generating operation for generating random numbers and the updating operation for updating the state space described herein in conjunction with fig. 1-8.

In various embodiments, the computing processing device of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the computing processing device may be implemented as a single-core artificial intelligence processor or a multi-core artificial intelligence processor. Similarly, one or more computing devices included within a computing processing device may be implemented as an artificial intelligence processor core or as part of a hardware structure of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or as part of a hardware structure of an artificial intelligence processor core, computing processing devices of the present disclosure may be considered to have a single core structure or a homogeneous multi-core structure.

In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through an interface device to collectively perform user-specified operations. Depending on the implementation, the other processing devices of the present disclosure may include one or more types of general-purpose and/or special-purpose processors, such as Central Processing Units (CPUs), Graphics Processing Units (GPUs), and artificial intelligence processors. These processors may include, but are not limited to, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic, discrete hardware components, etc., and their number may be determined based on actual needs. As previously mentioned, the computing processing device of the present disclosure, when considered on its own, may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when considered together, the computing processing device and the other processing devices may be regarded as forming a heterogeneous multi-core structure.

In one or more embodiments, the other processing devices can interface the computing processing device of the present disclosure with external data and controls, performing basic controls including, but not limited to, data handling, starting and/or stopping of the computing device, and the like. In other embodiments, other processing devices may cooperate with the computing processing device to collectively perform computational tasks.

In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing processing device may obtain input data from other processing devices via the interface device, and write the input data into an on-chip storage device (or memory) of the computing processing device. Further, the computing processing device may obtain control instructions from the other processing devices via the interface device, and write the control instructions into an on-chip control cache of the computing processing device. Alternatively or additionally, the interface device may also read data from the storage device of the computing processing device and transmit the data to the other processing devices.

Additionally or alternatively, the combined processing device of the present disclosure may further include a storage device. As shown in the figure, the storage device is connected to the computing processing device and to the other processing devices, respectively. In one or more embodiments, the storage device may be used to hold data of the computing processing device and/or the other processing devices. For example, the data may be data that cannot be fully retained within the internal or on-chip storage of the computing processing device or the other processing devices.

In some embodiments, the present disclosure also discloses a chip (e.g., chip 1002 shown in fig. 10). In one implementation, the chip is a System on Chip (SoC) integrated with one or more combined processing devices as shown in fig. 9. The chip may be connected to other associated components through an external interface device (such as external interface device 1006 shown in fig. 10). The associated component may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. In some application scenarios, other processing units (e.g., video codecs) and/or interface modules (e.g., DRAM interfaces) and the like may be integrated on the chip. In some embodiments, the disclosure also discloses a chip packaging structure, which includes the chip. In some embodiments, the disclosure further discloses a board card or an electronic device, which includes the chip packaging structure. The board card will be described in detail with reference to fig. 10.

Fig. 10 is a schematic diagram illustrating a structure of a board card 1000 according to an embodiment of the disclosure. As shown in fig. 10, the board card includes a memory device 1004 for storing data, which includes one or more memory cells 1010. The memory device may be connected to, and exchange data with, the control device 1008 and the chip 1002 described above by way of, for example, a bus. Further, the board card includes an external interface device 1006 configured to perform a data relay or transfer function between the chip (or the chip in the chip packaging structure) and an external device 1012 (such as a server or a computer). For example, the data to be processed may be transferred to the chip by the external device through the external interface device. For another example, the calculation result of the chip may be transmitted back to the external device via the external interface device. According to different application scenarios, the external interface device may have different interface forms; for example, it may adopt a standard PCIe interface or the like.

In one or more embodiments, the control device in the board card of the present disclosure may be configured to regulate the state of the chip. To this end, in one application scenario, the control device may include a microcontroller unit (MCU) for controlling the operating state of the chip.

From the above description in conjunction with fig. 9 and 10, it will be understood by those skilled in the art that the present disclosure also discloses an electronic device or apparatus, which may include one or more of the above boards, one or more of the above chips and/or one or more of the above combination processing devices.

According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a PC device, an internet-of-things terminal, a mobile terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an autonomous driving terminal, a vehicle, a household appliance, and/or a medical device. The vehicle includes an aircraft, a ship and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasound scanner and/or an electrocardiograph. The electronic device or apparatus of the present disclosure may also be applied to fields such as the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, and medical care. Further, the electronic device or apparatus of the present disclosure may also be used in cloud, edge, and terminal application scenarios related to artificial intelligence, big data, and/or cloud computing. In one or more embodiments, an electronic device or apparatus with high computing power according to the present disclosure may be applied to a cloud device (e.g., a cloud server), while an electronic device or apparatus with low power consumption may be applied to a terminal device and/or an edge device (e.g., a smartphone or a camera).
In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device according to the hardware information of the terminal device and/or the edge device to simulate the hardware resources of the terminal device and/or the edge device, so as to complete unified management, scheduling and cooperative work of end-cloud integration or cloud-edge-end integration.

It is noted that for the sake of brevity, this disclosure presents some methods and embodiments thereof as a series of acts or combinations thereof, but those skilled in the art will appreciate that the disclosed aspects are not limited by the order of acts described. Accordingly, one of ordinary skill in the art will appreciate that certain steps may be performed in other sequences or simultaneously, in accordance with the disclosure or teachings of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in this disclosure are capable of alternative embodiments, in that the acts or modules involved are not necessarily required for the implementation of the solution or solutions of the disclosure. In addition, the present disclosure also focuses on the description of some embodiments, depending on the solution. In view of the above, those skilled in the art will understand that portions of the disclosure that are not described in detail in one embodiment may also be referred to in the description of other embodiments.

In particular implementation, based on the disclosure and teachings of the present disclosure, one skilled in the art will appreciate that the several embodiments disclosed in the present disclosure may be implemented in other ways not disclosed herein. For example, as for each unit in the foregoing embodiments of the electronic device or apparatus, the units are divided based on the logic function, and there may be another division manner in the actual implementation. Also for example, multiple units or components may be combined or integrated with another system or some features or functions in a unit or component may be selectively disabled. The connections discussed above in connection with the figures may be direct or indirect couplings between the units or components in terms of connectivity between the different units or components. In some scenarios, the foregoing direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.

In the present disclosure, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed over multiple network elements. In addition, according to actual needs, some or all of the units can be selected to achieve the purpose of the solution described in the embodiments of the present disclosure. In addition, in some scenarios, multiple units in embodiments of the present disclosure may be integrated into one unit or each unit may exist physically separately.

In some implementation scenarios, the integrated units may be implemented in the form of software program modules. If implemented in the form of software program modules and sold or used as a stand-alone product, the integrated units may be stored in a computer readable memory. In this regard, when aspects of the present disclosure are embodied in the form of a software product (e.g., a computer-readable storage medium), the software product may be stored in a memory, which may include instructions for causing a computer device (e.g., a personal computer, a server, or a network device, etc.) to perform some or all of the steps of the methods described in embodiments of the present disclosure. The Memory may include, but is not limited to, a usb disk, a flash disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

In other implementation scenarios, the integrated unit may also be implemented in hardware, that is, a specific hardware circuit, which may include a digital circuit and/or an analog circuit, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, which may include, but are not limited to, transistors or memristors, among other devices. In view of this, the various devices described herein (e.g., computing devices or other processing devices) may be implemented by suitable hardware processors, such as CPUs, GPUs, FPGAs, DSPs, ASICs, and the like. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including magnetic storage medium or magneto-optical storage medium, etc.), and may be, for example, a variable Resistive Memory (RRAM), a Dynamic Random Access Memory (DRAM), a Static Random Access Memory (SRAM), an Enhanced Dynamic Random Access Memory (EDRAM), a High Bandwidth Memory (HBM), a Hybrid Memory Cube (HMC), a ROM, a RAM, or the like.

The foregoing may be better understood in light of the following clauses:

Clause A1, an apparatus for generating a random number, comprising:

an instruction decode unit configured to receive a random number instruction and decode the random number instruction;

an arithmetic unit configured to perform a random number generation operation based on a single seed or a plurality of seeds in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is used to generate random numbers and the update operation is used to update state vectors; and

a memory configured to set a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating the random number and to accept a state vector update to the state space from an arithmetic unit,

wherein the state space is sized to:

supporting, under a single seed, execution of the generating operation and the updating operation in a pipelined manner; or

supporting, under a plurality of seeds, execution of the random number generation operation of each seed in a pipelined manner.

Clause A2, the apparatus of clause A1, wherein the random number instruction comprises one or more of: the number of random numbers to be generated, the size and address of the state space, information on one or more seeds for generating random numbers, a concatenation parameter for concatenating random numbers, and an address for outputting the random numbers.

Clause A3, the apparatus of clause A1, wherein the random number instruction is a single instruction and is parsed by the instruction decode unit to obtain a plurality of microinstructions for performing pipelined operations, wherein the arithmetic unit is configured to:

according to the microinstruction, a generation operation of generating a random number and an update operation of updating the state space are performed along the state space in a cyclic shift manner.

Clause A4, the apparatus of clause A3, wherein the microinstructions include a generate microinstruction for the generate operation and an update microinstruction for the update operation, the generate and update microinstructions are related to the state space locations to which the generate and update operations relate, and the arithmetic unit is configured to generate a partial number of random numbers by one of:

generating a partial number of random numbers according to one generate microinstruction, and updating the partial number of state vectors according to one update microinstruction; or

generating a partial number of random numbers according to one generate microinstruction, and updating the partial number of state vectors according to a plurality of update microinstructions, respectively; or

generating a partial number of random numbers according to each of a plurality of generate microinstructions, and updating the partial number of state vectors according to one update microinstruction.

Clause A5, the apparatus of clause A3, wherein each of the microinstructions includes an address range entry indicating a read operation or a write operation for the state space, and there are two consecutive microinstructions with address overlaps for the address range entries, the arithmetic unit configured to:

and only after the read operation or the write operation of one of the micro instructions is executed, the read operation or the write operation of the next micro instruction is executed.

Clause A6, the apparatus of clause A1, wherein the size of the state space is configured by software such that the generating and updating operations under a single seed are performed in a pipelined manner, and such that the random number generating operations under multiple seeds are performed in a pipelined manner.

Clause A7, the apparatus of clause A1, wherein the state space is further configured to store a state pointer from the instruction decode unit, wherein the state pointer indicates a location in the state space of a first state vector for a generate operation and a location in the state space of a first update state vector for the update operation.

Clause A8, the apparatus according to clause A1, wherein the state space includes a plurality of space segments, and in the pipelined operation, each of the plurality of space segments variably serves as a read space segment or a write space segment according to the generation operation and the update operation to be performed, wherein the read space segment and the write space segment are separated by a predetermined address interval in the state space so as to support the generation operation and the update operation being performed in a pipelined manner.

Clause A9, the apparatus according to clause A1, wherein the state space comprises N consecutively distributed read space segments and N consecutively distributed write space segments, and the N read space segments and the N write space segments form a correspondence in a pipelined operation, wherein N is a positive integer greater than or equal to 2, and during the pipelined operation the arithmetic unit is configured to perform the following operations in cycles according to the number of pre-generated random numbers, wherein i = 1, ..., N:

in the generating operation, reading a plurality of state vectors from an ith read space segment to generate a corresponding ith plurality of random numbers; and

in the update operation, a plurality of state update vectors for updating a state space are generated from the plurality of state vectors in the ith read space segment, and the plurality of state update vectors are written to the ith write space segment.
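One pass over the N read space segments and N write space segments of clause A9 may be sketched as follows. This is an illustrative sketch only: the layout (read segments first, write segments immediately after), the names `run_pass`, `toy_output` and `toy_update`, and the 32-bit integer state vectors are all assumptions. The output and update functions are placeholders (a shift-xor and a linear congruential step) and are not the random number algorithm of the disclosure.

```python
MASK = 0xFFFFFFFF


def toy_output(s):
    """Placeholder generation function: derive an output from a state value."""
    return (s ^ (s >> 16)) & MASK


def toy_update(s):
    """Placeholder update function: a 32-bit linear congruential step."""
    return (1664525 * s + 1013904223) & MASK


def run_pass(state, n_segments, seg_len):
    """Process read segments 0..N-1 and fill the matching write segments.

    Assumed layout: state[0 : N*seg_len] holds the N read segments and
    state[N*seg_len : 2*N*seg_len] holds the N write segments.
    """
    write_base = n_segments * seg_len
    outputs = []
    for i in range(n_segments):
        lo = i * seg_len
        vectors = state[lo:lo + seg_len]
        # Generation operation: read the i-th read segment, emit random numbers.
        outputs.extend(toy_output(s) for s in vectors)
        # Update operation: write updated vectors to the i-th write segment.
        state[write_base + lo:write_base + lo + seg_len] = [
            toy_update(s) for s in vectors
        ]
    return outputs
```

Because the i-th write segment never shares addresses with the read segments, the update for segment i does not conflict with the generation for segment i+1, which is the property that makes pipelining possible in this sketch.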

Clause A10, the apparatus according to clause A1, wherein during initialization of the state space, the state space is divided into a first state subspace and a second state subspace of equal size, the arithmetic unit being configured to:

perform an update operation for the second state subspace with the state vectors stored in the first state subspace to generate all state vectors for the second state subspace.

Clause A11, the apparatus of clause A10, wherein in generating all state vectors of the second state subspace, the arithmetic unit is further configured to:

sequentially execute a plurality of microinstructions to generate all state vectors for updating the second state subspace, wherein execution of each microinstruction generates a corresponding number of state vectors.
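The initialization described in clauses A10 and A11 may be sketched as follows, assuming the state space is a flat list whose first half is already populated. The names `init_second_subspace`, `per_op` and `toy_update` are hypothetical, and the update function is a placeholder linear congruential step rather than the update rule of the disclosure.

```python
MASK = 0xFFFFFFFF


def toy_update(s):
    """Placeholder update function: a 32-bit linear congruential step."""
    return (1664525 * s + 1013904223) & MASK


def init_second_subspace(state, per_op):
    """Fill the second half of `state` from the first half, `per_op` vectors at a time.

    Each loop iteration stands in for one update microinstruction.
    Returns the number of such microinstructions executed.
    """
    half, remainder = divmod(len(state), 2)
    if remainder or per_op < 1 or half % per_op:
        raise ValueError("state must split into equal halves of whole micro-ops")
    ops = 0
    for lo in range(0, half, per_op):
        source = state[lo:lo + per_op]  # read from the first subspace
        state[half + lo:half + lo + per_op] = [toy_update(s) for s in source]
        ops += 1
    return ops
```

For a state space of eight vectors and `per_op = 2`, two sequential microinstructions would fill the four vectors of the second state subspace.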

Clause A12, the apparatus of any one of clauses A1-A11, wherein:

under a single seed, the arithmetic unit is configured to perform the generating operation to generate the random number using a state vector generated by the single seed, and perform the updating operation to update a relevant state vector of a state space; or

under a plurality of seeds, each seed has its associated state space, and the arithmetic unit is configured to, for each respective seed of the plurality of seeds, perform a generating operation using the state vector generated by the respective seed to generate the random number associated with that seed, and perform an updating operation to update the relevant state vector of the state space associated with that seed.

Clause A13, the apparatus of clause A12, wherein, under a single seed, the arithmetic unit is configured to:

alternately and successively perform, according to a plurality of microinstructions, a generating operation for generating the random number and an updating operation for updating the state space until a predetermined number of random numbers are generated,

wherein each generating operation generates a partial number of random numbers and each updating operation updates a corresponding number of state vectors in the state space.

Clause A14, the apparatus according to clause A12, wherein the plurality of seeds comprises a first seed and a second seed, wherein the memory is provided with the state space associated with each of the first seed and the second seed, and in performing the pipelined operation for generating random numbers, the arithmetic unit is configured to perform the following operations in a loop according to the random number instruction until a predetermined number of random numbers are generated:

generating a partial number of random numbers using the state space associated with the first seed and updating a corresponding number of state vectors; and

generating a partial number of random numbers using the state space associated with the second seed and updating a corresponding number of state vectors.
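The alternation between seeds in clause A14 may be illustrated by a round-robin loop in which each seed owns a separate state space. This is a minimal sketch under several assumptions: the predetermined number is taken to be a per-seed count, the state pointer is a simple cyclic index, the raw state value is emitted as the output, and `toy_step`, `seed_state` and `interleave` are hypothetical names. The linear congruential step is a placeholder, not the algorithm of the disclosure.

```python
MASK = 0xFFFFFFFF


def toy_step(s):
    """Placeholder update function: a 32-bit linear congruential step."""
    return (1664525 * s + 1013904223) & MASK


def seed_state(seed, size):
    """Build a toy state space of `size` vectors from one seed."""
    state, s = [], seed & MASK
    for _ in range(size):
        s = toy_step(s)
        state.append(s)
    return state


def interleave(seeds, size, chunk, per_seed):
    """Alternate between seeds, producing `chunk` numbers per turn for each."""
    if chunk < 1:
        raise ValueError("chunk must be at least 1")
    spaces = [seed_state(sd, size) for sd in seeds]
    cursors = [0] * len(seeds)  # one cyclic state pointer per seed
    outputs = [[] for _ in seeds]
    while any(len(o) < per_seed for o in outputs):
        for k, space in enumerate(spaces):
            take = min(chunk, per_seed - len(outputs[k]))
            for _ in range(take):
                idx = cursors[k]
                outputs[k].append(space[idx])      # generation (toy output)
                space[idx] = toy_step(space[idx])  # update the consumed vector
                cursors[k] = (idx + 1) % size      # cyclic shift of the pointer
    return outputs
```

Since the state spaces are disjoint, the sequence produced for one seed in this sketch does not depend on whether other seeds are interleaved with it, so work for the second seed can proceed while the update for the first seed is still in flight.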

Clause A15, an integrated circuit chip comprising the apparatus of any one of clauses A1-A14.

Clause A16, a board card comprising the integrated circuit chip of clause A15.

Clause A17, an electronic device, comprising the integrated circuit chip of clause A16.

Clause A18, a method for generating a random number, comprising:

receiving a random number instruction and decoding the random number instruction;

setting a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector used to generate the random number and to accept state vector updates to the state space; and

performing a random number generation operation based on a single seed or a plurality of seeds in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation that are performed in a pipelined manner, wherein the generation operation is used to generate random numbers and the update operation is used to update state vectors,

wherein the state space is sized to:

supporting, under a single seed, execution of the generating operation and the updating operation in a pipelined manner; or

Clause A19, the method of clause A18, wherein the random number instruction comprises one or more of: the number of random numbers to be generated, the size and address of the state space, information on one or more seeds for generating random numbers, a concatenation parameter for concatenating random numbers, and an address for the output random numbers.

Clause A20, the method of clause A18, wherein the random number instruction is a single instruction and is parsed to obtain a plurality of microinstructions for performing pipelined operations, the method further comprising:

Clause A21, the method of clause A20, wherein the microinstructions include a generation microinstruction for the generation operation and an update microinstruction for the update operation, the generation and update microinstructions relate to the state space locations involved in the generation and update operations, and the method comprises generating a partial number of random numbers and updating a partial number of state vectors in one of the following ways:

generating a partial number of random numbers according to one generation microinstruction and updating a partial number of state vectors according to one update microinstruction; or

generating a partial number of random numbers according to one generation microinstruction, and updating a partial number of state vectors according to a plurality of update microinstructions, respectively; or

generating a partial number of random numbers according to a plurality of generation microinstructions, respectively, and updating a partial number of state vectors according to one update microinstruction.

Clause A22, the method of clause A20, wherein each of the microinstructions includes an address range entry indicating a read operation or a write operation for the state space, and, where the address range entries of two consecutive microinstructions overlap, the method further comprises:

Clause A23, the method of clause A18, wherein the size of the state space is configured by software so as to cause the generating and updating operations under a single seed to be performed in a pipelined manner, and so as to cause the random number generating operations under multiple seeds to be performed in a pipelined manner.

Clause A24, the method of clause A18, wherein the state space is further configured to store a state pointer, wherein the state pointer indicates a location in the state space of a first state vector for the generate operation and a location in the state space of a first update state vector for the update operation.

Clause A25, the method according to clause A18, wherein the state space includes a plurality of space segments, and in the pipelined operation, each of the plurality of space segments variably serves as a read space segment or a write space segment according to the generating operation and the updating operation to be performed, wherein the read space segment and the write space segment are separated by a predetermined address interval in the state space so as to support the generating operation and the updating operation being performed in a pipelined manner.

Clause A26, the method of clause A18, wherein the state space comprises N contiguously distributed read space segments and N contiguously distributed write space segments, and the N read space segments and the N write space segments form a correspondence in a pipelined operation, wherein N is a positive integer greater than or equal to 2, and during the pipelined operation the method performs the following operations in a loop according to the number of pre-generated random numbers, wherein i = 1, ..., N:

Clause A27, the method of clause A18, wherein during initialization of the state space, the state space is divided into a first state subspace and a second state subspace of equal size, the method further comprising:

performing an update operation for the second state subspace using the state vectors stored in the first state subspace to generate all state vectors for the second state subspace.

Clause A28, the method of clause A27, wherein in generating all state vectors of the second state subspace, the method further comprises:

Clause A29, the method according to any one of clauses A18-A28, wherein:

under a single seed, performing the generating operation to generate the random number using a state vector generated by the single seed, and performing the updating operation to update a relevant state vector of a state space; or

under a plurality of seeds, each seed has its associated state space, and the method performs, for each respective seed of the plurality of seeds, a generating operation using a state vector generated by the respective seed to generate the random number associated with that seed, and an updating operation to update a relevant state vector of the state space associated with that seed.

Clause A30, the method of clause A29, wherein under a single seed, the method further comprises:

alternately performing a generating operation of generating the random number and an updating operation for updating the state space continuously according to a plurality of microinstructions until a predetermined number of random numbers are generated,

Clause A31, the method of clause A29, wherein the plurality of seeds comprises a first seed and a second seed, wherein the method comprises providing the state space associated with each of the first seed and the second seed, and in performing the pipelined operation for generating random numbers, the method comprises performing the following in a loop according to the random number instruction until a predetermined number of random numbers are generated:

It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, description, and drawings of the present disclosure are used to distinguish between different objects and are not used to describe a particular order. The terms "comprises" and "comprising," when used in the specification and claims of this disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the disclosure herein is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the specification and claims of this disclosure refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.

As used in this specification and claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that equivalents or alternatives within the scope of these claims be covered thereby.

Claims

1. An apparatus for generating random numbers, comprising:

a memory configured to set a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector used to generate the random number and to accept a state vector update to the state space from an arithmetic unit,

wherein the state space is sized to:

2. The apparatus of claim 1, wherein the random number instruction comprises one or more of: the number of random numbers to be generated, the size and address of the state space, information on one or more seeds for generating random numbers, a concatenation parameter for concatenating random numbers, and an address for the output random numbers.

3. The apparatus of claim 1, wherein the random number instruction is a single instruction and is parsed by the instruction decode unit into a plurality of microinstructions for performing pipelined operations, wherein the arithmetic unit is configured to:

4. The apparatus of claim 3, wherein the microinstructions include a generation microinstruction for the generation operation and an update microinstruction for the update operation, the generation and update microinstructions relate to the state space locations involved in the generation and update operations, and the arithmetic unit is configured to generate a partial number of random numbers and update a partial number of state vectors in one of the following ways:

generating a partial number of random numbers according to one generation microinstruction and updating a partial number of state vectors according to one update microinstruction; or alternatively

generating a partial number of random numbers according to one generation microinstruction and updating a partial number of state vectors according to a plurality of update microinstructions, respectively; or alternatively

5. The apparatus of claim 3, wherein each of the microinstructions comprises an address range entry indicating a read operation or a write operation for the state space, and, where the address range entries of two consecutive microinstructions overlap, the arithmetic unit is configured to:

execute the read operation or the write operation of the next microinstruction only after the read operation or the write operation of the preceding microinstruction has been executed.

6. The apparatus of claim 1, wherein a size of the state space is configured by software to cause the generating and updating operations under a single seed to be performed in a pipelined manner and to cause random number generation operations under multiple seeds to be performed in a pipelined manner.

7. The apparatus of claim 1, wherein the state space is further configured to store a state pointer from the instruction decode unit, wherein the state pointer indicates a location in the state space of a first state vector for a generate operation and a location in the state space of a first update state vector for an update operation.

8. The apparatus of claim 1, wherein the state space includes a plurality of space segments, and in the pipelined operation, each of the plurality of space segments variably serves as a read space segment or a write space segment according to the generating operation and the updating operation to be performed, wherein the read space segment and the write space segment are separated by a predetermined address interval in the state space so as to support the generating operation and the updating operation being performed in a pipelined manner.

9. The apparatus of claim 1, wherein the state space comprises N contiguously distributed read space segments and N contiguously distributed write space segments, and the N read space segments and the N write space segments form a correspondence in a pipelined operation, wherein N is a positive integer greater than or equal to 2, and during the pipelined operation the arithmetic unit is configured to perform the following operations in a loop according to a number of pre-generated random numbers, wherein i = 1, ..., N:

10. The apparatus of claim 1, wherein during initialization of the state space, the state space is partitioned into a first state subspace and a second state subspace of equal size, the arithmetic unit being configured to:

11. The apparatus of claim 10, wherein in generating all state vectors of the second state subspace, the arithmetic unit is further configured to:

12. The apparatus of any one of claims 1-11, wherein:

13. The apparatus of claim 12, wherein under a single seed, the arithmetic unit is configured to:

14. The apparatus of claim 12, wherein the plurality of seeds comprises a first seed and a second seed, wherein the memory is provided with the state space associated with each of the first seed and the second seed, and in performing the pipelined operation for generating random numbers, the arithmetic unit is configured to perform the following operations in a loop according to the random number instruction until a predetermined number of random numbers are generated:

15. An integrated circuit chip comprising the apparatus of any of claims 1-14.

16. A board card comprising the integrated circuit chip of claim 15.

17. An electronic device comprising the integrated circuit chip of claim 16.

18. A method for generating random numbers, comprising:

performing a random number generation operation based on a single seed or a plurality of seeds in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is used to generate random numbers and the update operation is used to update state vectors,

wherein the state space is sized to:

19. The method of claim 18, wherein the random number instruction comprises one or more of: the number of random numbers to be generated, the size and address of the state space, information on one or more seeds for generating the random numbers, a concatenation parameter for concatenating the random numbers, and an address for the output random numbers.

20. The method of claim 18, wherein the random number instruction is a single instruction and is parsed to obtain a plurality of microinstructions for performing pipelined operations, the method further comprising:

21. The method of claim 20, wherein the microinstructions include a generation microinstruction for the generation operation and an update microinstruction for the update operation, the generation and update microinstructions relate to the state space locations involved in the generation and update operations, and the method comprises generating a partial number of random numbers and updating a partial number of state vectors in one of the following ways:

generating a partial number of random numbers according to one generation microinstruction and updating a partial number of state vectors according to one update microinstruction; or

generating a partial number of random numbers according to one generation microinstruction and updating a partial number of state vectors according to a plurality of update microinstructions; or

22. The method of claim 20, wherein each of the microinstructions includes an address range entry indicating a read operation or a write operation to a state space, and there are two consecutive microinstructions with address overlap for the address range entry, the method further comprising:

23. The method of claim 18, wherein the size of the state space is configured by software such that the generating and updating operations under a single seed are performed in a pipelined manner, and such that the random number generation operations under multiple seeds are performed in a pipelined manner.

24. The method of claim 18, wherein the state space is further configured to store a state pointer, wherein the state pointer indicates a location in the state space of a first state vector for a generate operation and a location in the state space of a first update state vector for an update operation.

25. The method of claim 18, wherein the state space includes a plurality of space segments, and in the pipelined operation, each of the plurality of space segments variably serves as a read space segment or a write space segment according to the generation operation and the update operation to be performed, wherein the read space segment and the write space segment are separated by a predetermined address interval in the state space so as to support the generation operation and the update operation being performed in a pipelined manner.

26. The method of claim 18, wherein the state space includes N contiguously distributed read space segments and N contiguously distributed write space segments, and the N read space segments and the N write space segments form a correspondence in a pipelined operation, wherein N is a positive integer greater than or equal to 2, and during the pipelined operation the method performs the following operations in a loop according to a number of pre-generated random numbers, wherein i = 1, ..., N:

27. The method of claim 18, wherein during initialization of the state space, the state space is partitioned into a first state subspace and a second state subspace of equal size, the method further comprising:

28. The method of claim 27, wherein in generating all state vectors of the second state subspace, the method further comprises:

29. The method of any one of claims 18-28, wherein:

under a single seed, performing the generating operation to generate the random number using a state vector generated by the single seed, and performing the updating operation to update a relevant state vector of a state space; or

under a plurality of seeds, each seed has its associated state space, and the method performs a generating operation associated with a respective seed of the plurality of seeds to generate the random number associated with the respective seed using a state vector generated by the respective seed, and performs an updating operation associated with the respective seed to update the associated state vector of the state space associated with the respective seed.

30. The method of claim 29, wherein under a single seed, the method further comprises:

31. The method of claim 29, wherein the plurality of seeds comprises a first seed and a second seed, wherein the method comprises providing the state space associated with each of the first seed and the second seed, and in performing the pipelined operation for generating random numbers, the method comprises performing the following in a loop according to the random number instruction until a predetermined number of random numbers are generated: