CN115437603B

CN115437603B - Method for generating random numbers and related products

Info

Publication number: CN115437603B
Application number: CN202110622813.4A
Authority: CN
Inventors: Name withheld on request
Original assignee: Cambricon Technologies Corp Ltd
Current assignee: Cambricon Technologies Corp Ltd
Priority date: 2021-06-04
Filing date: 2021-06-04
Publication date: 2023-12-19
Anticipated expiration: 2041-06-04
Also published as: WO2022253287A1; CN115437603A

Abstract

The present disclosure provides an apparatus for generating random numbers, an integrated circuit chip, a board card, an electronic device, and a method for generating random numbers. The apparatus may be included in a combined processing apparatus, which may further include an interface apparatus and other processing apparatuses. The apparatus interacts with the other processing apparatuses to jointly complete a computing operation specified by a user. The combined processing apparatus may further include a storage apparatus that is connected to the apparatus and to the other processing apparatuses, respectively, and stores data of the apparatus and the other processing apparatuses. The disclosed scheme can improve the efficiency of random number generation and the pipelining performance of hardware execution.

Description

Method for generating random numbers and related products

Technical Field

The present disclosure relates generally to the field of random numbers. More particularly, the present disclosure relates to an apparatus, an integrated circuit chip, a board card, an electronic device, and a method for generating random numbers.

Background

Random numbers are widely used in scenarios such as statistical applications and experimental testing. As the amount of data to be analyzed or tested grows, higher demands are placed on both the quantity of random numbers generated and the efficiency of their generation. Algorithms for generating random numbers generally involve a large number of iterative data operations, so the hardware architecture used to generate random numbers must be able to accommodate such operations. However, in existing random number generation approaches, the hardware architecture does not support pipelining, which results in relatively low generation efficiency.

Disclosure of Invention

In order to solve at least the problems with the prior art described above, the present disclosure provides a scheme for generating random numbers in a pipelined manner. The scheme of the present disclosure may obtain technical advantages in various aspects including enhancing processing performance of hardware, reducing power consumption, improving execution efficiency of computing operations, and avoiding computing overhead.

In a first aspect, the present disclosure provides an apparatus for generating random numbers, comprising: an instruction decoding unit configured to receive a random number instruction and decode the random number instruction; an arithmetic unit configured to perform a random number generation operation based on a single seed or a plurality of seeds according to the decoded random number instruction, wherein the random number generation operation includes a generation operation and an update operation performed in a pipelined manner for each of the seeds, wherein the generation operation is for generating a random number and the update operation is for updating a state vector; and a memory configured to set a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating the random number and to accept a state vector update of the state space from the arithmetic unit, wherein a size of the state space is set to: support, under a single seed, execution of the generation operation and the update operation in a pipelined manner; or support, under multiple seeds, execution of the random number generation operation for each seed in a pipelined manner.

In a second aspect, the present disclosure provides an integrated circuit chip comprising an apparatus for generating random numbers as described above and as will be described in the various embodiments below.

In a third aspect, the present disclosure provides a board comprising an integrated circuit chip as described above and as will be described in the various embodiments below.

In a fourth aspect, the present disclosure provides an electronic device comprising an integrated circuit chip as described above and as will be described in the various embodiments below.

In a fifth aspect, the present disclosure provides a method for generating random numbers, comprising: receiving a random number instruction and decoding the random number instruction; setting a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating the random number and to accept a state vector update to the state space; and performing a random number generation operation based on a single seed or a plurality of seeds in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is for generating a random number and the update operation is for updating a state vector, wherein the size of the state space is set to: support, under a single seed, execution of the generation operation and the update operation in a pipelined manner; or support, under multiple seeds, execution of the random number generation operation for each seed in a pipelined manner.

With the apparatus for generating random numbers, the integrated circuit chip, the board card, the electronic device, and the method described above, hardware can execute the random number generation operation in a pipelined manner. The scheme of the present disclosure can therefore generate random numbers efficiently through pipelining, improving overall hardware performance and reducing computational overhead. Further, the scheme supports random number generation not only under a single seed but also under a plurality of seeds, making random number generation more flexible and more efficient. In particular, in multi-seed random number generation operations, the pipelining performance of the hardware is further improved.

Drawings

The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

FIG. 1 is a block diagram illustrating an apparatus for generating random numbers according to an embodiment of the present disclosure;

FIG. 2 is a schematic diagram illustrating an initialization process for generating random numbers according to one embodiment of the present disclosure;

FIG. 3 is a schematic diagram illustrating an initialization process for generating random numbers according to yet another embodiment of the present disclosure;

FIG. 4 is a schematic diagram illustrating a state space change for generating random numbers according to one embodiment of the present disclosure;

FIG. 5a is a schematic diagram illustrating a partial state space change for generating random numbers according to yet another embodiment of the present disclosure;

FIG. 5b is a schematic diagram illustrating another portion of state space variation for generating random numbers according to yet another embodiment of the present disclosure;

FIG. 6 is a flow chart illustrating a method for generating random numbers according to an embodiment of the present disclosure;

FIG. 7 is a flow chart illustrating a method of generating random numbers in a single seed mode according to an embodiment of the present disclosure;

FIG. 8 is a flow chart illustrating a method of generating random numbers in a dual seed mode according to an embodiment of the present disclosure;

FIG. 9 is a block diagram illustrating a combination processing device according to an embodiment of the present disclosure; and

FIG. 10 is a schematic diagram illustrating the structure of a board card according to an embodiment of the present disclosure.

Detailed Description

The present disclosure provides a scheme for efficiently generating random numbers in a parallel pipelined or ping-pong fashion. To this end, in one embodiment, the present disclosure proposes configuring the state space that stores the state vectors for generating random numbers so that it meets the hardware requirements of pipelined operation. In particular, for scenarios in which a single seed or multiple seeds are used to generate random numbers, the present disclosure proposes expanding the state space so that the generation operation and the update operation can be performed in a pipelined manner in both scenarios. In combination with this improvement to the state space, the scheme of the present disclosure also uses a random number instruction that is decoded into a plurality of microinstructions to execute the generation operation and the update operation. With the hardware configuration and random number instruction of the present disclosure, random numbers can be generated efficiently, improving the overall performance of the computing system and reducing the overhead of generating random numbers.

The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present disclosure. All other embodiments that a person skilled in the art could obtain from these embodiments without inventive effort fall within the scope of protection of the present disclosure.

FIG. 1 is a block diagram illustrating an apparatus 100 for generating random numbers according to an embodiment of the present disclosure. As shown in FIG. 1, the apparatus 100 includes an instruction decoding unit 101, an arithmetic unit 102, and a memory 103, which interact and cooperate with one another to generate random numbers in a pipelined manner. In one implementation, the instruction decoding unit may be configured to receive a random number instruction and decode it. In one embodiment, depending on the implementation scenario, the random number instruction may include one or more of the following: the number of random numbers to generate, the size and address of the state space, information on one or more seeds for generating random numbers, concatenation parameters for concatenating random numbers, and the address to which the random numbers are output. In one embodiment, the random number instruction may be a single instruction that yields a plurality of microinstructions after being parsed by the instruction decoding unit. In another embodiment, the seed information may include a specific seed (e.g., a 32-bit initial value) or information indicating whether the seed comes from a special function register. When two seeds are used, the concatenation parameters may comprise the first address at which the random numbers associated with the first seed are stored and an address offset (i.e., the span of storage addresses occupied once all random numbers have been generated with the first seed). The arithmetic unit can thus determine the first address for storing the random numbers associated with the second seed, namely the first address for the first seed plus the address offset.
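The address arithmetic for the concatenation parameters can be sketched as follows. This is a minimal illustration in Python rather than the disclosed hardware; the class name `RandomNumberInstruction`, its field names, and the example values are all hypothetical and merely mirror the fields listed above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RandomNumberInstruction:
    """Hypothetical container for fields a random number instruction may carry."""
    count: int               # number of random numbers to generate
    state_space_addr: int    # address of the state space
    state_space_size: int    # size of the state space, in state vectors
    seeds: Tuple[int, ...]   # one or more 32-bit seeds
    output_addr: int         # first address for the first seed's random numbers
    address_offset: int = 0  # address span occupied by the first seed's output


def second_seed_output_addr(instr: RandomNumberInstruction) -> int:
    """First address for the second seed = first seed's first address + offset."""
    if len(instr.seeds) < 2:
        raise ValueError("concatenation parameters apply only when two seeds are used")
    return instr.output_addr + instr.address_offset


# Example values are illustrative only.
instr = RandomNumberInstruction(
    count=192,
    state_space_addr=0x1000,
    state_space_size=702,
    seeds=(0x1234, 0x5678),
    output_addr=0x8000,
    address_offset=96 * 4,
)
print(hex(second_seed_output_addr(instr)))
```

Keeping the offset in the instruction, as the text describes, lets the outputs for the two seeds be laid out back to back without a separate address computation step.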

In one implementation scenario, the arithmetic unit may be configured to perform a random number generation operation based on a single seed or multiple seeds in accordance with the decoded random number instruction. In particular, for each seed, the random number generation operations of the present disclosure may include generation operations and update operations that are performed in a pipelined manner. Here, the generating operation may be used to generate a random number and the updating operation may be used to update the state vector. The arithmetic unit of the present disclosure may have different implementations according to different application scenarios. When applied to hardware architectures in the field of artificial intelligence, the arithmetic units of the present disclosure may be implemented as artificial intelligence processors or computing cores in processors.

In one implementation, the memory may be configured to set the state space according to the decoded random number instruction. In the context of the present disclosure, the state space may be configured to store state vectors for generating random numbers and to accept state vector updates to the state space from the arithmetic unit. Further, the state space may be sized so that, under a single seed, the generation operation and the update operation can be performed in a pipelined manner, or, under multiple seeds, the random number generation operation for each seed can be performed in a pipelined manner. In one embodiment, the size of the state space may be configured by software. Configuring the size through software makes sizing the state space more flexible and convenient, so that the state space can better support pipelining of the random number generation scheme in both single-seed and multi-seed modes.

In generating the random number, the operation unit may read at least one state vector from a state space (which may be regarded as one cyclic memory space) to generate the random number, and may generate an updated state vector by an update operation based on the at least one state vector. Further, the updated state vector may be utilized to update the state space. The random number generation operation of the present disclosure will be described in detail below in connection with the aforementioned random number instructions.

As previously described, the random number instruction of the present disclosure may be implemented as a single instruction that, once decoded by the instruction decoding unit, yields a plurality of microinstructions for pipelined execution. According to the plurality of microinstructions, the arithmetic unit may be configured to perform, in a cyclic-shift manner along the state space, a generation operation that generates random numbers and an update operation that updates the state space. Specifically, according to a microinstruction, the arithmetic unit may perform the generation operation using the state vectors stored in one of the plurality of space sections into which the state space is divided, so as to generate a predetermined portion of the random numbers. The state vectors stored in that space section may then be used to generate new state vectors, which are stored in another of the space sections of the state space. To achieve pipelining and reduce address dependencies between data, the present disclosure proposes placing these two space sections sufficiently far apart and, immediately after the state vector update, generating random numbers using a section different from these two space sections, thereby minimizing the waiting time of the arithmetic unit and improving its efficiency in generating random numbers.

In one embodiment, the state space for random number generation may be implemented as a cyclic memory space. The plurality of space sections described above therefore serve alternately as read space sections or write space sections, depending on the generation operation and update operation to be performed. In other words, as the generation and update operations iterate, a space section previously used for reading becomes a write section in a subsequent operation and receives new state vectors, which is how the state space of the present disclosure is updated. Further, so that the generation of random numbers and the updating of state vectors can be performed in a pipelined manner, the read space section and the write space section may be set a predetermined address interval apart in the state space, minimizing address dependency between the state vectors in subsequent generation and update operations.

In one implementation scenario, to enable random number instructions to be invoked and executed continuously, the instruction decoding unit of the present disclosure may be configured to output a state pointer into the state space in memory. The state pointer may indicate the location in the state space of the first state vector for a generation operation (e.g., the coordinates of Y[i+N] in the state space, described later in connection with the accompanying drawings) and the location in the state space of the first state vector for an update operation (e.g., the coordinates of X[i] in the state space, described later in connection with the accompanying drawings). Based on the state pointer, the arithmetic unit knows where these two state vectors are located in the state space after the previous random number instruction has executed, so that the generation operation and the update operation of the next random number instruction can start from those locations.

In one implementation scenario, a state space of the present disclosure may include N consecutively distributed read space sections and N consecutively distributed write space sections, with the N read space sections and the N write space sections corresponding to one another in a pipelined operation, where N is a positive integer greater than or equal to 2. Under these preset conditions (e.g., as configured by a random number instruction), during a pipelined operation the arithmetic unit may be configured to perform the following operations in a loop according to the number of random numbers to be generated, where i = 1, ..., N:

in the generating operation for generating the random numbers, the operation unit may read a plurality of state vectors from the i-th read space section to generate a corresponding i-th plurality of random numbers; and

in an update operation for updating the state space, the operation unit may generate a plurality of state update vectors for updating the state space from a plurality of state vectors in the i-th read space section, and write the plurality of state update vectors to the i-th write space section.
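The loop over paired read and write sections can be sketched as below. This is only an illustrative Python model under stated assumptions: the function name `pipelined_generate` is hypothetical, the `generate` and `update` callables stand in for the actual generation and update operations, and swapping the roles of the two section groups after each full pass is a simplification of the cyclic read/write role change described earlier.

```python
from typing import Callable, List, Sequence

Section = List[int]


def pipelined_generate(
    read_sections: Sequence[Section],
    write_sections: Sequence[Section],
    generate: Callable[[Section], List[int]],
    update: Callable[[Section], Section],
    target_count: int,
) -> List[int]:
    """Cycle over section pairs i = 0..N-1 until target_count numbers exist."""
    n = len(read_sections)
    if n < 2 or n != len(write_sections):
        raise ValueError("need N >= 2 matching read and write sections")
    # Work on copies so the caller's lists are left untouched.
    reads = [list(s) for s in read_sections]
    writes = [list(s) for s in write_sections]
    out: List[int] = []
    while len(out) < target_count:
        for i in range(n):
            vectors = reads[i]
            out.extend(generate(vectors))  # generation operation on section i
            writes[i] = update(vectors)    # update operation into write section i
            if len(out) >= target_count:
                break
        else:
            # Full pass done: previously written sections become readable.
            reads, writes = writes, reads
    return out[:target_count]


# Toy stand-ins for the real operations, for illustration only.
numbers = pipelined_generate(
    read_sections=[[1, 2], [3, 4]],
    write_sections=[[0, 0], [0, 0]],
    generate=lambda vs: [v + 100 for v in vs],
    update=lambda vs: [v * 2 for v in vs],
    target_count=6,
)
print(numbers)
```

Because section i is only read while its paired write section is being filled, the generation step for section i+1 does not have to wait for the write of section i in this model.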

As previously described, the arithmetic unit of the present disclosure may generate a predetermined number of random numbers according to the random number instruction. When the instruction decoding unit parses the random number instruction into a plurality of microinstructions (including one or more generation microinstructions and update microinstructions), the arithmetic unit may, over several rounds, generate part of the random numbers and update part of the state vectors according to different microinstruction combinations until the desired number of random numbers has been generated. Specifically, the arithmetic unit may generate part of the random numbers according to one generation microinstruction and update the corresponding part of the state vectors according to one update microinstruction. Alternatively, the arithmetic unit may generate part of the random numbers according to one generation microinstruction and update the corresponding part of the state vectors according to a plurality of update microinstructions. Additionally, the arithmetic unit may generate part of the random numbers according to a plurality of generation microinstructions and update the corresponding part of the state vectors according to one update microinstruction. These combinations of microinstructions are merely exemplary, and the manner in which they operate is described in greater detail below with reference to the accompanying drawings.

To minimize overlap of a read operation (e.g., reading a state vector from a state space to generate a random number) and a write operation (e.g., writing a new state vector to a state space to update the state space) of a microinstruction, the present disclosure proposes to include an address range entry in each microinstruction that indicates a read operation or write operation to the state space, and for two consecutive microinstructions for which there is an address overlap of the address range entries, the arithmetic unit will be configured to perform the read operation or write operation of the next microinstruction only after the read operation or write operation of one of the microinstructions is performed.
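The address range check between consecutive microinstructions can be sketched as follows. This is a hedged Python illustration: the `MicroInstruction` class and its fields are hypothetical, each microinstruction is reduced to a single half-open index range, and wrap-around in the cyclic state space is not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class MicroInstruction:
    """Hypothetical microinstruction carrying an address range entry."""
    name: str
    start: int  # first state-space index accessed (inclusive)
    stop: int   # one past the last index accessed (exclusive)


def ranges_overlap(a: MicroInstruction, b: MicroInstruction) -> bool:
    """True when the two half-open ranges share at least one index."""
    return a.start < b.stop and b.start < a.stop


def must_wait(ops: Sequence[MicroInstruction]) -> List[bool]:
    """For each microinstruction, whether it must wait for its predecessor."""
    flags = []
    for k, op in enumerate(ops):
        flags.append(k > 0 and ranges_overlap(ops[k - 1], op))
    return flags


# Illustrative ranges only; they are not taken from the disclosure.
ops = [
    MicroInstruction("generate", 351, 447),
    MicroInstruction("update", 0, 96),
    MicroInstruction("generate", 90, 186),
]
print(must_wait(ops))  # [False, False, True]
```

Only the third microinstruction is serialized here, because its range shares indices 90 to 95 with the preceding update; the first two can be issued back to back.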

In one embodiment, the present disclosure proposes dividing the state space into a first state subspace and a second state subspace of the same size during initialization of the state space. In this case, the arithmetic unit may be configured to perform an update operation for the second state subspace using the state vectors stored in the first state subspace, so as to generate all state vectors for the second state subspace. When the number of state vectors in the first state subspace is denoted by the foregoing N (that is, the first state subspace contains N consecutive segments, each holding one state vector), only the update operation is performed, using the N state vectors numbered 0 to N-1 in the first state subspace to generate N new state vectors. This yields a state space containing 2N state vectors in total, numbered 0 to 2N-1. In generating all state vectors of the second state subspace, the arithmetic unit may be further configured to sequentially execute a plurality of microinstructions to generate all state vectors for the second state subspace (i.e., the N state vectors numbered N to 2N-1 in the preceding example), wherein execution of each microinstruction generates a corresponding number of state vectors.

As previously described, the present disclosure provides random number generation schemes for a single-seed mode and a multi-seed mode. Under a single seed, the arithmetic unit of the present disclosure may be configured to perform the aforementioned generation operation to generate random numbers from the state vectors generated from the single seed, and to perform the aforementioned update operation to update the relevant state vectors of the state space. Specifically, in the single-seed case, the arithmetic unit may be configured to alternate, according to a plurality of microinstructions, between a generation operation that generates random numbers and an update operation that updates the state space, until a predetermined number of random numbers has been generated. In one implementation, each generation operation generates part of the random numbers and each update operation updates a corresponding number of state vectors in the state space.

Alternatively or additionally, in the case of multiple seeds, each seed may have its own associated state space, i.e., a state space is set for each seed. In this case, the arithmetic unit may be configured to use the state vectors generated from each of the plurality of seeds to perform the generation operation associated with that seed, generating the random numbers associated with that seed, and to perform the update operation associated with that seed to update the relevant state vectors of its state space. When the plurality of seeds includes a first seed and a second seed, the memory of the present disclosure is provided with a state space associated with each of the two seeds. On this basis, in a pipelined operation for generating random numbers, the arithmetic unit may be configured to perform the following operations cyclically according to the random number instruction until a predetermined number of random numbers has been generated. The arithmetic unit may first generate part of the random numbers using the state space associated with the first seed and update the corresponding number of state vectors. The arithmetic unit may then generate part of the random numbers using the state space associated with the second seed and update the corresponding number of state vectors. By iterating between the generation and update operations associated with the first seed and those associated with the second seed, the scheme of the present disclosure markedly improves the speed and efficiency of generating the expected number of random numbers, achieves pipelined execution in hardware, and improves overall hardware performance.
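The alternation between two seeds can be sketched as a schedule of operations. The Python below is an illustrative model only: the function name `dual_seed_schedule` is hypothetical, the chunk size of 96 is borrowed from the examples given later in this description, and treating the predetermined number as a per-seed count is an assumption.

```python
from typing import List, Tuple

Step = Tuple[int, str, int]  # (seed index, operation, number of vectors)


def dual_seed_schedule(total_per_seed: int, chunk: int = 96) -> List[Step]:
    """Interleave generate/update steps of seed 0 and seed 1 chunk by chunk."""
    if total_per_seed < 0 or chunk <= 0:
        raise ValueError("total_per_seed must be >= 0 and chunk must be > 0")
    schedule: List[Step] = []
    done = 0
    while done < total_per_seed:
        size = min(chunk, total_per_seed - done)
        for seed in (0, 1):
            schedule.append((seed, "generate", size))  # part of the random numbers
            schedule.append((seed, "update", size))    # matching state vector update
        done += size
    return schedule


for step in dual_seed_schedule(200):
    print(step)
```

Because the two seeds use separate state spaces, the steps for the second seed have no address dependency on the steps just issued for the first seed, which is what allows them to overlap in a pipeline.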

FIG. 2 is a schematic diagram illustrating an initialization process for generating random numbers according to one embodiment of the present disclosure. For ease of understanding, the size of the state space provided in the memory of FIG. 1 is represented by a straight line, and the state vectors in the state space are represented by dots. As previously described, the state vectors are used both to generate a certain number of random numbers and to update the state vectors. Further, to enable pipelined execution in hardware, the present disclosure proposes expanding the state space, for example from a state space that originally supports N state vectors to one that supports 2N state vectors.

As shown in the upper part of FIG. 2, during initialization the arithmetic unit may use a seed to generate a state space containing N state vectors according to the decoded random number instruction. For example, a state space of N=351 state vectors (i.e., the 0th to the 350th) may be generated. When one state vector occupies 4 bytes, the initial state space with N=351 occupies 1404 bytes. FIG. 2 also shows the three state vectors X[i], X[i+1], and X[i+M] used to update the state space. When a random number generation algorithm such as the "Mersenne Twister for Graphic Processors Dynamic Creator" ("MTGPDC") is used, these three state vectors generate the state vector X[i+N] for the update as follows:

t = X[i+1] ^ (X[i] | mask);

t = t ^ (t << sh1);

u = t ^ (X[i+M] >> sh2);

X[i+N] = u ^ (R_table[u & 0xF]).

By repeatedly performing the operations described above, the state space of 0 to N-1 can be extended to the state space of 0 to 2N-1 shown in the lower part of FIG. 2, completing the initialization process of the present disclosure. In some embodiments, the state vectors 0 to N-1 may be initialized by a general-purpose processor ("CPU"), while the state vectors N to 2N-1 may be initialized by an intelligent processor. Taking N=351 as an example, after the 351 state vectors have been generated, up to four microinstructions may be used to expand the state space from N to 2N state vectors. Specifically, a first microinstruction updates 96 state vectors; a second microinstruction then updates (N-M-96) state vectors; and a third microinstruction updates min(96, M) state vectors. If M is less than or equal to 96, the third microinstruction updates M state vectors, which completes the initialization of state vectors N to 2N-1. Conversely, if M is greater than 96, a fourth microinstruction updates the remaining (M-96) state vectors, completing the initialization of state vectors N to 2N-1.
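The expansion from N to 2N state vectors, split into the microinstruction sizes given above, can be sketched in Python as follows. This is a minimal model, not the disclosed implementation: the function names are hypothetical, the recursion follows the four formulas exactly as printed in this description, 32-bit state vectors are assumed, and the values of M, `mask`, `sh1`, `sh2`, and `r_table` in the example are placeholders chosen only for illustration.

```python
from typing import List, Sequence

MASK32 = 0xFFFFFFFF  # assumes 32-bit (4-byte) state vectors


def next_state(x_i: int, x_i1: int, x_im: int, mask: int,
               sh1: int, sh2: int, r_table: Sequence[int]) -> int:
    """One update step, following the four formulas as printed in the text."""
    t = x_i1 ^ (x_i | mask)
    t = (t ^ (t << sh1)) & MASK32  # keep the left shift within 32 bits
    u = t ^ (x_im >> sh2)
    return u ^ r_table[u & 0xF]


def small_core_plan(n: int, m: int, width: int = 96) -> List[int]:
    """Update counts per microinstruction: 96, N-M-96, min(96, M), then M-96 if M > 96."""
    if m <= 0 or n - m - width < 0:
        raise ValueError("expected M > 0 and N - M >= width")
    plan = [width, n - m - width, min(width, m)]
    if m > width:
        plan.append(m - width)
    return plan


def expand_state(state: Sequence[int], m: int, mask: int, sh1: int, sh2: int,
                 r_table: Sequence[int], width: int = 96) -> List[int]:
    """Extend state vectors 0..N-1 to 0..2N-1, one microinstruction at a time."""
    n = len(state)
    x = list(state) + [0] * n
    i = 0
    for size in small_core_plan(n, m, width):
        for _ in range(size):
            x[i + n] = next_state(x[i], x[i + 1], x[i + m], mask, sh1, sh2, r_table)
            i += 1
    return x


# Placeholder parameters for illustration only.
N, M = 351, 84
params = dict(mask=0xFFF80000, sh1=13, sh2=4,
              r_table=[k * 0x01010101 for k in range(16)])
seed_state = list(range(1, N + 1))
expanded = expand_state(seed_state, M, **params)
print(small_core_plan(N, M), len(expanded))
```

In this sketch the microinstruction sizes always sum to N, so the second half of the state space is filled exactly once regardless of whether M exceeds 96.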

It should be appreciated that the initialization process described above applies to both the single-seed mode and the multi-seed mode of the present disclosure. In particular, for the multi-seed mode, the scheme of the present disclosure proposes initializing N states for each of the multiple seeds and then expanding the N states to 2N states, for example by expanding a state space containing N state vectors to one containing 2N state vectors in the manner described above. In one implementation scenario, the initialization process described above is suited to processor cores with relatively weak processing or computing capability ("small cores" for short). The initialization process for a processor core with relatively strong processing or computing capability (i.e., one that performs relatively many computing operations per clock cycle; a "big core" for short) is described below in connection with FIG. 3.

FIG. 3 is a schematic diagram illustrating an initialization process for generating random numbers according to yet another embodiment of the present disclosure, namely the initialization process of the "big core" scheme mentioned above. The initialization of state vectors 0 to N-1 is similar to the "small core" scheme described earlier; these state vectors may be generated, for example, by an off-chip system (e.g., a CPU). Next, starting from the state with state vectors 0 to N-1 shown in the upper part of FIG. 3, (N-M) state vectors are first generated using the update operation of a random number algorithm (such as the aforementioned "MTGPDC"), so that (N-M) state vectors are updated starting from N, as shown by the arrowed line segment from "N" to "2N-M-1" in the middle part of FIG. 3. Thereafter, a further M state vectors are updated, as indicated by the arrowed line segment from "2N-M-1" to "2N-1" in the lower part of FIG. 3. In one implementation scenario, the initialization of the state vectors from "N" to "2N-M-1" and from "2N-M-1" to "2N-1" may be accomplished by two microinstructions, ultimately expanding the state space containing N state vectors to a state space with 2N state vectors to support the subsequent pipelined generation of random numbers.
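One possible reading of the (N-M)/M split is that it follows the data dependencies of the update recursion: writing X[i+N] reads X[i], X[i+1], and X[i+M], so the first N-M updates touch only the original N state vectors. The Python sketch below checks this under stated assumptions; the function names are hypothetical, and M=84 is a placeholder value, since the description does not give M.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open range of state indices


def big_core_plan(n: int, m: int) -> List[Range]:
    """Index ranges written by the two microinstructions: N-M vectors, then M vectors."""
    if not 0 < m < n:
        raise ValueError("expected 0 < M < N")
    return [(n, 2 * n - m), (2 * n - m, 2 * n)]


def read_indices(write_index: int, n: int, m: int) -> Tuple[int, int, int]:
    """Indices of X[i], X[i+1], X[i+M] read when writing X[i+N]."""
    i = write_index - n
    return (i, i + 1, i + m)


def highest_read(write_range: Range, n: int, m: int) -> int:
    """Largest state index read by any update in the given write range."""
    start, stop = write_range
    return max(max(read_indices(w, n, m)) for w in range(start, stop))


N, M = 351, 84  # M = 84 is a hypothetical value chosen for illustration
first, second = big_core_plan(N, M)
print(first, highest_read(first, N, M))    # (351, 618) 350
print(second, highest_read(second, N, M))  # (618, 702) 434
```

For these example values, the first microinstruction reads nothing above index N-1, and the second reads nothing at or above index 618, where its own writes begin. Under that assumption, neither microinstruction depends on a value it produces itself, which is consistent with each one being issued as a single wide operation.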

FIG. 4 is a schematic diagram illustrating a state space change for generating random numbers according to one embodiment of the present disclosure. To facilitate understanding of the generation operation and the update operation, states (1) to (7) of the state space are shown. These identify how the positions (or space addresses) in the state space change for the state vectors X[i], X[i+1], and X[i+M] used to update the state vectors and for the state vectors Y[i+M-1] and Y[i+N] used to generate random numbers. Specifically, based on the aforementioned "MTGPDC" random number generation algorithm, the state vectors Y[i+(M-1)] and Y[i+N] shown in the figure can be used to generate the random number O[i] as follows:

t = Y[i+(M-1)] ^ (Y[i+(M-1)]);

t = t ^ (t >> 8);

O[i] = Y[i+N] ^ T_table[t & 0x4].

The three state vectors X[i], X[i+1], and X[i+M] may then be read from the state space, based on the aforementioned "MTGPDC" random number generation algorithm, to generate a state vector Y[i+N] for the update, where i, i+1, and i+M are the distances of the respective state vectors from the head address of the state space (i.e., the address pointer of each state vector). As can be seen, the present disclosure proposes performing the generation operation that generates random numbers first and then performing the update operation that updates the state vectors, so that random numbers are generated in a pipelined manner. In one implementation scenario, taking into account the processing capability of the arithmetic unit and the data access performance, the present disclosure proposes alternating between generation operations and update operations, based on a plurality of microinstructions, until a predetermined (or target) number of random numbers has been generated.
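The output step can be sketched in Python as below. As printed, the first formula XORs Y[i+(M-1)] with itself, which always yields zero, so the sketch exposes a hypothetical `first_shift` parameter whose default of 0 reproduces the printed line exactly; any nonzero shift is an assumption and is not stated in this description. The table index mask is likewise a parameter defaulting to the printed value 0x4, and the table contents are placeholders.

```python
from typing import Sequence


def output_number(y_pos: int, y_new: int, t_table: Sequence[int],
                  first_shift: int = 0, index_mask: int = 0x4) -> int:
    """Compute O[i] from Y[i+(M-1)] (y_pos) and Y[i+N] (y_new).

    With first_shift=0 this matches the printed formulas, where t is always 0.
    """
    t = y_pos ^ (y_pos >> first_shift)
    t = t ^ (t >> 8)
    return y_new ^ t_table[t & index_mask]


# Placeholder table for illustration only.
table = [k * 0x11111111 for k in range(16)]

# As printed: t is zero, so the table entry at index 0 is always selected.
print(hex(output_number(0xDEADBEEF, 0x12345678, table)))

# With a hypothetical nonzero shift, the selected entry depends on y_pos.
print(hex(output_number(0x00040000, 0x12345678, table, first_shift=16)))
```

The sketch also makes visible that the generation step only reads the state space, which is why it can run ahead of the update step that writes new state vectors.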

Referring specifically to fig. 4 in conjunction with the above, in state (1) the state space has completed the initialization process described previously, i.e., state vectors N to 2N-1 have been generated by extending state vectors 0 to N-1, yielding a state space holding state vectors 0 to 2N-1. Then, in state (2), the arithmetic unit may generate 96 random numbers using the state vectors Y[i+M-1] and Y[i+N]. To this end, the positions of Y[i+M-1] and Y[i+N] in state (1) are shifted continuously to the right as random numbers are generated, until Y[i+N] reaches the spatial position "N+96" in the state space. After generating 96 random numbers, the arithmetic unit may then generate 96 new state vectors using the state vectors X[i], X[i+1] and X[i+M] to update the state space, as indicated by the segment from 0 to 96 marked by the arrow in state (3) of the figure.

As previously described, the state space of the present disclosure may be divided into a plurality of consecutively distributed read space segments and write space segments. In view of this, for the example shown in fig. 4, the segments through which the state vectors Y[i+M-1] and Y[i+N] move as the state space goes from state (2) to state (3) can be regarded as the aforementioned read space segments, while the segment shown in state (3), in which 96 state vectors are updated, is the aforementioned write space segment.

Then, in state (4), the arithmetic unit may continue reading Y[i+M-1] and Y[i+N] sequentially to the right along the state space to generate a further 96 random numbers. Thus, Y[i+M-1] moves from spatial position "96+M-1" in state (3) to spatial position "192+M-1" in state (4). Accordingly, Y[i+N] moves from spatial position "N+96" in state (3) to spatial position "N+192" in state (4). Next, similar to state (3), in state (5) the arithmetic unit may generate 96 new state vectors using X[i], X[i+1] and X[i+M] to update the state space, as indicated by the segment "96 to 192" marked by the arrow in state (5). As above, the "96 to 192" segment is also a write space segment in the context of the present disclosure.

Similar to states (4) and (5) above, in states (6) and (7) the arithmetic unit may generate 159 random numbers and update (N-M-192) state vectors, as shown by the segment "192 to (N-M)" marked by the arrow in state (7). Although not further shown in fig. 4, it is understood that different implementations may be employed for the updates that follow the (N-M-192) state vectors, based on a numerical comparison of (M-1) and 96. Specifically, when (M-1) <= 96, the arithmetic unit may update (M-1) state vectors with one microinstruction (whereupon X[i+1] loops back to position "0" of the state space) and then update 1 state vector with another microinstruction (whereupon X[i] loops back to position "0" of the state space). In contrast, when (M-1) > 96, three microinstructions may be used to complete the update operation, e.g., first updating 96 state vectors, then updating (M-1-96) state vectors (whereupon X[i+1] loops back to position "0" of the state space), and finally updating 1 state vector (whereupon X[i] loops back to position "0" of the state space).
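The split of the update operation into microinstructions described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the helper names `tail_update_counts` and `pass_update_counts` are hypothetical, M = 84 and M = 120 are placeholder values, and the case (M-1) > 192 is rejected because the description does not cover it.

```python
def tail_update_counts(m, chunk=96):
    """Split the last M state-vector updates of a pass into per-microinstruction counts."""
    if m < 2:
        raise ValueError("expected M >= 2")
    head = m - 1
    if head <= chunk:
        # One microinstruction for (M-1) vectors, one for the last vector.
        return [head, 1]
    if head > 2 * chunk:
        raise ValueError("case not covered by the description")
    # Three microinstructions: 96, (M-1-96), then 1.
    return [chunk, head - chunk, 1]


def pass_update_counts(n, m, chunk=96):
    """Update counts for one full pass over N state vectors (small-core example)."""
    return [chunk, chunk, n - m - 2 * chunk] + tail_update_counts(m, chunk)


counts_small_m = pass_update_counts(351, 84)    # assumed M = 84, so M-1 <= 96
counts_large_m = pass_update_counts(351, 120)   # assumed M = 120, so M-1 > 96
```

In both cases the counts sum to N, so one pass of update microinstructions refreshes exactly N state vectors.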

By utilizing the random number scheme of the present disclosure described above in connection with fig. 4, the idle latency during existing read and write operations to memory (e.g., waiting for a certain time after an update to read) can be overcome, so that the time can be utilized to perform the operation of generating the random number. In particular, the present disclosure enables hardware pipelining and efficient generation of random numbers by expanding a state space and dividing it into different memory segments that overcome address dependencies, and by performing a generate operation followed by an update operation.

Fig. 5a is a schematic diagram illustrating a partial state space change for generating a random number according to still another embodiment of the present disclosure, and fig. 5b is a schematic diagram illustrating another partial state space change for generating a random number according to the same embodiment. In one embodiment, the generating and updating operations shown in figs. 5a and 5b may be performed by the aforementioned "big core". In contrast, the generating operation and the updating operation shown in fig. 4 may be performed by the aforementioned "small core".

As shown in fig. 5a, in states (1) and (2), the arithmetic unit may first generate 192 random numbers using two microinstructions and then update 192 state vectors in the state space, at the positions indicated by the arrows in state (2). The arithmetic unit may then execute four microinstructions sequentially, generating 159 random numbers in state (3) and updating (N-M-192), (M-1), and 1 state vectors in states (4) to (6), respectively.

Next, as shown in fig. 5b, in states (7) and (8), the arithmetic unit may generate a further 192 random numbers and then update 192 state vectors. As described above, the arithmetic unit can perform each of these operations by means of two microinstructions. Then, in the states from state (9) onward, the arithmetic unit may sequentially complete the operations of generating (160-M) random numbers, generating (M-1) random numbers, and updating 159 state vectors.

Since the address changes in the state space of the state vectors used for generating random numbers and of the state vectors used for updating the state space have already been described in connection with fig. 4, they are not described again here for the sake of brevity. In addition, while figs. 4 and 5 illustrate the generation and update operations of the present disclosure using N=351 (i.e., 2N=702) as an example, those skilled in the art will appreciate based on the teachings of the present disclosure that the value of N may vary depending on the processing capabilities of the hardware (e.g., the arithmetic unit and the memory). Further, the number of microinstructions for performing the generating operation and the updating operation may be adapted according to the hardware architecture.

Fig. 6 is a flowchart illustrating a method 600 for generating random numbers according to an embodiment of the present disclosure. Based on the foregoing description of the present disclosure with reference to fig. 1-5, those skilled in the art will understand that the method shown in fig. 6 may be implemented by the apparatus 100 shown in fig. 1, so that the description of the operation of the apparatus 100 is equally applicable to the operation of the steps of the method 600, and the same will not be repeated.

As shown in fig. 6, at step S602, a random number instruction is received and decoded. Here, the random number instruction includes various kinds of information about generation of the random number, and decoding may involve parsing the random number instruction so that a plurality of microinstructions may be obtained. Next, at step S604, a state space may be set according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating a random number and accept a state vector update to the state space. Here, the state space may have the exemplary form described in connection with fig. 4, 5a and 5 b. Finally, at step S606, a random number generation operation based on a single seed or a plurality of seeds may be performed in accordance with the decoded random number instruction, wherein for each of said seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is for generating a random number and the update operation is for updating a state vector. In one embodiment, the state space may be sized to support the execution of the generating and updating operations in a pipelined manner under a single or multiple seeds.

Fig. 7 is a flowchart illustrating a method 700 of generating random numbers in a single-seed mode according to an embodiment of the present disclosure. To facilitate understanding and discussion of the flow, the state space under a single seed is also schematically shown, which holds state vectors 0 to 2N-1.

As shown in fig. 7, at step S702, N state vectors may first be generated using the seed to initialize N states in the state space. Next, at step S704, the state space may be extended using the update operation of the random number generation algorithm, adding the state subspace N to 2N-1 to the state subspace 0 to N-1 and thereby obtaining a state space having 2N state vectors. Next, the aforementioned generation operation and update operation may be performed at steps S706 and S708 to generate random numbers and update the state space. When the predetermined number of random numbers has not yet been generated, steps S710 and S712 may be performed to generate a further part of the predetermined number of random numbers. Similarly, the generating operation and the updating operation may be performed repeatedly until the predetermined number of random numbers has been generated.
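The alternation of generating and updating operations in the single-seed flow can be sketched as a simple plan of microinstruction counts. This is an illustrative Python sketch only: `single_seed_plan` is a hypothetical name, the chunk sizes (96, 96, 159) are taken from the N=351 example of fig. 4, and the sketch assumes that each generating operation is followed by an update of the same number of state vectors.

```python
from itertools import cycle


def single_seed_plan(total, chunk_sizes=(96, 96, 159)):
    """Return a list of ("generate" | "update", count) steps for `total` random numbers."""
    if not chunk_sizes or any(size <= 0 for size in chunk_sizes):
        raise ValueError("chunk sizes must be positive")
    plan = []
    remaining = total
    for size in cycle(chunk_sizes):
        if remaining <= 0:
            break
        count = min(size, remaining)       # last step may be a partial chunk
        plan.append(("generate", count))   # generate first ...
        plan.append(("update", count))     # ... then update the same amount
        remaining -= count
    return plan


plan_200 = single_seed_plan(200)
```

For a target of 200 random numbers the plan ends with a partial step of 8, mirroring the "partial number" generated when the predetermined count is not a multiple of the chunk size.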

Fig. 8 is a flowchart illustrating a method 800 of generating random numbers in a dual-seed mode according to an embodiment of the present disclosure. Based on the foregoing description, it is to be appreciated that the method 800 illustrated in fig. 8 may be performed by the apparatus 100 illustrated in fig. 1, and thus the foregoing description of the apparatus 100 and its generation and updating operations is equally applicable to the following description of the method 800. In addition, for ease of description, the state spaces associated with the first seed and the second seed are also shown in fig. 8, and each state space holds a total of 2N state vectors, numbered 0 to 2N-1.

As shown in fig. 8, at steps S802 and S804, N state vectors in the state space of the first seed may be initialized, and then only update operations are performed to expand the N state vectors to 2N state vectors. Similarly, at steps S806 and S808, the state space of the second seed is initialized to obtain a state space containing 2N initial state vectors.

Next, at steps S810 and S812, a generation operation of generating a random number with respect to the first seed and an update operation for updating the state space are performed, respectively. After the first seed is executed, an operation of generating a random number and an operation of updating a state space with respect to the second seed are then executed at steps S814 and S816. Assuming that the predetermined number of random numbers associated with the first seed and the second seed has not been reached at this time, then at steps S818 and S820, a partial number of random numbers may then be generated and the same partial number of state vectors updated accordingly. Likewise, at steps S822 and S824, an operation of generating a random number and updating a state space with respect to the second seed are then performed. When the predetermined number of random numbers has not yet been reached, the operations of generating random numbers and updating the state space with respect to the first seed and the second seed will still be alternately performed until the predetermined number of random numbers is generated.
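The alternating (ping-pong) order of operations for two seeds can be sketched as follows. This is a hedged Python illustration, not the disclosed control logic: `dual_seed_plan` is a hypothetical name, seeds are labelled 0 and 1 for the first and second seed, and a fixed chunk of 96 is assumed for every step.

```python
def dual_seed_plan(total_per_seed, chunk=96):
    """Return (seed, operation, count) steps alternating between two seeds."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")
    plan = []
    remaining = [total_per_seed, total_per_seed]
    while any(r > 0 for r in remaining):
        for seed in (0, 1):
            count = min(chunk, remaining[seed])
            if count == 0:
                continue
            plan.append((seed, "generate", count))  # e.g. S810 / S814
            plan.append((seed, "update", count))    # e.g. S812 / S816
            remaining[seed] -= count
    return plan


plan_100 = dual_seed_plan(100)
```

While one seed is being updated, the state space of the other seed has no pending writes, which is what allows the two seeds to be interleaved in this way.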

In one implementation scenario, when random numbers are generated for both the first seed and the second seed, a storage address for the random numbers associated with the first seed may be set in a random number instruction of the present disclosure. In this way, when generating the random numbers associated with the second seed, the arithmetic unit can place them after the random numbers associated with the first seed according to the aforementioned storage address. In particular, since the arithmetic unit generates the random numbers of the first seed and the second seed in a ping-pong manner, the head address for storing the random numbers generated from the second seed can be calculated in advance based on the storage address of the random numbers of the first seed and the number of random numbers to be generated.
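The address calculation described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, addresses are treated as byte offsets, and each random number is assumed to occupy 4 bytes (a 32-bit word), which the description does not specify.

```python
def output_head_addresses(first_head, count_per_seed, bytes_per_number=4):
    """Return (head address for seed 1 output, head address for seed 2 output)."""
    if count_per_seed < 0 or bytes_per_number <= 0:
        raise ValueError("invalid count or word size")
    # Seed 2 output is placed directly after all of seed 1's output.
    second_head = first_head + count_per_seed * bytes_per_number
    return first_head, second_head


def chunk_address(head, already_generated, bytes_per_number=4):
    """Address at which the next chunk of a seed's output is written."""
    return head + already_generated * bytes_per_number


heads = output_head_addresses(0x1000, 351)
next_chunk_seed2 = chunk_address(heads[1], 96)
```

Because the second head address depends only on the first address and the requested count, it can be computed before either seed has produced any output.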

In one implementation scenario, for seeds that will not be used later, the random number instructions of the present disclosure may also include settings for enabling or disabling the seeds. Based on this, upon enablement (e.g., setting the enable flag bit to 1), the state space of the seed may be initialized (e.g., expanding from N states to 2N states); in contrast, when disabled (e.g., the disable flag bit is set to 1), after a predetermined number of random numbers are generated with the seed, the settings and state space associated with the seed are deleted. In another implementation scenario, the skipped state number may also be set in a random number instruction. Based on this, when performing the operation of generating random numbers based on the first seed and/or the second seed, some state vectors in the state space may be skipped or abandoned, for example, the state vector for updating is generated, but not written to the state space until the state vector of the intended position is obtained, and the generation operation and the updating operation are started at that position.

Fig. 9 is a block diagram illustrating a combination processing apparatus 900 according to an embodiment of the disclosure. As shown in fig. 9, the combined processing device 900 includes a computing processing device 902, an interface device 904, other processing devices 906, and a storage device 908. Depending on the application scenario, one or more computing devices 910 may be included in the computing processing device, which may be configured to perform the generation operations for generating random numbers and the update operations for updating the state space described herein in connection with fig. 1-8.

In various embodiments, the computing processing means of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the computing processing device may be implemented as a single-core artificial intelligence processor or as a multi-core artificial intelligence processor. Similarly, one or more computing devices included within a computing processing device may be implemented as an artificial intelligence processor core or as part of a hardware architecture of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or portions of hardware structures of artificial intelligence processor cores, the computing processing devices of the present disclosure may be considered to have a single core structure or a homogeneous multi-core structure.

In an exemplary operation, the computing processing device of the present disclosure may interact with other processing devices through an interface device to collectively accomplish user-specified operations. Depending on the implementation, other processing devices of the present disclosure may include one or more types of processors among general-purpose and/or special-purpose processors such as central processing units (Central Processing Unit, CPU), graphics processors (Graphics Processing Unit, GPU), artificial intelligence processors, and the like. These processors may include, but are not limited to, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc., and the number thereof may be determined according to actual needs. As previously mentioned, the computing processing device of the present disclosure, considered on its own, may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the computing processing device and the other processing devices are considered together, the two may be regarded as forming a heterogeneous multi-core structure.

In one or more embodiments, the other processing device may interface the computing processing device of the present disclosure with external data and controls, performing basic controls including, but not limited to, data handling, starting and/or stopping of the computing device, and the like. In other embodiments, other processing devices may also cooperate with the computing processing device to jointly accomplish the computational tasks.

In one or more embodiments, the interface device may be used to transfer data and control instructions between the computing processing device and other processing devices. For example, the computing device may obtain input data from other processing devices via the interface device, and write the input data to a storage device (or memory) on the computing device. Further, the computing processing device may obtain control instructions from other processing devices via the interface device, and write the control instructions into a control cache on the computing processing device chip. Alternatively or in addition, the interface device may also read data in a memory device of the computing processing device and transmit it to the other processing device.

Additionally or alternatively, the combined processing apparatus of the present disclosure may further comprise a storage device. As shown in the figure, the storage means are connected to the computing processing means and the other processing means, respectively. In one or more embodiments, a storage device may be used to store data for the computing processing device and/or the other processing devices. For example, the data may be data that cannot be stored entirely within an internal or on-chip memory device of a computing processing device or other processing device.

In some embodiments, the present disclosure also discloses a chip (e.g., chip 1002 shown in fig. 10). In one implementation, the Chip is a System on Chip (SoC) and is integrated with one or more combined processing devices as shown in fig. 9. The chip may be connected to other related components by an external interface device (such as external interface device 1006 shown in fig. 10). The relevant component may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a wifi interface. In some application scenarios, other processing units (e.g., video codecs) and/or interface modules (e.g., DRAM interfaces) etc. may be integrated on the chip. In some embodiments, the disclosure also discloses a chip packaging structure including the chip. In some embodiments, the disclosure further discloses a board card or an electronic device, which includes the above-mentioned chip package structure. The board will be described in detail with reference to fig. 10.

Fig. 10 is a schematic diagram illustrating the structure of a board card 1000 according to an embodiment of the disclosure. As shown in fig. 10, the board card includes a memory device 1004 for storing data, which includes one or more memory cells 1010. The memory device may be connected to, and exchange data with, the control device 1008 and the chip 1002 described above by means of, for example, a bus. Further, the board card also includes an external interface device 1006 configured for data relay or transfer between the chip (or the chip in a chip package structure) and an external device 1012 (e.g., a server or a computer). For example, the data to be processed may be transferred by the external device to the chip through the external interface device. For another example, the calculation result of the chip may be transmitted back to the external device via the external interface device. The external interface device may have different interface forms according to different application scenarios; for example, it may use a standard PCIE interface or the like.

In one or more embodiments, the control device in the disclosed board card may be configured to regulate the state of the chip. For this purpose, in an application scenario, the control device may include a single chip microcomputer (Micro Controller Unit, MCU) for controlling the working state of the chip.

From the above description in connection with fig. 9 and 10, those skilled in the art will appreciate that the present disclosure also discloses an electronic device or apparatus that may include one or more of the above-described boards, one or more of the above-described chips, and/or one or more of the above-described combination processing apparatuses.

According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, an intelligent terminal, a PC device, an internet of things terminal, a mobile terminal, a cell phone, a vehicle recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a visual terminal, an autopilot terminal, a means of transport, a household appliance, and/or a medical device. The means of transport includes an aircraft, a ship, and/or a vehicle; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas cookers, and range hoods; the medical device includes a nuclear magnetic resonance apparatus, a B-mode ultrasonic apparatus, and/or an electrocardiograph apparatus. The electronic device or apparatus of the present disclosure may also be applied to the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, medical care, and the like. Further, the electronic device or apparatus of the present disclosure may also be used in cloud, edge, terminal, and other application scenarios related to artificial intelligence, big data, and/or cloud computing. In one or more embodiments, an electronic device or apparatus with high computing power according to aspects of the present disclosure may be applied to a cloud device (e.g., a cloud server), while an electronic device or apparatus with low power consumption may be applied to a terminal device and/or an edge device (e.g., a smart phone or a camera). In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that appropriate hardware resources can be matched from the hardware resources of the cloud device according to the hardware information of the terminal device and/or the edge device to simulate the hardware resources of the terminal device and/or the edge device, thereby achieving unified management, scheduling, and collaborative work of end-cloud integration or edge-cloud integration.

It should be noted that, for the sake of brevity, the present disclosure describes some methods and embodiments thereof as a series of actions and combinations thereof, but those skilled in the art will understand that the aspects of the present disclosure are not limited by the order of actions described. Thus, one of ordinary skill in the art will appreciate, in light of the disclosure or teachings herein, that certain steps may be performed in other orders or concurrently. Further, those skilled in the art will appreciate that the embodiments described in this disclosure may be considered alternative embodiments, i.e., the acts or modules involved are not necessarily required for the implementation of one or more aspects of this disclosure. In addition, depending on the scenario, the descriptions of different embodiments in the present disclosure each have their own emphasis. In view of this, those skilled in the art will appreciate that, for portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.

In particular implementations, based on the disclosure and teachings of the present disclosure, one of ordinary skill in the art will appreciate that several embodiments of the disclosure disclosed herein may also be implemented in other ways not disclosed herein. For example, in terms of the foregoing embodiments of the electronic device or apparatus, the units are divided herein by taking into account the logic function, and there may be other manners of dividing the units when actually implemented. For another example, multiple units or components may be combined or integrated into another system, or some features or functions in the units or components may be selectively disabled. In terms of the connection relationship between different units or components, the connections discussed above in connection with the figures may be direct or indirect couplings between the units or components. In some scenarios, the foregoing direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustical, magnetic, or other forms of signal transmission.

In the present disclosure, units illustrated as separate components may or may not be physically separate, and components shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, some or all of the units may be selected to achieve the objectives of the embodiments of the disclosure, as desired. In addition, in some scenarios, multiple units in embodiments of the disclosure may be integrated into one unit, or each unit may physically exist alone.

In some implementation scenarios, the above-described integrated units may be implemented in the form of software program modules. If implemented in the form of software program modules and sold or used as a stand-alone product, the integrated unit may be stored in a computer-readable memory. In this regard, when the aspects of the present disclosure are embodied in a software product (e.g., a computer-readable storage medium), the software product may be stored in a memory and may include instructions for causing a computer device (e.g., a personal computer, a server, or a network device) to perform some or all of the steps of the methods described by the embodiments of the present disclosure. The aforementioned memory may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a flash disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a removable hard disk, a magnetic disk, or an optical disk.

In other implementation scenarios, the integrated units may also be implemented in hardware, i.e., as specific hardware circuits, which may include digital circuits and/or analog circuits, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices such as transistors or memristors. In view of this, the various types of devices described herein (e.g., computing devices or other processing devices) may be implemented by appropriate hardware processors, such as a CPU, GPU, FPGA, DSP, or ASIC. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including a magnetic storage medium or a magneto-optical storage medium, etc.), which may be, for example, resistive random access memory (Resistive Random Access Memory, RRAM), dynamic random access memory (Dynamic Random Access Memory, DRAM), static random access memory (Static Random Access Memory, SRAM), enhanced dynamic random access memory (Enhanced Dynamic Random Access Memory, EDRAM), high bandwidth memory (High Bandwidth Memory, HBM), hybrid memory cube (Hybrid Memory Cube, HMC), ROM, RAM, etc.

The foregoing may be better understood in light of the following clauses:

Clause A1, an apparatus for generating a random number, comprising:

an instruction decoding unit configured to receive a random number instruction and decode the random number instruction;

an arithmetic unit configured to perform a random number generation operation based on a single seed or a plurality of seeds according to the decoded random number instruction, wherein the random number generation operation includes a generation operation and an update operation performed in a pipelined manner for each of the seeds, wherein the generation operation is for generating a random number and the update operation is for updating a state vector; and

a memory configured to set a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating the random number and to accept a state vector update of the state space from an arithmetic unit,

wherein the state space is sized to:

under a single seed, supporting execution of the generating operation and the updating operation in a pipelined manner; or

under multiple seeds, supporting execution of the random number generation operations of the respective seeds in a pipelined manner.

Clause A2, the apparatus of clause A1, wherein the random number instructions comprise one or more of the following: the number of generated random numbers, the size and address of the state space, one or more seed information for generating random numbers, a concatenation parameter for concatenating random numbers, and an address for outputting random numbers.

Clause A3, the apparatus of clause A1, wherein the random number instruction is a single instruction and parsed by the instruction decode unit to obtain a plurality of microinstructions for performing pipelining, wherein the arithmetic unit is configured to:

perform, according to the microinstructions, the generating operation for generating random numbers and the updating operation for updating the state space along the state space in a cyclic shift manner.

Clause A4, the apparatus of clause A3, wherein the microinstructions comprise a generate microinstruction for the generate operation and an update microinstruction for the update operation, and the generate microinstruction and update microinstruction are related to a state space location to which the generate operation and update operation relate, and the arithmetic unit is configured to generate a partial number of random numbers in one of:

generating a partial number of random numbers according to one generate microinstruction, and updating the partial number of state vectors according to one update microinstruction; or

generating a partial number of random numbers according to one generate microinstruction, and updating the partial number of state vectors according to a plurality of update microinstructions, respectively; or

generating a partial number of random numbers according to a plurality of generate microinstructions, respectively, and updating the partial number of state vectors according to one update microinstruction.

Clause A5, the apparatus of clause A3, wherein each of the microinstructions comprises an address range entry indicating a read operation or a write operation for the state space, and, where the address range entries of two consecutive microinstructions overlap in address, the arithmetic unit is configured to:

perform the read or write operation of the later microinstruction only after the read or write operation of the earlier microinstruction has been completed.
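
The ordering rule of clause A5 can be pictured as an overlap test between the address ranges of consecutive microinstructions. The sketch below is illustrative only; `MicroOp`, `overlaps`, and `stall_flags` are hypothetical names, and half-open address ranges are an assumption.

```python
from typing import List, NamedTuple


class MicroOp(NamedTuple):
    kind: str    # "read" or "write"
    start: int   # first address covered (inclusive)
    end: int     # one past the last address covered (exclusive)


def overlaps(a: MicroOp, b: MicroOp) -> bool:
    # Two half-open ranges overlap when each starts before the other ends.
    return a.start < b.end and b.start < a.end


def stall_flags(ops: List[MicroOp]) -> List[bool]:
    """flags[i] is True when ops[i] must wait for ops[i-1] to finish."""
    return [False] + [overlaps(ops[i - 1], ops[i])
                      for i in range(1, len(ops))]


ops = [MicroOp("read", 0, 4), MicroOp("write", 8, 12), MicroOp("read", 10, 14)]
flags = stall_flags(ops)
```

In this example only the third microinstruction waits, because its range shares addresses 10 and 11 with the second; microinstructions with disjoint ranges remain free to proceed in a pipelined manner.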

Clause A6, the apparatus of clause A1, wherein the size of the state space is configured by software to cause the generating and updating operations under a single seed to be performed in a pipelined manner and to cause random number generating operations under multiple seeds to be performed in a pipelined manner.

Clause A7, the apparatus of clause A1, wherein the state space is further configured to store a state pointer from the instruction decode unit, wherein the state pointer indicates a position in the state space of a first state vector for a generate operation and a position in the state space of a first update state vector for the update operation.

Clause A8, the apparatus of clause A1, wherein the state space comprises a plurality of space segments, and in the pipelining each of the plurality of space segments variably operates as a read space segment or a write space segment according to the generation operation and the update operation to be performed, wherein the read space segment and the write space segment have a predetermined address interval in the state space to support the generation operation and the update operation being performed in a pipelined manner.

Clause A9, the apparatus of clause A1, wherein the state space comprises N consecutively distributed read space segments and N consecutively distributed write space segments, and the N read space segments and the N write space segments form a correspondence in a pipeline operation, wherein N is a positive integer greater than or equal to 2, and during the pipeline operation the arithmetic unit is configured to perform the following operations in a loop according to a number of pre-generated random numbers, wherein i = 1, … N:

in the generation operation, reading a plurality of state vectors from the i-th read space segment to generate a corresponding i-th plurality of random numbers; and

in the update operation, generating, from the plurality of state vectors in the i-th read space segment, a plurality of state update vectors for updating the state space, and writing the plurality of state update vectors to the i-th write space segment.
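
The loop of clause A9 may be sketched as follows. This is a minimal illustration under stated assumptions: the read segments occupy the first half of the state space and the write segments the second half, the two halves exchange roles after each pass, and `toy_generate` and `toy_update` are placeholder functions (an xorshift step and a simple shift-xor) chosen for the example rather than the algorithm of the disclosure.

```python
from typing import List

MASK = 0xFFFFFFFF


def toy_generate(s: int) -> int:
    # Placeholder output function (assumption, not from the disclosure).
    return (s ^ (s >> 11)) & MASK


def toy_update(s: int) -> int:
    # Placeholder state update: one xorshift32 step (assumption).
    s ^= (s << 13) & MASK
    s ^= s >> 17
    s ^= (s << 5) & MASK
    return s & MASK


def run_segments(state: List[int], n_segments: int,
                 seg_len: int, count: int) -> List[int]:
    """`state` holds 2 * n_segments * seg_len vectors."""
    out: List[int] = []
    read_base, write_base = 0, n_segments * seg_len
    while len(out) < count:
        for i in range(n_segments):
            r = read_base + i * seg_len
            w = write_base + i * seg_len
            vecs = state[r:r + seg_len]
            # Generation: i-th read segment -> i-th group of random numbers.
            out.extend(toy_generate(v) for v in vecs)
            # Update: same vectors -> i-th write segment.
            state[w:w + seg_len] = [toy_update(v) for v in vecs]
            if len(out) >= count:
                break
        # Assumed role swap: written segments are read in the next pass.
        read_base, write_base = write_base, read_base
    return out[:count]


state = list(range(1, 9)) + [0] * 8
numbers = run_segments(state, n_segments=2, seg_len=4, count=12)
```

Because each pass reads one half and writes the other, a read and a write never target the same segment within a pass, which is what allows the two operations to overlap in a hardware pipeline.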

Clause A10, the apparatus of clause A1, wherein during initialization of the state space, the state space is divided into a first state subspace and a second state subspace of the same size, and the arithmetic unit is configured to:

perform an update operation for the second state subspace using the state vectors stored in the first state subspace, so as to generate all state vectors of the second state subspace.

Clause A11, the apparatus of clause A10, wherein in generating all state vectors of the second state subspace, the arithmetic unit is further configured to:

execute the plurality of microinstructions sequentially to generate all state vectors for updating the second state subspace, wherein execution of each microinstruction generates a corresponding number of state vectors.
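
The initialization described in clauses A10 and A11 may be illustrated as below. The sketch assumes that the first state subspace is filled from the seed by a recurrence modelled on the widely known Mersenne Twister seeding formula, and that each microinstruction updates a fixed number of vectors; neither assumption is stated in these clauses, and `init_state_space` and `toy_update` are hypothetical names.

```python
from typing import List, Tuple

MASK = 0xFFFFFFFF


def toy_update(s: int) -> int:
    # Placeholder state update: one xorshift32 step (assumption).
    s ^= (s << 13) & MASK
    s ^= s >> 17
    s ^= (s << 5) & MASK
    return s & MASK


def init_state_space(seed: int, half_len: int,
                     vectors_per_microinstruction: int) -> Tuple[List[int], int]:
    """Return (state space, number of update microinstructions executed)."""
    # First subspace: filled from the seed (placeholder recurrence).
    first: List[int] = []
    s = seed & MASK
    for i in range(half_len):
        first.append(s)
        s = (1812433253 * (s ^ (s >> 30)) + i + 1) & MASK
    # Second subspace: produced from the first, one microinstruction at a time.
    second = [0] * half_len
    steps = 0
    for start in range(0, half_len, vectors_per_microinstruction):
        end = min(start + vectors_per_microinstruction, half_len)
        second[start:end] = [toy_update(v) for v in first[start:end]]
        steps += 1
    return first + second, steps


state, steps = init_state_space(seed=42, half_len=8,
                                vectors_per_microinstruction=3)
```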

Clause A12, the apparatus according to any of clauses A1-A11, wherein:

under a single seed, the arithmetic unit is configured to perform the generation operation using state vectors generated from the single seed to generate the random number, and to perform the update operation to update the related state vectors of the state space; or

under a plurality of seeds, each seed has its associated state space, and the arithmetic unit is configured to perform the generation operation associated with each seed using the state vectors generated from that seed, so as to generate the random number associated with that seed, and to perform the update operation associated with that seed to update the related state vectors of the state space associated with that seed.

Clause A13, the apparatus of clause A12, wherein under a single seed, the arithmetic unit is configured to:

perform, according to a plurality of microinstructions, the generation operation for generating the random number and the update operation for updating the state space successively and alternately until a predetermined number of random numbers are generated,

wherein each generating operation generates a partial number of random numbers and each updating operation updates a corresponding number of state vectors in the state space.

Clause A14, the apparatus of clause A12, wherein the plurality of seeds comprises a first seed and a second seed, wherein the memory is provided with the state space associated with each of the first seed and the second seed, and in performing a pipeline operation for generating random numbers, the arithmetic unit is configured to cycle through the following operations according to the random number instruction until a predetermined number of random numbers are generated:

generating a partial number of random numbers using the state space associated with the first seed and updating a corresponding number of state vectors; and

generating a partial number of random numbers using the state space associated with the second seed and updating a corresponding number of state vectors.
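
The alternation between two seeds in clause A14 may be sketched as follows. The example is a simplification under stated assumptions: each seed has its own list as its state space, state vectors are updated in place rather than in a separate write segment, the predetermined number is counted per seed, and `toy_generate`, `toy_update`, `step`, and `run_two_seeds` are hypothetical names with placeholder arithmetic.

```python
from typing import List, Tuple

MASK = 0xFFFFFFFF


def toy_generate(s: int) -> int:
    # Placeholder output function (assumption).
    return (s ^ (s >> 11)) & MASK


def toy_update(s: int) -> int:
    # Placeholder state update: one xorshift32 step (assumption).
    s ^= (s << 13) & MASK
    s ^= s >> 17
    s ^= (s << 5) & MASK
    return s & MASK


def step(state: List[int], pos: int, chunk: int) -> Tuple[List[int], int]:
    """Generate `chunk` numbers starting at `pos` (cyclically), then update."""
    n = len(state)
    idx = [(pos + k) % n for k in range(chunk)]
    nums = [toy_generate(state[j]) for j in idx]
    for j in idx:
        state[j] = toy_update(state[j])
    return nums, (pos + chunk) % n


def run_two_seeds(state_a: List[int], state_b: List[int],
                  chunk: int, count_per_seed: int) -> Tuple[List[int], List[int]]:
    states = (state_a, state_b)
    outs: Tuple[List[int], List[int]] = ([], [])
    pos = [0, 0]
    while any(len(o) < count_per_seed for o in outs):
        for k in (0, 1):  # first seed, then second seed, repeatedly
            if len(outs[k]) < count_per_seed:
                nums, pos[k] = step(states[k], pos[k], chunk)
                outs[k].extend(nums)
    return outs[0][:count_per_seed], outs[1][:count_per_seed]


ra, rb = run_two_seeds([1, 2, 3, 4], [5, 6, 7, 8], chunk=2, count_per_seed=6)
```

Since the two state spaces are independent, work on the second seed does not touch the addresses used by the first, so in hardware the operations of one seed could overlap with those of the other.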

Clause A15, an integrated circuit chip comprising the apparatus according to any of clauses A1-A14.

Clause A16, a board card comprising the integrated circuit chip of clause A15.

Clause A17, an electronic device comprising the integrated circuit chip of clause A16.

Clause A18, a method for generating a random number, comprising:

receiving a random number instruction and decoding the random number instruction;

setting a state space according to the decoded random number instruction, wherein the state space is configured to store a state vector for generating the random number and accept a state vector update to the state space; and

performing a random number generation operation based on a single seed or a plurality of seeds in accordance with the decoded random number instruction, wherein for each of the seeds the random number generation operation comprises a generation operation and an update operation performed in a pipelined manner, wherein the generation operation is for generating a random number and the update operation is for updating a state vector,

wherein the state space is sized to:

Clause A19, the method of clause A18, wherein the random number instruction comprises one or more of the following: the number of generated random numbers, the size and address of the state space, one or more seed information for generating random numbers, a concatenation parameter for concatenating random numbers, and an address for outputting random numbers.

Clause A20, the method of clause A18, wherein the random number instruction is a single instruction that is parsed to yield a plurality of microinstructions for performing pipelining, the method further comprising:

Clause A21, the method of clause A20, wherein the microinstructions comprise a generate microinstruction for the generation operation and an update microinstruction for the update operation, the generate microinstruction and the update microinstruction are related to the state space locations involved in the generation operation and the update operation, and the method comprises generating a partial number of random numbers and updating a partial number of state vectors by one of the following operations:

generating the partial number of random numbers according to one generate microinstruction, and updating the partial number of state vectors according to one update microinstruction; or

generating the partial number of random numbers according to a plurality of generate microinstructions, respectively, and updating the partial number of state vectors according to one update microinstruction.

Clause A22, the method of clause A20, wherein each of the microinstructions includes an address range entry indicating a read operation or a write operation for the state space, and, where the address range entries of two consecutive microinstructions overlap in address, the method further comprises:

Clause A23, the method of clause A18, wherein the size of the state space is configured by software to cause the generating and updating operations under a single seed to be performed in a pipelined manner and to cause random number generating operations under multiple seeds to be performed in a pipelined manner.

Clause A24, the method of clause A18, wherein the state space is further configured to store a state pointer, wherein the state pointer indicates a location in the state space of a first state vector for a generating operation and a location in the state space of a first updated state vector for the updating operation.

Clause A25, the method of clause A18, wherein the state space comprises a plurality of space segments, and in the pipelining each of the plurality of space segments variably operates as a read space segment or a write space segment according to the generation operation and the update operation to be performed, wherein the read space segment and the write space segment have a predetermined address interval in the state space to support the generation operation and the update operation being performed in a pipelined manner.

Clause A26, the method of clause A18, wherein the state space comprises N consecutively distributed read space segments and N consecutively distributed write space segments, and the N read space segments and the N write space segments form a correspondence in a pipeline operation, wherein N is a positive integer greater than or equal to 2, and during the pipeline operation the method performs the following operations in a loop according to the number of pre-generated random numbers, wherein i = 1, …, N:

Clause A27, the method of clause A18, wherein during initialization of the state space, the state space is divided into a first state subspace and a second state subspace of the same size, the method further comprising:

Clause A28, the method of clause A27, wherein in generating all state vectors of the second state subspace, the method further comprises:

Clause A29, the method according to any of clauses A18-A28, wherein:

under a single seed, performing the generation operation using state vectors generated from the single seed to generate the random number, and performing the update operation to update the related state vectors of the state space; or

under a plurality of seeds, each seed has its associated state space, and the method performs the generation operation associated with each seed using the state vectors generated from that seed, so as to generate the random number associated with that seed, and performs the update operation associated with that seed to update the related state vectors of the state space associated with that seed.

Clause A30, the method of clause A29, wherein under a single seed, the method further comprises:

Clause A31, the method of clause A29, wherein the plurality of seeds comprises a first seed and a second seed, wherein the method comprises providing the state space associated with each of the first seed and the second seed, and in performing the pipelining operation for generating the random number, the method comprises performing the following operations in cycles according to the random number instruction until a predetermined number of random numbers are generated:

It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, specification, and drawings of this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of the present disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

As used in this specification and the claims, the term "if" may be interpreted as "when", "once", "in response to a determination", or "in response to detection", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determination", "in response to determination", "upon detection of [the described condition or event]", or "in response to detection of [the described condition or event]".

While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. The appended claims are intended to define the scope of the disclosure and are therefore to cover all equivalents or alternatives falling within the scope of these claims.

Claims

1. An apparatus for generating random numbers, comprising:

wherein the state space is sized to:

under a plurality of seeds, support the random number generation operation of each seed being performed in a pipelined manner;

wherein the state space includes N consecutively distributed read space segments and N consecutively distributed write space segments, and the N consecutively distributed read space segments and the N consecutively distributed write space segments form a correspondence in a pipeline operation, where N is a positive integer greater than or equal to 2, and during the pipeline operation, the arithmetic unit is configured to perform the following operations in a loop according to a number of pre-generated random numbers, where i = 1, …, N:

2. The apparatus of claim 1, wherein the random number instruction comprises one or more of: the number of generated random numbers, the size and address of the state space, one or more seed information for generating random numbers, a concatenation parameter for concatenating random numbers, and an address for outputting random numbers.

3. The apparatus of claim 1, wherein the random number instruction is a single instruction and parsed by the instruction decode unit to obtain a plurality of microinstructions for performing pipelining, wherein the arithmetic unit is configured to:

4. The apparatus of claim 3, wherein the microinstructions include a generate microinstruction for the generation operation and an update microinstruction for the update operation, the generate microinstruction and the update microinstruction are associated with the state space locations to which the generation operation and the update operation relate, and the arithmetic unit is configured to generate a partial number of random numbers and update a partial number of state vectors in one of the following ways:

generating the partial number of random numbers according to one generate microinstruction, and updating the partial number of state vectors according to a plurality of update microinstructions, respectively; or

5. The apparatus of claim 3, wherein each of the microinstructions includes an address range entry indicating a read operation or a write operation for the state space, and, where the address range entries of two consecutive microinstructions overlap in address, the arithmetic unit is configured to:

6. The apparatus of claim 1, wherein the size of the state space is configured by software to cause the generating and updating operations under a single seed to be performed in a pipelined manner and to cause random number generating operations under multiple seeds to be performed in a pipelined manner.

7. The apparatus of claim 1, wherein the state space is further configured to store a state pointer from the instruction decode unit, wherein the state pointer indicates a position in the state space of a first state vector for a generate operation and a position in the state space of a first update state vector for an update operation.

8. The apparatus of claim 1, wherein the state space comprises a plurality of space segments, and in the pipelining each of the plurality of space segments variably operates as a read space segment or a write space segment according to the generation operation and the update operation to be performed, wherein the read space segment and the write space segment have a predetermined address interval in the state space so as to support the generation operation and the update operation being performed in a pipelined manner.

9. The apparatus of claim 1, wherein during initialization of the state space, the state space is divided into a first state subspace and a second state subspace of the same size, the arithmetic unit configured to:

10. The apparatus of claim 9, wherein in generating all state vectors of the second state subspace, the arithmetic unit is further configured to:

11. The apparatus of any of claims 1-10, wherein:

12. The apparatus of claim 11, wherein under a single seed, the arithmetic unit is configured to:

13. The apparatus of claim 11, wherein the plurality of seeds includes a first seed and a second seed, wherein the memory is provided with the state space associated with each of the first seed and the second seed, and in performing a pipeline operation for generating random numbers, the arithmetic unit is configured to cycle through the following operations according to the random number instructions until a predetermined number of random numbers are generated:

14. An integrated circuit chip comprising the apparatus of any of claims 1-13.

15. A board card comprising the integrated circuit chip of claim 14.

16. An electronic device comprising the integrated circuit chip of claim 15.

17. A method for generating random numbers, comprising:

wherein the state space is sized to:

wherein the state space comprises N consecutively distributed read space segments and N consecutively distributed write space segments, and the N consecutively distributed read space segments and the N consecutively distributed write space segments form a correspondence in a pipeline operation, wherein N is a positive integer greater than or equal to 2, and during the pipeline operation the method performs the following operations in a loop according to the number of pre-generated random numbers, wherein i=1, … N:

18. The method of claim 17, wherein the random number instruction comprises one or more of: the number of generated random numbers, the size and address of the state space, one or more seed information for generating random numbers, a concatenation parameter for concatenating random numbers, and an address for outputting random numbers.

19. The method of claim 18, wherein the random number instruction is a single instruction and is resolved to yield a plurality of microinstructions for performing pipelining, the method further comprising:

20. The method of claim 19, wherein the microinstructions include a generate microinstruction for the generation operation and an update microinstruction for the update operation, the generate microinstruction and the update microinstruction are associated with the state space locations to which the generation operation and the update operation relate, and the method includes generating a partial number of random numbers and updating a partial number of state vectors by one of the following operations:

generating the partial number of random numbers according to one generate microinstruction, and updating the partial number of state vectors according to a plurality of update microinstructions, respectively; or

21. The method of claim 19, wherein each of the microinstructions includes an address range entry indicating a read operation or a write operation for the state space, and, where the address range entries of two consecutive microinstructions overlap in address, the method further comprises:

22. The method of claim 17, wherein the size of the state space is configured by software so as to cause the generating and updating operations under a single seed to be performed in a pipelined manner and to cause random number generating operations under multiple seeds to be performed in a pipelined manner.

23. The method of claim 17, wherein the state space is further configured to store a state pointer, wherein the state pointer indicates a position in the state space of a first state vector for a generating operation and a position in the state space of a first updating state vector for an updating operation.

24. The method of claim 17, wherein the state space comprises a plurality of space segments, and in the pipelining each of the plurality of space segments variably operates as a read space segment or a write space segment depending on the generation operation and the update operation to be performed, wherein the read space segment and the write space segment have a predetermined address interval in the state space to support the generation operation and the update operation being performed in a pipelined manner.

25. The method of claim 17, wherein during initialization of the state space, the state space is divided into a first state subspace and a second state subspace of the same size, the method further comprising:

26. The method of claim 25, wherein in generating all state vectors for the second state subspace, the method further comprises:

27. The method of any one of claims 17-26, wherein:

28. The method of claim 27, wherein under a single seed, the method further comprises:

29. The method of claim 27, wherein the plurality of seeds includes a first seed and a second seed, wherein the method includes providing the state space associated with each of the first seed and the second seed, and in performing a pipeline operation for generating random numbers, the method includes, in accordance with the random number instruction, looping through the following operations until a predetermined number of random numbers are generated: