CN115907005B - Large-scale fully-connected Ising model annealing processing circuit based on network on chip - Google Patents


Info

Publication number
CN115907005B
Authority
CN
China
Prior art keywords
spin
spin processing
module
spins
stage pipeline
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310010051.1A
Other languages
Chinese (zh)
Other versions
CN115907005A (en)
Inventor
姚恩义
蒋东
汪祥瑞
黄展鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202310010051.1A
Publication of CN115907005A
Application granted
Publication of CN115907005B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Multi Processors (AREA)

Abstract

The invention discloses a large-scale fully-connected Ising model annealing processing circuit based on a network on chip. It relates to the technical field of Ising models and addresses the small processing scale, low scalability, slow convergence and limited parallel processing capability of prior-art Ising model circuits. The circuit comprises a global controller, a control bus, a spin processing array and a merging router array. The global controller controls all spin processing units in parallel; all spin processing units share the annealing temperature and random numbers, and communicate and compute through the merging router array. The circuit offers fast convergence, high parallelism, high scalability, low design complexity and low hardware resource cost, and can perform high-speed, highly parallel annealing on fully-connected Ising models.

Description

Large-scale fully-connected Ising model annealing processing circuit based on network on chip
Technical Field
The invention relates to the technical field of Ising models, and in particular to a large-scale fully-connected Ising model annealing processing circuit based on a network on chip.
Background
A combinatorial optimization problem is the task of finding an optimal object within a finite set of discrete objects. Examples include the traveling salesman problem, the max-cut problem, the graph coloring problem and the flight scheduling problem; these are typical non-deterministic polynomial (NP) problems. The Ising model describes the stochastic process of phase transitions in matter through a group of interconnected spins, and most combinatorial optimization problems can be mapped onto it, so the optimal solution of such a problem can be obtained by solving the corresponding Ising model. The annealing algorithm, derived from the principle of annealing solid matter, is a general-purpose optimization algorithm that can solve Ising model problems effectively. Processors with a von Neumann architecture struggle to solve combinatorial optimization problems quickly, because the solution space grows explosively with the number of variables and such processors operate in an inherently serial manner. Quantum annealing processors solve combinatorial optimization problems with superconducting flux qubits and offer excellent accuracy and very high solving speed. However, they require an extremely low operating temperature and are costly, which limits their practical use. With the progress of semiconductor manufacturing, annealing processors based on CMOS processes have been developed to overcome these problems. Such a processor stores spin states in SRAM cells, implements the interactions between spins with logic circuits, and uses a random number generator to escape local minima. Compared with general-purpose CPUs and quantum annealing processors, these processors offer significant improvements in execution speed, cost and power consumption, and can operate at room temperature.
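To make the mapping from a combinatorial problem to an Ising model concrete, the following minimal Python sketch evaluates the energy of a spin configuration and finds the ground state of a three-node max-cut instance by exhaustive search. The sign convention of the energy function, the function names and the example instance are assumptions chosen for illustration; they are not taken from the patent.

```python
from itertools import product

def ising_energy(J, h, sigma):
    """Energy -sum_{i<j} J[i][j]*s_i*s_j - sum_i h[i]*s_i (assumed sign convention)."""
    n = len(sigma)
    pair = sum(J[i][j] * sigma[i] * sigma[j]
               for i in range(n) for j in range(i + 1, n))
    field = sum(h[i] * sigma[i] for i in range(n))
    return -pair - field

def brute_force_ground_state(J, h):
    """Exhaustive search over all 2^n spin configurations (tiny n only)."""
    n = len(h)
    return min(product((-1, 1), repeat=n),
               key=lambda s: ising_energy(J, h, s))

# Max-cut on a triangle with unit edge weights: each edge becomes an
# antiferromagnetic coupling J = -1, so opposite spins lower the energy.
J = [[0, -1, -1],
     [-1, 0, -1],
     [-1, -1, 0]]
h = [0, 0, 0]
best = brute_force_ground_state(J, h)
cut_size = sum(1 for i in range(3) for j in range(i + 1, 3) if best[i] != best[j])
```

The exhaustive search grows as 2^n, which is exactly why annealing hardware is attractive for larger instances.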
However, most current CMOS annealing processors support only sparse spin interconnections, such as lattice graphs, King's graphs and hexagonal graphs, which greatly limits the range of combinatorial optimization problems they can solve. Several CMOS annealing processors that support fully-connected spins have been developed for problems such as the traveling salesman problem and the max-cut problem, but they consume a large amount of resources, implement only a small number of fully-connected spins, and can select at most one spin for a state update in each iteration step. They can therefore solve only small-scale combinatorial optimization problems, and suffer from low scalability, slow convergence and limited parallel processing capability. In short, no satisfactory design yet exists for an annealing architecture that supports large-scale fully-connected spins with parallel state updates while offering high scalability, fast convergence and high parallelism.
Disclosure of Invention
The invention aims to provide a large-scale fully-connected Ising model annealing processing circuit based on a network on chip that solves the above problems in the prior art.
The large-scale fully-connected Ising model annealing processing circuit based on a network on chip according to the invention comprises a global controller, a control bus, a spin processing array and a merging router array; the global controller controls all spin processing units in parallel; all spin processing units share the annealing temperature and random numbers, and communicate and compute through the merging router array.
The global controller comprises an I/O module, a control logic module, a temperature scheduling module and a random number generating module; the I/O module is responsible for exchanging information between a user and the processing circuit; the control logic module is responsible for generating corresponding control signals; the temperature scheduling module and the random number generating module are respectively responsible for generating annealing temperature and random numbers; the control bus sends control signals, annealing temperatures, and random numbers to all spin processing units.
The spin processing array includes a plurality of spin processing units; each spin processing unit holds 256 spins and updates one spin at a time.
The spin processing unit comprises a control unit, a state updating unit and a production unit;
the control unit comprises control logic and a counter; the control logic receives instructions from the global controller and generates the corresponding control signals for the computing elements in the spin processing unit; the counter records how many of the 256 spins are in state -1;
the state updating unit consists of a partial sum register, a [formula image 001] register, a [formula image 002] register, an absolute value unit, three adders, a comparator, a multiplier and a flipper; the absolute value unit and an adder receive and accumulate the correlation coefficients [formula image 003] and store the accumulated result in the [formula image 002] register; the partial sum register and the [formula image 001] register hold the basic partial sums and the accumulated result [formula image 004], respectively; the comparator compares [formula image 005] and [formula image 006] to determine whether to update the single spin processed by the spin processing unit; if the update condition is met, [formula image 004] is selected to further update the spin state, otherwise the state remains unchanged;
the production unit is a seven-stage pipeline comprising an access controller, 16 J memories, an h memory, 16 σ memories, 16 comparators, several adders, several multiplexers and several NOT gates; it generates the basic partial sums [formula image 007] of [formula image 004] and the coefficients [formula image 008] of [formula image 006], and sends them to other spin processing units; the access controller is dedicated to controlling the storage and readout of the interaction coefficients [formula image 009] and the external magnetic coefficients [formula image 010]; before the first pipeline stage, 16 coefficients [formula image 003] are read from the 16 J memories at a time, and the 16 comparators are enabled to judge whether these coefficients relate to the spins selected for update; if so, they are passed to the first pipeline stage; the second pipeline stage uses the 16 coefficients [formula image 011] passed from the previous stage and the 16 spins [formula image 012] stored in the σ memories to produce the 16 basic partial sums [formula image 013] of [formula image 004], i.e., when [formula image 014], [formula image 015] is passed directly to the next stage, and when [formula image 016], the NOT gates are activated to invert [formula image 017] bitwise before the result is passed to the next stage; the third pipeline stage accumulates the basic partial sums through an adder tree with 16 inputs; the fourth pipeline stage adds the external magnetic coefficient [formula image 018] to the result of the previous stage; the fifth pipeline stage adds the number of [formula image 019] recorded by the counter in the control unit to the result of the previous stage; in the sixth pipeline stage, if the accumulated partial sum relates to a spin processed in the current spin processing unit, it is sent directly to the next stage, otherwise the stage waits for the other 240 partial sums and merges them before passing the result to the next stage; the seventh pipeline stage packages the calculated basic partial sums [formula image 020] of [formula image 004] or the coefficients [formula image 021] of [formula image 006] into partial sum or coefficient packets and forwards them to other spin processing units.
The merging router comprises a merging module and a routing module which are respectively responsible for merging and forwarding information packets.
The large-scale fully-connected Ising model annealing processing circuit based on a network on chip has the following advantages:
(1) Fast convergence and high parallelism: the circuit performs high-speed, highly parallel annealing on fully-connected Ising models. Concurrent updating of multiple spins is supported from the algorithm level down to the hardware level. In the algorithm, the dynamic multi-thread parallel-update annealing algorithm dynamically adjusts the thread number K and the number M of spins updated in parallel per thread, which accelerates convergence under limited hardware resources, while a temperature return strategy preserves accuracy. In hardware, parallel updating is realized through the network-on-chip architecture, which implements the algorithm.
(2) High scalability: each spin processing unit handles 256 spins, so larger combinatorial optimization problems can be handled simply by adding spin processing units.
(3) Low design complexity and low hardware resource cost, achieved through dedicated design of the structures in the circuit. In the global controller, the reciprocal of the temperature is computed with a multiplier, which avoids a costly divider, and all spin processing units share the temperature and the random numbers, which greatly reduces hardware cost and power consumption. The spin processing array uses distributed storage and a near-memory computing structure, which simplifies the spin processing unit, reduces the communication traffic load, lowers the computational complexity and improves computing efficiency. The spin processing unit is fully pipelined and, combined with its particular multiply-accumulate operation, uses one adder with a counter in place of sixteen adders, which greatly reduces hardware cost. The merging router uses a merging and deflection scheme with a fully pipelined design and can merge multiple partial sum packets into one, which reduces the communication traffic load and the computation time of each iteration and simplifies the design.
Drawings
Fig. 1 is a schematic diagram of the overall architecture of the large-scale fully-connected Ising model annealing processing circuit according to the present invention.
FIG. 2 is a schematic diagram of the operation state of the global controller according to the present invention.
Fig. 3 is a schematic diagram of the global controller according to the present invention.
Fig. 4 is a schematic diagram of a spin processing unit according to the present invention.
Fig. 5 is a schematic diagram of a merging router according to the present invention.
Detailed Description
As shown in FIG. 1, the large-scale fully-connected Ising model annealing processing circuit based on a network on chip according to the invention comprises a global controller, a control bus, a spin processing array and a merging router array.
The dynamic multi-thread parallel-update annealing algorithm uses [formula image 022] [formula image 023] to judge whether the update condition is satisfied, dynamically adjusts the threads according to system feedback and [formula image 024], and performs a temperature return operation in the final stage. Here N is the total number of spins, V is the total number of spins updated in parallel, K is the number of threads, and M is the number of spins updated in parallel by a single thread; all are integers.
The global controller consists of four modules, including an I/O module, a control logic module, a temperature scheduling module and a random number generating module. The I/O module is responsible for exchanging information between the user and the processing circuitry. The control logic module is responsible for generating corresponding control signals. The temperature scheduling module and the random number generating module are respectively responsible for generating annealing temperature and random numbers. The control bus sends control signals, annealing temperatures, and random numbers to all spin processing units.
As shown in fig. 2, the global controller has five working states: an idle state, a static parameter configuration state S1, a dynamic parameter configuration state S2, an iteration state S3, and a result return state S4. In the idle state all components are on standby; when a start signal is received from the user, the controller switches to the static parameter configuration state. In the static parameter configuration state, the initial annealing temperature, the temperature decay factor, the temperature threshold, the consecutive-iteration threshold, the initial spin states and the random seed are written into registers, and the controller then switches to the dynamic parameter configuration state. In the dynamic parameter configuration state, the thread number and the base annealing temperature of the next iteration are determined from the feedback signals of the spin processing array and transmitted to all spin processing units, and the controller switches to the iteration state. In the iteration state, all spin processing units are activated to compute the spin states; if the temperature is lower than the temperature threshold, the computing task is complete and the controller switches to the result return state, otherwise it switches back to the dynamic parameter configuration state. In the result return state, the final spin states are returned to the user as the solution to the original problem.
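The five working states described above can be sketched as a small state machine. The sketch below is a Python illustration only: the geometric cooling step, the point at which the temperature is lowered, the return to the idle state after S4, and all identifiers are assumptions, since the text fixes only the states and the transitions between them.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    STATIC_CONFIG = auto()    # S1
    DYNAMIC_CONFIG = auto()   # S2
    ITERATE = auto()          # S3
    RETURN_RESULT = auto()    # S4

def next_state(state, start=False, temperature=None, threshold=None):
    """Transition function following the state description in the text."""
    if state is State.IDLE:
        return State.STATIC_CONFIG if start else State.IDLE
    if state is State.STATIC_CONFIG:
        return State.DYNAMIC_CONFIG
    if state is State.DYNAMIC_CONFIG:
        return State.ITERATE
    if state is State.ITERATE:
        # Finish once the temperature drops below the threshold.
        return State.RETURN_RESULT if temperature < threshold else State.DYNAMIC_CONFIG
    return State.IDLE  # assumption: go back to idle after returning results

def run(initial_temperature, decay, threshold):
    """Trace the visited states for one annealing run."""
    temperature = initial_temperature
    state = next_state(State.IDLE, start=True)
    trace = []
    while True:
        trace.append(state)
        if state is State.RETURN_RESULT:
            return trace
        if state is State.ITERATE:
            temperature *= decay  # assumed geometric cooling after each iteration
        state = next_state(state, temperature=temperature, threshold=threshold)

trace = run(initial_temperature=1.0, decay=0.5, threshold=0.3)
```

With these example parameters the controller passes through S2 and S3 twice before reaching S4.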
As shown in fig. 3, in the global controller a monitor identifies the data packets from the user; the relevant static parameters are stored in the configuration register file, and the other parameters are transmitted to the router. After each iteration, the flip signals from the different threads are summed by separate counters, the accumulated results are saved in a log register file, and a comparator array finds the maximum number of flipped spins in order to select the corresponding thread. If the accumulated result is 0 over several consecutive iterations, the thread number is increased by a left shift. A linear feedback shift register generates the random numbers that determine whether a spin is flipped. The reciprocal of the temperature is computed in advance in the global controller and sent to all spin processing units, so that a multiplier replaces a divider. When the annealing temperature is too low and the system falls into a local minimum, the global controller raises the annealing temperature.
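Three of the mechanisms in this paragraph can be sketched briefly in Python: a linear feedback shift register as the random source, a flip decision that multiplies by a shared reciprocal temperature instead of dividing, and thread growth by left shift. The 16-bit tap positions, the textbook Metropolis acceptance rule, and the names `streak_threshold` and `k_max` are assumptions made for illustration; the patent's own update condition is given in formula images that are not reproduced in this text.

```python
import math

def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11; an assumed choice)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def uniform_from_lfsr(state):
    """Map a 16-bit LFSR state to a number in [0, 1)."""
    return state / 65536.0

def accept_flip(delta_e, inv_temperature, rand_uniform):
    """Assumed Metropolis rule; uses the shared reciprocal 1/T, so no division."""
    return delta_e <= 0 or rand_uniform < math.exp(-delta_e * inv_temperature)

def adjust_threads(k, zero_flip_streak, streak_threshold, k_max):
    """Double the thread count by a left shift after a run of flip-free iterations."""
    if zero_flip_streak >= streak_threshold:
        return min(k << 1, k_max)  # k_max is a hypothetical hardware limit
    return k

# Usage: the controller computes 1/T once and broadcasts it with the random number.
inv_t = 1.0 / 0.5
state = lfsr16_step(0xACE1)
decision = accept_flip(delta_e=1.0, inv_temperature=inv_t,
                       rand_uniform=uniform_from_lfsr(state))
```

Sharing one reciprocal and one random stream across all units is what lets each spin processing unit get by with a multiplier and a comparator.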
The spin processing array updates the spin states in parallel. It includes a plurality of spin processing units; each spin processing unit holds 256 spins and updates one spin at a time. As shown in fig. 4, the spin processing unit includes a control unit, a state updating unit, and a production unit. The control unit includes control logic and a counter. The control logic receives instructions from the global controller and generates the corresponding control signals for the computing elements in the spin processing unit. The counter records how many of the 256 spins are in state -1. The state updating unit consists of a partial sum register, a [formula image 001] register, a [formula image 002] register, an absolute value unit, three adders, a comparator, a multiplier and a flipper. The absolute value unit and an adder receive and accumulate the correlation coefficients [formula image 025] and store the accumulated result in the [formula image 002] register. The partial sum register and the [formula image 001] register hold the basic partial sums and the accumulated result [formula image 004], respectively. The comparator compares [formula image 005] and [formula image 006] to determine whether to update the single spin processed by the spin processing unit; if the update condition is met, [formula image 004] is selected to further update the spin state, otherwise the state remains unchanged.
The production unit is a seven-stage pipeline comprising an access controller, 16 J memories, an h memory, 16 σ memories, 16 comparators, several adders, several multiplexers and several NOT gates. It generates the basic partial sums [formula image 026] of [formula image 004] and the coefficients [formula image 008] of [formula image 006], and sends them to other spin processing units. The access controller is dedicated to controlling the storage and readout of the interaction coefficients [formula image 009] and the external magnetic coefficients [formula image 027]. Before the first pipeline stage, 16 coefficients [formula image 003] are read from the 16 J memories at a time, and the 16 comparators are enabled to judge whether these coefficients relate to the spins selected for update; if so, they are passed to the first pipeline stage. The second pipeline stage uses the 16 coefficients [formula image 025] passed from the previous stage and the 16 spins [formula image 028] stored in the σ memories to produce the 16 basic partial sums [formula image 026] of [formula image 004]: when [formula image 014], [formula image 029] is passed directly to the next stage, and when [formula image 030], the NOT gates are activated to invert [formula image 031] bitwise before the result is passed to the next stage. The third pipeline stage sums the basic partial sums through an adder tree with 16 inputs. The fourth pipeline stage adds the external magnetic coefficient [formula image 010] to the result of the previous stage. The fifth pipeline stage adds the number of [formula image 032] recorded by the counter in the control unit to the result of the previous stage. In the sixth pipeline stage, if the accumulated partial sum relates to a spin processed in the current spin processing unit, it is sent directly to the next stage; otherwise the stage waits for the other 240 partial sums and merges them before passing the result to the next stage. The seventh pipeline stage packages the calculated basic partial sums [formula image 033] of [formula image 004] or the coefficients [formula image 011] of [formula image 006] into partial sum or coefficient packets and forwards them to other spin processing units.
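The combination of NOT gates in the second stage and the counter value added in the fifth stage is consistent with the two's-complement identity -J = ~J + 1: inverting each coefficient whose spin is -1 and later adding the count of -1 spins once yields the same sum as multiplying every coefficient by its spin. The Python sketch below illustrates that reading; it is an interpretation rather than the patent's stated formula, and the function names, the lane width constant and the test data are assumptions.

```python
import random

LANES = 16  # coefficients handled per cycle, as in the text

def invert_if_minus(j_val, spin):
    """Stage-2 behaviour: pass J through for spin +1, bitwise-invert it for spin -1."""
    return j_val if spin == 1 else ~j_val

def local_field(coeffs, spins, h):
    """Sum of J_j * s_j plus h, using only inversion, addition and a counter."""
    acc = 0
    for start in range(0, len(coeffs), LANES):
        lanes = [invert_if_minus(j, s)
                 for j, s in zip(coeffs[start:start + LANES],
                                 spins[start:start + LANES])]
        acc += sum(lanes)  # stands in for the 16-input adder tree
    acc += h               # external magnetic coefficient
    # Counter correction: each inverted term is short by 1 because -J == ~J + 1.
    acc += sum(1 for s in spins if s == -1)
    return acc

# Usage with 256 random integer coefficients and spins.
rng = random.Random(0)
coeffs = [rng.randint(-8, 7) for _ in range(256)]
spins = [rng.choice((-1, 1)) for _ in range(256)]
reference = sum(j * s for j, s in zip(coeffs, spins)) + 3
result = local_field(coeffs, spins, h=3)
```

Read this way, a single extra addition of the counter value replaces one increment per lane, which matches the stated advantage of using one adder with a counter in place of sixteen adders.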
As shown in fig. 5, the merging router array provides the communication links for packet exchange between different spin processing units and can merge multiple partial sum packets into one. Each merging router mainly comprises a merging stage and a routing stage, and consists of four input ports (east, south, west and north), four corresponding output ports, six comparators, six selectors, three adders, four register groups, an ejector, a crossbar switch and an arbiter. In the merging stage, the six comparators compare the types and destinations of the packets received at the input ports; packets that meet the merging condition are merged into one packet by an adder. In the routing stage, the packet is sent through the ejector, the crossbar switch and the arbiter to the current spin processing unit or, via an output port, to another router.
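The merging stage can be illustrated with a short Python sketch that combines packets sharing the same type and destination by adding their payloads. The packet fields, the use of a dictionary in place of pairwise comparators, and the rule that any two packets with equal type and destination may be merged are assumptions for illustration; the text states only that type and destination are compared and that an adder performs the merge.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    kind: str     # "partial_sum" or "coefficient" (hypothetical labels)
    dest: int     # id of the destination spin processing unit (hypothetical field)
    payload: int

def merge_packets(packets):
    """Merge packets with identical (kind, dest) by summing their payloads."""
    merged = {}
    for p in packets:
        key = (p.kind, p.dest)
        merged[key] = merged.get(key, 0) + p.payload
    # Dictionaries keep insertion order, so output order follows first arrival.
    return [Packet(kind, dest, payload) for (kind, dest), payload in merged.items()]

# Usage: four packets arriving on the east, south, west and north ports.
incoming = [
    Packet("partial_sum", 3, 5),
    Packet("partial_sum", 3, -2),
    Packet("coefficient", 3, 7),
    Packet("partial_sum", 1, 4),
]
outgoing = merge_packets(incoming)
```

Merging in the network means the destination unit receives one accumulated value instead of several packets, which is where the reduction in traffic comes from.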
It will be apparent to those skilled in the art from this disclosure that various other changes and modifications can be made which are within the scope of the invention as defined in the appended claims.

Claims (2)

1. A large-scale fully-connected Ising model annealing processing circuit based on a network on chip, comprising: a global controller, a control bus, a spin processing array and a merging router array; the global controller performs parallel control on all spin processing units; all spin processing units share annealing temperature and random numbers, and communicate and calculate through the merging router array;
the global controller comprises an I/O module, a control logic module, a temperature scheduling module and a random number generating module; the I/O module is responsible for exchanging information between a user and the processing circuit; the control logic module is responsible for generating corresponding control signals; the temperature scheduling module and the random number generating module are respectively responsible for generating annealing temperature and random numbers; the control bus sends control signals, annealing temperature and random numbers to all spin processing units;
the spin processing array comprises a plurality of spin processing units, each spin processing unit holding 256 spins and updating one spin at a time;
the spin processing unit comprises a control unit, a state updating unit and a production unit;
the control unit comprises control logic and a counter; the control logic receives instructions from the global controller and generates the corresponding control signals for the computing elements in the spin processing unit; the counter records how many of the 256 spins are in state -1;
the state updating unit consists of a partial sum register, a [formula image QLYQS_2] register, a [formula image QLYQS_6] register, an absolute value unit, three adders, a comparator, a multiplier and a flipper; the absolute value unit and an adder receive and accumulate the correlation coefficients [formula image QLYQS_8] and store the accumulated result in the [formula image QLYQS_3] register; the partial sum register registers the basic partial sums; the [formula image QLYQS_5] register registers the accumulated [formula image QLYQS_7]; the comparator compares [formula image QLYQS_9] and [formula image QLYQS_1] to determine whether to update the single spin processed by the spin processing unit, and if so selects [formula image QLYQS_4] to further update the spin state, otherwise the state remains unchanged;
wherein [formula image QLYQS_10] is used to judge whether the update condition is satisfied; the threads are dynamically adjusted according to system feedback and [formula image QLYQS_11], and a temperature return operation is performed in the final stage; N is the total number of spins, V is the total number of spins updated in parallel, K is the number of threads, and M is the number of spins updated in parallel by a single thread, all being integers; [formula image QLYQS_12] is the external magnetic coefficient; [formula image QLYQS_13] is the spin value with index j, where j takes values from 1 to 16;
the production unit is a seven-stage pipeline comprising an access controller, 16 J memories, an h memory, 16 σ memories, 16 comparators, several adders, several multiplexers and several NOT gates, for generating the basic partial sums [formula image QLYQS_16] of [formula image QLYQS_23] and the correlation coefficients [formula image QLYQS_17] of [formula image QLYQS_19] and sending them to other spin processing units; the access controller is dedicated to controlling the storage or readout of the correlation coefficients [formula image QLYQS_21] and the external magnetic coefficients [formula image QLYQS_25]; before the first pipeline stage, 16 correlation coefficients [formula image QLYQS_29] are read from the 16 J memories at a time, and the 16 comparators are enabled to determine whether the coefficients relate to the spins selected for update, and if so, the coefficients are passed to the first pipeline stage; the second pipeline stage uses the 16 correlation coefficients [formula image QLYQS_22] passed from the previous stage and the 16 spins [formula image QLYQS_26] to produce the 16 basic partial sums [formula image QLYQS_18] of [formula image QLYQS_14], i.e., when [formula image QLYQS_27], [formula image QLYQS_31] is passed directly to the next stage, and when [formula image QLYQS_30], the NOT gates are activated to invert [formula image QLYQS_33] bitwise before the result is passed to the next stage; the third pipeline stage accumulates the basic partial sums through an adder tree with 16 inputs; the fourth pipeline stage adds the external magnetic coefficient [formula image QLYQS_24] to the result of the previous stage; the fifth pipeline stage adds the number of [formula image QLYQS_28] recorded by the counter in the control unit to the result of the previous stage; in the sixth pipeline stage, if the accumulated partial sum relates to a spin processed in the current spin processing unit, it is sent directly to the next stage, otherwise the stage waits for the other 240 partial sums and merges them before passing the result to the next stage; the seventh pipeline stage packages the calculated basic partial sums [formula image QLYQS_34] of [formula image QLYQS_32] or the correlation coefficients [formula image QLYQS_20] of [formula image QLYQS_15] into partial sum or coefficient packets and forwards them to other spin processing units.
2. The large-scale fully-connected Ising model annealing processing circuit based on a network on chip according to claim 1, wherein the merging router comprises a merging module and a routing module, which are responsible for merging and forwarding packets, respectively.
CN202310010051.1A 2023-01-05 2023-01-05 Large-scale fully-connected Ising model annealing processing circuit based on network on chip Active CN115907005B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310010051.1A CN115907005B (en) 2023-01-05 2023-01-05 Large-scale fully-connected Ising model annealing processing circuit based on network on chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310010051.1A CN115907005B (en) 2023-01-05 2023-01-05 Large-scale fully-connected Ising model annealing processing circuit based on network on chip

Publications (2)

Publication Number Publication Date
CN115907005A CN115907005A (en) 2023-04-04
CN115907005B true CN115907005B (en) 2023-05-12

Family

ID=85771792

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310010051.1A Active CN115907005B (en) 2023-01-05 2023-01-05 Large-scale fully-connected Ising model annealing processing circuit based on network on chip

Country Status (1)

Country Link
CN (1) CN115907005B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116151171B (en) * 2023-04-17 2023-07-18 华南理工大学 Fully-connected Ising model annealing processing circuit based on parallel tempering

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115398211A (en) * 2020-04-20 2022-11-25 Sony Group Corporation Information processing system, information processing method, program, information processing device, and computing device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10896241B2 (en) * 2015-06-09 2021-01-19 Hitachi, Ltd. Information processing device and control method therefor
JP6841722B2 (en) * 2017-06-06 2021-03-10 Hitachi, Ltd. Information processing device
JP2020009301A (en) * 2018-07-11 2020-01-16 Hitachi, Ltd. Information processing device and information processing method
CN114065121A (en) * 2020-07-29 2022-02-18 Huawei Technologies Co., Ltd. Calculation method and device for solving an Ising model
JP2022177458A (en) * 2021-05-18 2022-12-01 Tokyo Institute of Technology Information processing device, method for processing information, and program
CN113722667A (en) * 2021-07-14 2021-11-30 Tsinghua University Data processing method and device based on an Ising machine, and Ising machine

Also Published As

Publication number Publication date
CN115907005A (en) 2023-04-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant