CN113032299B - Bus system, integrated circuit device, board card and order preserving method for processing request - Google Patents

Bus system, integrated circuit device, board card and order preserving method for processing request

Info

Publication number
CN113032299B
CN113032299B CN201911351218.0A
Authority
CN
China
Prior art keywords
request
bus
processing unit
response
computing device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911351218.0A
Other languages
Chinese (zh)
Other versions
CN113032299A (en)
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201911351218.0A priority Critical patent/CN113032299B/en
Publication of CN113032299A publication Critical patent/CN113032299A/en
Application granted granted Critical
Publication of CN113032299B publication Critical patent/CN113032299B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1605Handling requests for interconnection or transfer for access to memory bus based on arbitration
    • G06F13/1652Handling requests for interconnection or transfer for access to memory bus based on arbitration in a multiprocessor architecture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • G06F13/1684Details of memory controller using multiple buses

Abstract

The invention relates to a bus system, an integrated circuit device, a board card, and an order-preserving method for processing requests in a hybrid system. The bus system may be included in a combined processing device, which may further comprise a universal interconnection interface and other processing devices. The computing devices in the bus system interact with the other processing devices to jointly complete specified computing operations. The combined processing device may further comprise a storage device connected to the computing devices and the other processing devices, respectively, for storing data of the computing devices and the other processing devices.

Description

Bus system, integrated circuit device, board card and order preserving method for processing request
Technical Field
The present disclosure relates generally to the field of computers. More particularly, the present disclosure relates to bus systems, integrated circuit devices, boards, and order preservation methods for processing requests in hybrid systems.
Background
In the field of artificial intelligence, deep learning with neural networks is a prominent contemporary discipline: massive data are organized into layers, processed with neuron-like operations to form a neural network, and a result is finally obtained through layer-by-layer deep learning. Such computation consumes enormous resources, often requires multi-core processors for support, and the resulting system can become quite complex. Such systems may be hybrid in type, and the order in which the processors handle signals while cooperating with one another is critical. How to handle requests in hybrid systems thus remains an unsolved problem in the prior art.
Disclosure of Invention
To at least partially solve the technical problems mentioned in the background, the present disclosure provides a bus system, an integrated circuit device, a board card, and an order-preserving method for processing requests in a hybrid system.
In one aspect, the present disclosure provides a bus system for processing requests, comprising: a bus, a first subsystem, and a second subsystem. The first subsystem is an order-preserving system configured to receive a first request and includes: a first processing unit configured to feed back a true result over the bus in response to the first request; and a second processing unit configured to feed back a false result over the bus in response to the first request. The second subsystem is an efficient system configured to receive a second request and includes: a third processing unit configured to feed back a true result over the bus in response to the second request; and a fourth processing unit configured to refrain from feedback over the bus in response to the second request.
In another aspect, the present disclosure provides an integrated circuit device comprising the bus system described above.
In another aspect, the present disclosure provides a board including the aforementioned integrated circuit device.
In another aspect, the present disclosure provides a method for processing requests in a hybrid system, comprising the steps of: a first subsystem receives a first request, wherein the first subsystem is an order-preserving system and comprises a first processing unit and a second processing unit; the first processing unit feeds back a true result through a bus in response to the first request; the second processing unit feeds back a false result through the bus in response to the first request; a second subsystem receives a second request, wherein the second subsystem is an efficient system and comprises a third processing unit and a fourth processing unit; the third processing unit feeds back a true result through the bus in response to the second request; and the fourth processing unit refrains from feedback over the bus in response to the second request.
With the bus system, integrated circuit device, board card, and method of the present disclosure, in a hybrid bus system the back-end device of an order-preserving subsystem receives responses from all computing devices, whether or not a given computing device actually performs the task of a particular request, and can therefore identify and guarantee the order of the responses without additional hardware. In an efficient subsystem, a computing device that does not perform the task of a particular request does not respond, which improves working efficiency. Overall, power consumption can be reduced and the layout area of the integrated circuit decreased.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:
FIG. 1 is a block diagram illustrating a computing device according to an embodiment of the present disclosure;
FIG. 2 is a block diagram illustrating a bus system according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a data format requested in an embodiment according to the present disclosure;
FIG. 4 is a block diagram illustrating a bus system according to another embodiment of the present disclosure;
FIG. 5 is a block diagram illustrating an integrated circuit device according to an embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating a board card according to an embodiment of the present disclosure;
FIG. 7 is a flow chart illustrating an order preservation method according to an embodiment of the present disclosure;
FIG. 8 is a flow chart illustrating a method for processing requests in a hybrid system according to an embodiment of the present disclosure;
FIG. 9 is a flow chart illustrating a method for processing requests in an order preserving subsystem according to another embodiment of the present disclosure; and
FIG. 10 is a flow chart illustrating a method for processing requests in an efficient subsystem according to another embodiment of the present disclosure.
Detailed Description
The present disclosure generally provides a bus system, an integrated circuit device, a board, and a method for use in a hybrid system. Unlike the prior art, the present disclosure provides an efficient solution, which can effectively reduce energy consumption and reduce hardware area.
When multiple requests are sent in sequence to multiple computing devices over the bus, each computing device may take a different amount of time to process its request, so a later-sent request may finish before an earlier-sent one. The back-end device then receives the response to the later request before the response to the earlier request, and the instruction order is disrupted. Order preservation, as used in this disclosure, means ensuring through a specific mechanism that the order of responses is consistent with the order of requests.
FIG. 1 is a block diagram illustrating a computing device of an embodiment of the present disclosure. As shown in FIG. 1, the computing device 100 includes a processing unit 102, an operation unit 104, and a storage unit 106.
In this embodiment, the processing unit 102 may be a central processing unit for acquiring data, parameters, and computing instructions. The processing unit 102 includes an instruction fetch module 108, a decode module 110, an instruction queue 112, a dependency computation device 114, and a store queue module 116. The instruction fetch module 108 fetches the next operation instruction to be executed from the instruction sequence; the decode module 110 decodes the fetched instruction; the instruction queue 112 temporarily stores the decoded operation instructions; the dependency computation device 114 determines whether a decoded operation instruction depends on an earlier instruction that has not yet finished executing; and the store queue module 116 is an ordered queue that stores instructions having such dependencies and outputs them in sequence to the operation unit 104 for operation.
In this embodiment, the operation unit 104 may be a machine learning unit for performing matrix operations. In the architecture of this embodiment, the operation unit 104 includes a master processing circuit 118 and a plurality of slave processing circuits 120. When a matrix operation needs to be performed, the store queue module 116 sends the operation instruction to the master processing circuit 118, which performs preamble processing and exchanges data and operation instructions with the plurality of slave processing circuits 120. The slave processing circuits 120 perform intermediate operations in parallel to obtain a plurality of intermediate results and transmit them to the master processing circuit 118, which performs subsequent processing on the intermediate results to obtain the final calculation result.
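By way of illustration only, the division of labor between the master processing circuit and the slave processing circuits described above may be sketched in software with a matrix-vector product as a stand-in workload; the function names and the row-partitioning scheme are assumptions for this sketch and not the circuit of the disclosure.

```python
# Sketch of the master/slave split described above: the master partitions
# the work (the "preamble processing"), each slave computes its
# intermediate results in parallel, and the master combines the
# intermediate results into the final calculation result.
# Purely illustrative; not the patent's circuit.

def slave_compute(rows, vector):
    """One slave processing circuit: a partial matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in rows]

def master_compute(matrix, vector, num_slaves=2):
    """Master processing circuit: split rows among slaves, then combine."""
    chunk = (len(matrix) + num_slaves - 1) // num_slaves
    partials = [slave_compute(matrix[i:i + chunk], vector)
                for i in range(0, len(matrix), chunk)]
    result = []
    for p in partials:  # subsequent processing: concatenate intermediates
        result.extend(p)
    return result

matrix = [[1, 0], [0, 1], [2, 3]]
vector = [4, 5]
assert master_compute(matrix, vector) == [4, 5, 23]
```

The partitioning here is by rows purely for concreteness; any split in which the intermediate results can be recombined by the master would fit the description above equally well.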
The storage unit 106 includes: input/output unit 122, cache 124 and register 126. The input/output unit 122 is configured to read or write data, parameters, and calculation instructions via the bus 128, the cache 124 is configured to store the calculation instructions, and the register 126 is configured to store the data and the parameters.
When the computing device 100 receives a request, the request arrives over the bus 128 and is stored in the register 126 via the input/output unit 122. The processing unit 102 then accesses the request and, after processing by the instruction fetch module 108, the decode module 110, the instruction queue 112, the dependency computation device 114, and the store queue module 116, queues it in the ordered queue for execution.
In this embodiment, if the processing unit 102 determines that the request needs to be executed, the relevant data and parameters are transmitted to the operation unit 104 for operation, and after the operation is completed, a response with a true result is fed back to the bus 128 through the input/output unit 122. If the processing unit 102 determines that the request does not need to be executed (e.g., the request is not directed at the computing device 100), the operation unit 104 is not driven to perform the operation, but the computing device 100 still feeds back a response with a false result to the bus 128 through the input/output unit 122.
FIG. 2 is a block diagram illustrating a bus system of another embodiment of the present disclosure, particularly applicable to deep learning with neural networks. Because deep learning requires a large amount of computing resources, it is built in a multi-core heterogeneous environment. The bus system 200 includes a front-stage computing device 202, a back-stage computing device 204, a first computing device 206, a second computing device 208, a bus 210, and a storage device 212. The front-stage computing device 202, the back-stage computing device 204, the first computing device 206, and the second computing device 208 may each be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit ("CPU"), a graphics processing unit ("GPU"), an artificial intelligence processor, and the like. If these processing devices are artificial intelligence processors, they may have the structure shown in FIG. 1.
The storage device 212 is configured to store data and may include multiple sets of memory 214, each set of memory 214 being connected to the first computing device 206 and the second computing device 208 via the bus 210. Each set of memory 214 may be DDR SDRAM ("Double Data Rate SDRAM", double-data-rate synchronous dynamic random access memory).
The front-stage computing device 202 and the back-stage computing device 204 may be units in certain modules of the neural network system. The front-stage computing device 202 outputs requests to be submitted to the first computing device 206 or the second computing device 208 for execution of tasks, and the results are transmitted to the back-stage computing device 204.
The front-stage computing device 202, the back-stage computing device 204, the first computing device 206, the second computing device 208, and the storage device 212 are interconnected via the bus 210. The bus 210 is a common communication trunk between the functional components of a computer, a transmission channel made up of wires. According to the kind of information transmitted, the bus 210 may be divided into a data bus, an address bus, and a control bus, which transmit data, data addresses, and control signals, respectively. In this embodiment, the bus 210 is used to transfer data.
On a multi-core heterogeneous basis, the first computing device 206 and the second computing device 208 are different types of processors, such as a central processor and an artificial intelligence processor, each having a unique processor number for identifying the identity of the computing device.
In this embodiment, the data format of a request communicated over the bus system 200 is shown in FIG. 3. The request 300 includes the following fields: an address field (addr) 302, a data field (data) 304, and an identification field (core_id) 306. The address field 302 is a 12-bit field storing an address of the memory; the data field 304 is a 64-bit field storing the data information to be processed; and the identification field 306 is a 2-bit field storing identification information, i.e., the processor number of the computing device designated to process the information. On receipt of the request 300, each computing device determines whether the identification information in the identification field 306 matches its own processor number. If it does, the request 300 is directed at that computing device, which reads the address in the address field 302 and, depending on the function of the bus system 200, either stores the data information of the data field 304 at the corresponding address in the memory or reads data from that address.
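By way of illustration only, the field layout described above may be sketched in software as follows; the helper names, the ordering of fields within the packed word, and the example values are assumptions for this sketch and not part of the disclosure.

```python
# Sketch of the request format described above: a 12-bit address field
# (addr), a 64-bit data field (data), and a 2-bit identification field
# (core_id). Field ordering and helper names are illustrative only.

ADDR_BITS, DATA_BITS, ID_BITS = 12, 64, 2

def pack_request(addr: int, data: int, core_id: int) -> int:
    """Pack the three fields into one integer, addr in the high bits."""
    assert 0 <= addr < (1 << ADDR_BITS)
    assert 0 <= data < (1 << DATA_BITS)
    assert 0 <= core_id < (1 << ID_BITS)
    return (addr << (DATA_BITS + ID_BITS)) | (data << ID_BITS) | core_id

def unpack_request(word: int):
    """Recover (addr, data, core_id) from a packed request word."""
    core_id = word & ((1 << ID_BITS) - 1)
    data = (word >> ID_BITS) & ((1 << DATA_BITS) - 1)
    addr = word >> (DATA_BITS + ID_BITS)
    return addr, data, core_id

req = pack_request(addr=0x2A, data=0xDEADBEEF, core_id=1)
assert unpack_request(req) == (0x2A, 0xDEADBEEF, 1)
```

A computing device on the bus would then compare the unpacked core_id against its own processor number to decide whether the request is directed at it.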
Referring back to FIG. 2, in this embodiment the bus system 200 is an order-preserving system. After the front-stage computing device 202 executes its task, it sends the request 300 through the bus 210, and the bus 210 carries the request 300 to the first computing device 206 and the second computing device 208.
In this embodiment, the request 300 is directed at the first computing device 206; that is, the front-stage computing device 202 asks the first computing device 206, not the second computing device 208, to perform a task, so the identification information stored in the identification field 306 of the request 300 is the processor number of the first computing device 206 (the first processor number). The first computing device 206 first verifies that the identification information in the identification field 306 matches the first processor number, then responds to the request 300 by reading the address in the address field 302 and either storing the data information of the data field 304 at the corresponding address in the memory or reading data from that address, and finally feeds back a response with a true result to the back-stage computing device 204 via the bus 210.
Since the second computing device 208 is also coupled to the bus 210, it also receives the request 300. When the second computing device 208 checks the identification field 306 of the request 300, it finds that the identification information is not its own processor number (the second processor number), so it is prohibited from responding to the request 300; that is, it does not read the address in the address field 302 and neither stores the data information of the data field 304 at the corresponding address in the memory nor reads data from that address. Nevertheless, the second computing device 208 of this embodiment still feeds back a response with a false result to the back-stage computing device 204 via the bus 210.
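By way of illustration only, the behavior of a computing device in the order-preserving system may be sketched as follows; the class name, the tuple-shaped request, and the dictionary-shaped response are modeling assumptions for this sketch and not part of the disclosure.

```python
# Simplified model of an order-preserving computing device: it responds
# to every request, feeding back a true result when the request's
# core_id matches its own processor number and a false result otherwise.

class OrderPreservingDevice:
    def __init__(self, processor_number: int):
        self.processor_number = processor_number

    def handle(self, request):
        addr, data, core_id = request
        if core_id == self.processor_number:
            # Perform the task (memory access elided here), then feed
            # back a true result; error correction field 0b00 = success.
            return {"from": self.processor_number,
                    "true_result": True, "ecc": 0b00}
        # Not the target: skip the task but still respond with a false
        # result so the back-stage device can keep responses in order.
        return {"from": self.processor_number,
                "true_result": False, "ecc": 0b00}

first = OrderPreservingDevice(processor_number=0)
second = OrderPreservingDevice(processor_number=1)
request = (0x2A, 0xDEADBEEF, 0)  # directed at device 0
assert first.handle(request)["true_result"] is True
assert second.handle(request)["true_result"] is False
```

Note that both devices respond and both report the success code 0b00; only the true/false nature of the result distinguishes the device that actually performed the task.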
Both the true result of the first computing device 206 and the false result of the second computing device 208 are sent to the back-stage computing device 204 as responses. Each response in the disclosed embodiments includes a 2-bit error correction field whose data format is shown in Table 1:
Table 1
When the first computing device 206 has completed the task, its error correction field is loaded with "0b00" to indicate a successful normal access, and the response sent onto the bus 210 shows that the first computing device 206 did complete the task and produced a true result. Although the second computing device 208 does not actually perform the task, it also sends a response whose error correction field is likewise loaded with "0b00". Since the request 300 was directed at the first computing device 206, refraining from the task is the correct behavior for the second computing device 208, so its error correction field also reports correct operation with "0b00"; but because the second computing device 208 did not actually perform the task of the request 300, its response is a false result.
Only after the back-stage computing device 204 has received the responses from all computing devices can it confirm that the task of the request 300 completed successfully. In this embodiment, once the back-stage computing device 204 has received the true result of the first computing device 206 and the false result of the second computing device 208, it can confirm that the task of the request 300 completed successfully in the correct order. The back-stage computing device 204 then executes its own task according to the true result of the first computing device 206, and only after processing the responses from the first computing device 206 and the second computing device 208 does it move on to the next request. This avoids the timing confusion that would occur if a response generated by the next request reached the back-stage computing device 204 earlier than the true result of the first computing device 206.
Further, consider a special case in which the front-stage computing device 202 sends a first request directed at the first computing device 206 and then a second request directed at the second computing device 208. On receiving the first request, the first computing device 206 performs the corresponding task, which takes longer because the task is more complex. While the first computing device 206 is still working on the first request, the second computing device 208 immediately sends a false-result response for the first request to the back-stage computing device 204; it then receives the second request and, because that work is simpler, finishes it and sends a true-result response to the back-stage computing device 204. Only after the second computing device 208 has sent its true-result response for the second request does the first computing device 206 complete the work of the first request and send its true-result response to the back-stage computing device 204, after which the first computing device 206 processes the second request and sends a false-result response for it.
The order in which the back-stage computing device 204 receives the responses is therefore: the false result of the second computing device 208 for the first request, the true result of the second computing device 208 for the second request, the true result of the first computing device 206 for the first request, and the false result of the first computing device 206 for the second request. When the back-stage computing device 204 receives the false result of the second computing device 208 for the first request, it knows that the first request must be processed first and expects the true result of the first computing device 206 for the first request. When it instead receives the true result of the second computing device 208 for the second request, it places that result in its cache and sets it aside for later processing. When the true result of the first computing device 206 for the first request arrives, the back-stage computing device 204 processes the first request first, based on that true result together with the false result of the second computing device 208 for the first request. When the false result of the first computing device 206 for the second request then arrives, it processes the second request based on the cached true result of the second computing device 208 and that false result.
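By way of illustration only, the ordering behavior of the back-stage computing device in this scenario may be sketched as follows; the function name and the representation of responses as (request number, result) pairs are assumptions for this sketch and not part of the disclosure.

```python
# Sketch of the back-stage device's ordering logic: with N devices in an
# order-preserving subsystem, a request is complete once all N responses
# (one true, N-1 false) have arrived. Requests are finalized strictly in
# issue order, so an early response for a later request is cached.

from collections import defaultdict

def finalize_in_order(responses, num_devices=2):
    """responses: iterable of (request_id, true_result) in arrival order.
    Returns request_ids in the order the back-stage device processes them."""
    counts = defaultdict(int)
    processed, next_req = [], 0
    for req_id, _ in responses:
        counts[req_id] += 1
        # Finalize the oldest pending request once all its responses arrive.
        while counts[next_req] == num_devices:
            processed.append(next_req)
            next_req += 1
    return processed

# Arrival order from the scenario above: false(req 0), true(req 1),
# true(req 0), false(req 1) -- processing still follows request order.
arrivals = [(0, False), (1, True), (0, True), (1, False)]
assert finalize_in_order(arrivals) == [0, 1]
```

The true result of request 1 arrives second but is held until both responses for request 0 have been counted, matching the caching behavior described above.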
Under this technical scheme, every computing device on the order-preserving bus system responds to each request it processes, sending a false result even when the request is not directed at it. The back-stage computing device can therefore order the responses, avoiding the timing confusion that would occur if the response to a later request arrived before the response to an earlier request.
FIG. 4 is a block diagram illustrating the architecture of a bus system according to another embodiment of the present disclosure. The bus system 400 includes two subsystems, a first subsystem 406 and a second subsystem 408, which are in signal communication with a front-stage computing device 402 and a back-stage computing device 404 via a bus 410. The front-stage computing device 402 sends a first request to the first subsystem 406; after the first subsystem 406 finishes processing, it sends a response to the back-stage computing device 404; after the back-stage computing device 404 finishes processing, it sends a second request to the second subsystem 408, which sends a response after processing.
In this embodiment, the data formats of the first request and the second request transmitted on the bus system 400 are likewise as shown in FIG. 3 and are not described again. On receipt of the request 300, each computing device determines whether the identification information in the identification field 306 matches its own processor number. If it does, the request 300 is directed at that computing device, which reads the address in the address field 302 and either stores the data information of the data field 304 at the corresponding address in the memory or reads data from that address.
The bus system 400 is a hybrid system: the first subsystem 406 is an order-preserving system, in which the order of responses is critical, while the second subsystem 408 is an efficient system, in which the order of responses is unimportant and the speed of processing requests is what matters, so the second subsystem 408 may complete a later-issued request before an earlier one.
The first subsystem 406 includes a first computing device 412 and a second computing device 414. The operation of the front-stage computing device 402, the first computing device 412, the second computing device 414, and the back-stage computing device 404 does not differ from the embodiment of FIG. 2 and is therefore not described again.
In the order-preserving first subsystem 406, both the true result of the first computing device 412 and the false result of the second computing device 414 are sent to the back-stage computing device 404 as responses. For the first request, both the first computing device 412 and the second computing device 414 load "0b00" into the error correction field of their responses, as shown in Table 1, indicating a successful normal access: the first computing device 412 actually completed the task and sends a true-result response, while the second computing device 414 did not actually execute the task of the first request and sends a false-result response.
The second subsystem 408 includes a third computing device 416 and a fourth computing device 418, which may be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processor, an artificial intelligence processor, and the like. These processing means, if artificial intelligence processors, may have the structure shown in fig. 1.
In this embodiment, after the back-stage computing device 404 performs its task, it sends a second request via the bus 410, and the bus 410 carries the second request to the second subsystem 408.
The second request is directed at the third computing device 416, so the identification information stored in the identification field of the second request is the processor number of the third computing device 416 (the third processor number). The third computing device 416 first confirms that the identification information of the identification field matches the third processor number, then responds to the second request by reading the address in the address field and either storing the data information of the data field at the corresponding address in the memory or reading data from that address, and finally feeds back a response with a true result via the bus 410.
The fourth computing device 418 also receives the second request. When the fourth computing device 418 checks the identification field of the second request, it finds that the identification information is not its own processor number (the fourth processor number), so it is prohibited from responding to the second request; that is, it does not read the address in the address field and neither stores the data information of the data field at the corresponding address in the memory nor reads data from that address. Because the second subsystem 408 is an efficient system rather than an order-preserving system, the fourth computing device 418 does not feed back a false-result response over the bus 410 for the back-end device to order responses by; it simply refrains from any feedback over the bus 410.
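By way of illustration only, the behavior of a computing device in the efficient subsystem may be sketched as follows; the class name, the tuple-shaped request, and the use of None to model silence on the bus are assumptions for this sketch and not part of the disclosure.

```python
# Simplified model of an efficient-subsystem computing device: unlike the
# order-preserving device, it responds only when the request's core_id
# matches its own processor number, and is otherwise silent on the bus.

class EfficientDevice:
    def __init__(self, processor_number: int):
        self.processor_number = processor_number

    def handle(self, request):
        addr, data, core_id = request
        if core_id != self.processor_number:
            return None  # refrains from any feedback over the bus
        # Target device: perform the task (memory access elided) and feed
        # back a true result; error correction field 0b00 = success.
        return {"from": self.processor_number,
                "true_result": True, "ecc": 0b00}

third = EfficientDevice(processor_number=2)
fourth = EfficientDevice(processor_number=3)
request = (0x10, 0x1234, 2)  # directed at device 2
assert third.handle(request)["true_result"] is True
assert fourth.handle(request) is None
```

Skipping the false-result response is what distinguishes this sketch from the order-preserving device: less bus traffic and faster processing, at the cost of giving the back-end device no basis for ordering responses.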
In the efficient second subsystem 408, only the responding computing device, in this embodiment the third computing device 416, loads "0b00" into the error correction field of its response, as shown in Table 1, to indicate a successful normal access, meaning that the task was completed successfully and a true-result response was sent.
Under this technical scheme, every computing device on the order-preserving system responds to each request it processes, sending a false result even when the request is not directed at it, so that the back-stage computing device can order the responses and avoid the timing confusion that would occur if the response to a later request arrived before the response to an earlier request. On the efficient system, a computing device sends a true-result response only to a request directed at it and feeds back no response when the request is not directed at it, which improves processing speed.
Fig. 5 is a block diagram illustrating an integrated circuit device 500 according to an embodiment of the disclosure. As shown, integrated circuit device 500 includes a bus system 502, which bus system 502 may be bus system 200 or 400 of fig. 2 or 4. In addition, integrated circuit device 500 also includes a general interconnect interface 504 and other processing devices 506.
In this embodiment, the other processing devices 506 may be one or more general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processor, or an artificial intelligence processor; their number is not limited and is determined according to actual needs. In certain instances, the other processing devices 506 may serve as an interface between the bus system 502 and external data and control, performing basic control of the machine learning computing device, including but not limited to data handling and starting or stopping the device.
According to aspects of this embodiment, the universal interconnect interface 504 may be used to transfer data and control instructions between the bus system 502 and the other processing devices 506. For example, the bus system 502 may obtain the required input data from the other processing devices 506 via the universal interconnect interface 504 and write it to a storage device on the chip of the bus system 502. Further, the bus system 502 may obtain control instructions from the other processing devices 506 via the universal interconnect interface 504 and write them to a control cache on the chip of the bus system 502. Alternatively or in addition, the universal interconnect interface 504 may also read data from a memory module of the bus system 502 and transmit it to the other processing devices 506.
Optionally, the integrated circuit device 500 may also include a storage device 508, which may be coupled to the bus system 502 and the other processing devices 506, respectively. In one or more embodiments, the storage device 508 may be used to store data of the bus system 502 and the other processing devices 506, and is particularly suitable for data to be operated on that cannot be entirely held in the internal storage of the bus system 502 or the other processing devices 506.
According to different application scenarios, the integrated circuit device 500 of the present disclosure may be used as an SoC (system on chip) for devices such as mobile phones, robots, unmanned aerial vehicles, and video acquisition equipment, thereby effectively reducing the core area of the control portion, improving the processing speed, and reducing the overall power consumption. In this case, the universal interconnect interface 504 of the integrated circuit device 500 is connected with certain components of the apparatus. The components referred to here may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, the present disclosure also discloses a chip or integrated circuit chip that includes integrated circuit device 500. In other embodiments, the disclosure also discloses a chip package structure, which includes the chip.
In some embodiments, the disclosure further discloses a board card, which includes the above chip package structure. Referring to Fig. 6, which provides the aforementioned exemplary board 600, the board 600 may include, in addition to the chip 602 described above, other mating components, which may include, but are not limited to: a memory device 604, an interface device 606, and a control device 608.
The memory device 604 is connected to the chip 602 in the chip package structure via a bus 614 and is used for storing data. The memory device 604 may include multiple sets of memory 610, and each set of memory 610 is coupled to the chip 602 by the bus 614. Each set of memory 610 may be DDR SDRAM ("Double Data Rate SDRAM", double data rate synchronous dynamic random access memory).
Unlike what is shown in Fig. 6, in one embodiment, the memory device 604 may include 4 sets of memory 610, and each set of memory 610 may include multiple DDR4 chips (granules). In one embodiment, the chip 602 may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits are used for ECC checking.
In one embodiment, each set of memory 610 may include multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice within one clock cycle. A controller for controlling the DDR is provided in the chip 602 to control the data transfer and data storage of each set of memory 610.
The interface device 606 is electrically connected to the chip 602 within the chip package structure. The interface device 606 is used to implement data transfer between the chip 602 and an external device 612 (e.g., a server or a computer). In one embodiment, the interface device 606 may be a standard PCIe interface. For example, the data to be processed is transferred from the server to the chip 602 through the standard PCIe interface, thereby implementing the data transfer. In another embodiment, the interface device 606 may be another interface; the disclosure does not limit the specific implementation of such other interfaces, as long as they can implement the adapting function. In addition, the computation results of the chip 602 are transmitted back to the external device 612 by the interface device 606.
The control device 608 is electrically connected to the chip 602 to monitor the status of the chip 602. Specifically, the chip 602 and the control device 608 may be electrically connected through an SPI interface. The control device 608 may include a single-chip microcomputer ("MCU", Micro Controller Unit). The chip 602 may include multiple processing chips, multiple processing cores, or multiple processing circuits, and may drive multiple loads. Thus, the chip 602 may be in different operating states, such as heavy load and light load. The control device 608 may be used to regulate the operating states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the chip 602.
In some embodiments, the present disclosure also discloses an electronic device or apparatus including the above-described board 600. Depending on the application scenario, the electronic device or apparatus may include a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet, a smart terminal, a cell phone, a driving recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage device, a wearable device, a vehicle, a household appliance, and/or a medical device. The vehicles include aircraft, ships, and/or cars; the household appliances include televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; the medical devices include nuclear magnetic resonance apparatuses, B-mode ultrasound apparatuses, and/or electrocardiographs.
Fig. 7 is a flow chart illustrating an order-preserving method of processing requests according to embodiments of the present disclosure, which is particularly applicable to deep learning with neural networks. As described above, deep learning requires a large amount of computing resources and is often deployed in a heterogeneous multi-core environment, such as the architecture shown in Fig. 2 or the first subsystem 406 shown in Fig. 4.
In step 702, the first computing device and the second computing device receive a request from a preceding computing device; as shown in Fig. 3, the data format of the request includes fields such as an address field (addr) 302, a data field (data) 304, and an identification field (core_id) 306.
In step 704, the first computing device verifies that the request is directed to the first computing device, responds to the request, and thus feeds back a true result over the bus.
In step 706, the second computing device verifies that the request is not directed to the second computing device, and thus responds by feeding back a false result over the bus.
The subsequent computing device receives the true-result response from the first computing device and the false-result response from the second computing device, which avoids the timing confusion that would arise for the subsequent computing device if the response to a later request arrived faster than the response to an earlier request.
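The Fig. 7 flow can be illustrated with a minimal sketch. The `Request` dataclass mirrors the fields of Fig. 3 (addr, data, core_id), but the Python types and the `respond` helper are assumptions for illustration only.

```python
# Minimal sketch of the FIG. 7 flow: both devices receive the same
# request; each emits exactly one response, true or false.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    addr: int            # address field (addr) 302
    data: Optional[int]  # data field (data) 304
    core_id: int         # identification field (core_id) 306

def respond(device_id: int, req: Request) -> dict:
    # Steps 704/706: every device answers, but only the addressed one
    # actually performs the access and reports a true result.
    return {"device": device_id, "true_result": req.core_id == device_id}

req = Request(addr=0x100, data=0xAB, core_id=1)
replies = [respond(d, req) for d in (1, 2)]
# One response per device, so the subsequent device can order on them.
assert [r["true_result"] for r in replies] == [True, False]
```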
Fig. 8 is a flow chart illustrating a method of a hybrid system processing a request according to an embodiment of the present disclosure, which may be applied in the hybrid bus system 400 shown in fig. 4.
In step 802, the first subsystem, which is an order-preserving system and includes a first computing device and a second computing device, receives a first request; as shown in Fig. 3, the data format of the first request includes fields such as an address field (addr) 302, a data field (data) 304, and an identification field (core_id) 306.
In step 804, the first computing device verifies that the first request is directed to the first computing device, responds to the first request, and thus feeds back a first response carrying a true result over the bus.
In step 806, the second computing device verifies that the first request is not directed to the second computing device, and thus responds to the first request by feeding back a second response carrying a false result over the bus.
In step 808, the second subsystem, which is an efficiency-oriented system and includes a third computing device and a fourth computing device, receives a second request; the data format of the second request, also shown in Fig. 3, includes fields such as an address field (addr) 302, a data field (data) 304, and an identification field (core_id) 306.
In step 810, the third computing device verifies that the second request is directed to the third computing device, responds to the second request, and thus feeds back a third response carrying a true result over the bus.
In step 812, the fourth computing device verifies that the second request is not directed to the fourth computing device, and thus does not feed back any response over the bus.
According to this technical scheme, when the computing devices on the order-preserving subsystem process a request, each sends a response (a false result) even when the request is not directed at it, so that the subsequent computing device can order the responses; this avoids the timing confusion that would arise for the subsequent computing device if the response to a later request arrived faster than the response to an earlier request. When the computing devices on the efficiency-oriented subsystem, which does not require order preservation, process a request, each sends a true-result response only for requests directed at it and does not feed back any response otherwise, which improves processing speed.
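The hybrid flow of steps 802-812 might be modeled as below. The `subsystem` factory function and its return values are speculative conveniences of this sketch; the point is only that an order-preserving subsystem yields one response per processing unit, while an efficiency-oriented subsystem yields a single response.

```python
# Speculative sketch of the FIG. 8 hybrid flow: the subsystem's policy
# decides whether non-addressed units answer with a false result or stay silent.
def subsystem(units, order_preserving):
    def deliver(request):
        responses = []
        for unit_id in units:
            if request["core_id"] == unit_id:
                responses.append((unit_id, True))    # true result
            elif order_preserving:
                responses.append((unit_id, False))   # false result
            # else: efficiency-oriented system, the unit stays silent
        return responses
    return deliver

first = subsystem(units=(1, 2), order_preserving=True)    # steps 802-806
second = subsystem(units=(3, 4), order_preserving=False)  # steps 808-812

assert first({"core_id": 1}) == [(1, True), (2, False)]   # two responses
assert second({"core_id": 3}) == [(3, True)]              # one response
```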
Fig. 9 is a flow chart illustrating a method of processing a request according to another embodiment of the present disclosure, which is applied to an order-preserving subsystem of a bus system, i.e., the bus system 200 of Fig. 2 or the first subsystem 406 of Fig. 4. In this embodiment, the preceding computing device sends the first request over the bus, which carries and transmits the first request to the first computing device and the second computing device. Regardless of whether it is the first computing device or the second computing device, the flow shown in Fig. 9 is performed upon receiving the first request so as to send a response to the subsequent computing device.
In step 902, the computing device receives the first request from the preceding computing device; the data format of the first request, also shown in Fig. 3, includes fields such as an address field (addr) 302, a data field (data) 304, and an identification field (core_id) 306.
In step 904, the computing device determines whether the identification field in the first request is consistent with the processor number of the computing device. If the identification field is consistent with the processor number, then in step 906 the computing device responds to the first request and performs the corresponding operation: it reads the address in the address field and either stores the data information of the data field to the corresponding address in the memory or reads data from that address. If the identification field is inconsistent with the processor number, then in step 908 the computing device is prohibited from performing the corresponding operation in response to the first request; that is, it does not read the address in the address field, does not store the data information of the data field to the corresponding address in the memory, and does not read data from that address.
In step 910, regardless of whether the computing device performed the operation corresponding to the first request, the correct information is written into the response. In more detail, the response of the disclosed embodiments includes an error correction field whose data format is shown in Table 1. The computing device loads "0b00" into the error correction field, indicating that the normal access succeeded.
In step 912, the computing device outputs the response to the bus. If the computing device verified in step 904 that the identification information is consistent with its processor number and performed the operation corresponding to the first request in step 906, i.e., the computing device actually completed the task, the response shows a true result. If the computing device found in step 904 that the identification information is inconsistent with its processor number and was prohibited from performing the operation corresponding to the first request in step 908, i.e., the computing device did not actually complete the task, the response shows a false result.
Only after receiving the responses of all the computing devices can the subsequent computing device confirm that the requested task has been completed; the subsequent computing device can therefore order the responses, which avoids the timing confusion that would arise if the response to a later request arrived faster than the response to an earlier request.
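The ordering guarantee described above can be demonstrated with a small sketch: a subsequent device that counts responses per request releases results strictly in issue order even when responses arrive interleaved. The counting scheme below is an illustrative assumption, not the circuit described in this disclosure.

```python
# Illustrative sketch: release requests in issue order by waiting until
# every upstream device has answered each request.
from collections import defaultdict

NUM_DEVICES = 2  # assumed number of upstream computing devices

def order_responses(responses):
    """responses: iterable of (request_id, payload) in arrival order."""
    pending = defaultdict(list)
    next_req, released = 0, []
    for req_id, payload in responses:
        pending[req_id].append(payload)
        # Release requests in issue order once each is fully answered.
        while len(pending[next_req]) == NUM_DEVICES:
            released.append(next_req)
            next_req += 1
    return released

# Request 1 is fully answered before request 0, yet results are still
# released in issue order: [0, 1].
arrivals = [(0, "true"), (1, "true"), (1, "false"), (0, "false")]
assert order_responses(arrivals) == [0, 1]
```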
Fig. 10 is a flow chart illustrating a method of processing a request according to another embodiment of the present disclosure, which is applied to the efficiency-oriented subsystem of a hybrid bus system, i.e., the second subsystem 408 of Fig. 4. In this embodiment, the efficiency-oriented subsystem receives the second request via the bus, which carries and transmits the second request to the third computing device and the fourth computing device. Regardless of whether it is the third computing device or the fourth computing device, the flow shown in Fig. 10 is executed upon receiving the second request.
In step 1002, the computing device receives the second request from the preceding device; as shown in Fig. 3, the data format of the second request includes fields such as an address field (addr) 302, a data field (data) 304, and an identification field (core_id) 306.
In step 1004, the computing device determines whether the identification field in the second request is consistent with the processor number of the computing device. If the identification field is consistent with the processor number, then in step 1006 the computing device responds to the second request and performs the corresponding operation: it reads the address in the address field and either stores the data information of the data field to the corresponding address in the memory or reads data from that address.
In step 1008, the computing device writes the correct information into the response. In more detail, the response of the disclosed embodiments includes an error correction field whose data format is shown in Table 1. The computing device loads "0b00" into the error correction field, indicating that the normal access succeeded.
In step 1010, the computing device outputs the response to the bus; the response indicates that the computing device actually completed the task and thus shows a true result.
If it is determined in step 1004 that the identification field is inconsistent with the processor number, then in step 1012 the computing device is prohibited from responding to the second request and does not perform the corresponding operation; that is, it does not read the address in the address field, does not store the data information of the data field to the corresponding address in the memory, and does not read data from that address.
In step 1014, the computing device is prohibited from outputting any response to the bus.
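Steps 1002-1014 might be sketched end to end as follows. The dictionary memory model, the `efficient_flow` name, and the field handling are illustrative assumptions based on Fig. 3 and Table 1, not the hardware implementation.

```python
# Hedged sketch of the FIG. 10 flow (steps 1002-1014).
memory = {}  # toy stand-in for the device's memory

def efficient_flow(my_id, request):
    # Step 1004: compare the identification field with our processor number.
    if request["core_id"] != my_id:
        # Steps 1012/1014: skip the access and output nothing to the bus.
        return None
    # Step 1006: perform the access the request describes.
    addr = request["addr"]
    if request.get("data") is not None:
        memory[addr] = request["data"]   # write path
        payload = None
    else:
        payload = memory.get(addr)       # read path
    # Steps 1008/1010: load "0b00" (normal access successful) and respond.
    return {"ecc": "0b00", "data": payload, "true_result": True}

write = {"addr": 0x40, "data": 7, "core_id": 3}
read = {"addr": 0x40, "data": None, "core_id": 3}
assert efficient_flow(4, write) is None        # not addressed: silent
assert efficient_flow(3, write)["ecc"] == "0b00"
assert efficient_flow(3, read)["data"] == 7
```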
Although the above methods describe, in the form of steps, how the computing devices of the present disclosure execute a program, the order of the steps does not mean that they must be performed in the order described; they may be processed in other orders or in parallel. In addition, other steps of the disclosure are not set forth here for simplicity of description, but those skilled in the art will appreciate from this disclosure that the methods may also be performed by using a computing device to carry out the various operations described above in connection with the figures.
It should be appreciated that the foregoing embodiments are merely examples with two or four computing devices; the number of computing devices of the present disclosure is not limited. Generally, several tens of computing devices are provided in an artificial intelligence chip, and those skilled in the art can readily derive implementations with more than two computing devices from the disclosure of these embodiments. Furthermore, the terms "first" and "second" in these embodiments are used for distinguishing between different objects and not for describing a particular sequential order.
In the foregoing embodiments of the disclosure, the description of each embodiment has its own emphasis; for portions not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the foregoing embodiments may be combined arbitrarily; for brevity, not all possible combinations of these technical features are described, but all such combinations should be considered within the scope of this disclosure.
It should be noted that, for simplicity of description, the foregoing method embodiments are all depicted as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may occur in other orders or concurrently in accordance with the disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all alternative embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division of the units is merely a division by logical function, and there may be other ways of division in actual implementation. For instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be in electrical, optical, acoustic, magnetic, or other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, functional units in various embodiments of the present disclosure may be integrated in one computing device, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software program modules.
If the integrated units are implemented in the form of software program modules and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product stored in a memory, the computer software product including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned memory includes: a USB flash drive, a read-only memory ("ROM", Read-Only Memory), a random access memory ("RAM", Random Access Memory), a removable hard disk, a magnetic disk, an optical disk, and other various media capable of storing program code.
The foregoing may be better understood in light of the following clauses:
Clause A1, a bus system for processing a request, comprising: a bus; a first subsystem configured to receive a first request and comprising: a first processing unit configured to feed back a true result over the bus in response to the first request; and a second processing unit configured to feed back a false result over the bus in response to the first request; a second subsystem configured to receive a second request and comprising: a third processing unit configured to feed back a true result over the bus in response to the second request; and a fourth processing unit configured to prohibit feedback over the bus in response to the second request.
Clause A2, the bus system of clause A1, wherein the first request comprises an identification field carrying identification information corresponding to the first processing unit.
Clause A3, the bus system of clause A2, wherein the first processing unit is configured to: judging whether the identification information is consistent with the processor number of the first processing unit, and if so, executing the following steps: performing an operation corresponding to the first request; and outputting a first response to the bus, the first response carrying information of the true result.
Clause A4, the bus system of clause A3, wherein the first response comprises a first error correction field carrying information that the operation corresponding to the first request is correct.
Clause A5, the bus system of clause A2, wherein the second processing unit is configured to: judging whether the identification information is consistent with the processor number of the second processing unit, and if not, executing the following steps: prohibiting execution of an operation corresponding to the first request; and outputting a second response to the bus, the second response carrying information of the false result.
Clause A6, the bus system of clause A5, wherein the second response comprises a second error correction field carrying information that the operation corresponding to the first request is correct.
Clause A7, the bus system of clause A1, wherein the second request comprises an identification field carrying identification information corresponding to the third processing unit.
Clause A8, the bus system of clause A7, wherein the third processing unit is configured to: judging whether the identification information is consistent with the processor number of the third processing unit, and if so, executing the following steps: performing the operation corresponding to the second request; and outputting a third response to the bus, the third response carrying information of the true result.
Clause A9, the bus system of clause A8, wherein the third response comprises a third error correction field carrying information that the operation corresponding to the second request is correct.
Clause A10, the bus system of clause A7, wherein the fourth processing unit is configured to: judging whether the identification information is consistent with the processor number of the fourth processing unit, and if not, executing the following steps: prohibiting execution of an operation corresponding to the second request; and prohibiting output of any response to the bus.
Clause A11, the bus system of clause A1, wherein the first subsystem is an order-preserving system.
Clause A12, an integrated circuit device comprising the bus system according to any of clauses A1-A11.
Clause A13, a board card comprising the integrated circuit device according to clause A12.
Clause A14, a method for processing a request in a multi-system, comprising the steps of:
the first subsystem receiving a first request, wherein the first subsystem comprises a first processing unit and a second processing unit; the first processing unit responding to the first request and feeding back a true result over a bus; the second processing unit responding to the first request and feeding back a false result over the bus; the second subsystem receiving a second request, wherein the second subsystem comprises a third processing unit and a fourth processing unit; the third processing unit responding to the second request and feeding back a true result over the bus; and the fourth processing unit prohibiting feedback over the bus in response to the second request.
Clause A15, the method of clause A14, wherein the first request includes an identification field carrying identification information corresponding to the first processing unit, the step of the first processing unit feeding back a true result over a bus comprising the steps of: judging whether the identification information is consistent with the processor number of the first processing unit, and if so, executing the following steps: performing an operation corresponding to the first request; and outputting a first response to the bus, the first response carrying information of the true result.
Clause A16, the method of clause A15, wherein the first response includes a first error correction field, the step of the first processing unit feeding back the true result over the bus further comprising the step of: writing the correct information into the first error correction field.
Clause A17, the method of clause A15, wherein the step of the second processing unit feeding back the false result over the bus comprises the steps of: judging whether the identification information is consistent with the processor number of the second processing unit, and if not, executing the following steps: prohibiting execution of an operation corresponding to the first request; and outputting a second response to the bus, the second response carrying information of the false result.
Clause A18, the method of clause A17, wherein the second response includes a second error correction field, the step of the second processing unit feeding back the false result over the bus further comprising the step of: writing the correct information into the second error correction field.
Clause A19, the method of clause A14, wherein the second request includes an identification field carrying identification information corresponding to the third processing unit, the step of the third processing unit feeding back a true result over the bus comprising the steps of: judging whether the identification information is consistent with the processor number of the third processing unit, and if so, executing the following steps: performing an operation corresponding to the second request; and outputting a third response to the bus, the third response carrying information of the true result.
Clause A20, the method of clause A19, wherein the third response includes a third error correction field, the step of the third processing unit feeding back the true result over the bus further comprising the step of: writing the correct information into the third error correction field.
Clause A21, the method of clause A20, wherein the step of the fourth processing unit prohibiting feedback over the bus comprises the steps of: judging whether the identification information is consistent with the processor number of the fourth processing unit, and if not, executing the following steps: prohibiting execution of an operation corresponding to the second request; and prohibiting output of any response to the bus.
The embodiments of the present disclosure have been described in detail above, and specific examples have been employed herein to illustrate the principles and implementations of the present disclosure; the above examples are provided solely to assist in understanding the methods of the present disclosure and their core ideas. Meanwhile, for those of ordinary skill in the art, changes may be made to the specific implementations and the application scope in light of the ideas of the present disclosure; in view of the foregoing, this description should not be construed as limiting the present disclosure.
It should be understood that the terms "first," "second," "third," and "fourth," etc. in the claims, specification, and drawings of this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. The terms "comprises" and "comprising" when used in the specification and claims of the present disclosure, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the present disclosure is for the purpose of describing particular embodiments only, and is not intended to be limiting of the disclosure. As used in the specification and claims of this disclosure, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should be further understood that the term "and/or" as used in the present disclosure and claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "upon", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]", or "in response to detecting [the described condition or event]".
The foregoing has described the embodiments of the present disclosure in detail, and specific examples have been applied herein to explain the principles and implementations of the present disclosure; the above description of the embodiments is merely intended to facilitate an understanding of the methods of the present disclosure and their core ideas. Also, those skilled in the art, based on the ideas of the present disclosure, may make modifications or variations to the specific embodiments and application scope, all of which fall within the protection scope of the present disclosure. In view of the foregoing, this description should not be construed as limiting the disclosure.

Claims (21)

1. A bus system for processing requests, comprising:
a bus;
a first subsystem configured to receive a first request and comprising:
a first processing unit configured to feed back a true result over the bus in response to the first request; and
a second processing unit configured to feed back a false result over the bus in response to the first request;
a second subsystem configured to receive a second request and comprising:
a third processing unit configured to feed back a true result over the bus in response to the second request; and
a fourth processing unit configured to refrain from feeding back over the bus in response to the second request.
2. The bus system of claim 1, wherein the first request includes an identification field carrying identification information corresponding to the first processing unit.
3. The bus system of claim 2, wherein the first processing unit is configured to: judge whether the identification information matches the processor number of the first processing unit, and if so:
perform an operation corresponding to the first request; and
output a first response to the bus, the first response carrying information of the true result.
4. The bus system of claim 3, wherein the first response includes a first error correction field carrying information that the operation corresponding to the first request is correct.
5. The bus system of claim 2, wherein the second processing unit is configured to:
judge whether the identification information matches the processor number of the second processing unit, and if not:
refrain from performing the operation corresponding to the first request; and
output a second response to the bus, the second response carrying information of the false result.
6. The bus system of claim 5, wherein the second response includes a second error correction field carrying information that the operation corresponding to the first request is correct.
7. The bus system of claim 1, wherein the second request includes an identification field carrying identification information corresponding to the third processing unit.
8. The bus system of claim 7, wherein the third processing unit is configured to:
judge whether the identification information matches the processor number of the third processing unit, and if so:
perform the operation corresponding to the second request; and
output a third response to the bus, the third response carrying information of the true result.
9. The bus system of claim 8, wherein the third response includes a third error correction field carrying information that the operation corresponding to the second request is correct.
10. The bus system of claim 7, wherein the fourth processing unit is configured to:
judge whether the identification information matches the processor number of the fourth processing unit, and if not:
refrain from performing the operation corresponding to the second request; and
refrain from outputting any response to the bus.
11. The bus system of claim 1, wherein the first subsystem is an order preserving system.
12. An integrated circuit device comprising a bus system according to any of claims 1-11.
13. A board card comprising the integrated circuit device of claim 12.
14. A method for processing requests in a multi-system, comprising the steps of:
receiving, by a first subsystem, a first request, wherein the first subsystem comprises a first processing unit and a second processing unit;
feeding back, by the first processing unit, a true result over a bus in response to the first request;
feeding back, by the second processing unit, a false result over the bus in response to the first request;
receiving, by a second subsystem, a second request, wherein the second subsystem comprises a third processing unit and a fourth processing unit;
feeding back, by the third processing unit, a true result over the bus in response to the second request; and
refraining, by the fourth processing unit, from feeding back over the bus in response to the second request.
15. The method of claim 14, wherein the first request includes an identification field carrying identification information corresponding to the first processing unit, and the step of the first processing unit feeding back a true result over the bus comprises:
judging whether the identification information matches the processor number of the first processing unit, and if so:
performing an operation corresponding to the first request; and
outputting a first response to the bus, the first response carrying information of the true result.
16. The method of claim 15, wherein the first response includes a first error correction field, and the step of the first processing unit feeding back a true result over the bus further comprises:
writing information indicating that the operation corresponding to the first request is correct into the first error correction field.
17. The method of claim 15, wherein the step of the second processing unit feeding back a false result over the bus comprises:
judging whether the identification information matches the processor number of the second processing unit, and if not:
refraining from performing the operation corresponding to the first request; and
outputting a second response to the bus, the second response carrying information of the false result.
18. The method of claim 17, wherein the second response includes a second error correction field, and the step of the second processing unit feeding back a false result over the bus further comprises:
writing information indicating that the operation corresponding to the first request is correct into the second error correction field.
19. The method of claim 14, wherein the second request includes an identification field carrying identification information corresponding to the third processing unit, and the step of the third processing unit feeding back a true result over the bus comprises:
judging whether the identification information matches the processor number of the third processing unit, and if so:
performing an operation corresponding to the second request; and
outputting a third response to the bus, the third response carrying information of the true result.
20. The method of claim 19, wherein the third response includes a third error correction field, and the step of the third processing unit feeding back a true result over the bus further comprises:
writing information indicating that the operation corresponding to the second request is correct into the third error correction field.
21. The method of claim 20, wherein the step of the fourth processing unit refraining from feeding back over the bus comprises:
judging whether the identification information matches the processor number of the fourth processing unit, and if not:
refraining from performing the operation corresponding to the second request; and
refraining from outputting any response to the bus.
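The dispatch logic in the claims above can be illustrated with a small sketch: each processing unit compares the request's identification field against its own processor number; a matching unit performs the operation and feeds back a response marked as a true result, while a non-matching unit either feeds back a placeholder response marked as a false result (the second processing unit, so the bus still sees one response per unit and request order is preserved) or stays silent (the fourth processing unit). The Python below is only an informal model of that behaviour, not the claimed hardware; all names (`Request`, `Response`, `ProcessingUnit`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    target_id: int   # identification field: processor number of the intended unit
    payload: str

@dataclass
class Response:
    unit_id: int
    result: bool     # True = real ("true") result, False = placeholder ("false") result
    correct: bool    # error-correction field: operation reported as correct

class ProcessingUnit:
    """One processing unit on the bus.

    silent_on_mismatch=False models the second processing unit (feeds back a
    false result on an ID mismatch); True models the fourth processing unit
    (inhibits any feedback on a mismatch).
    """
    def __init__(self, unit_id: int, silent_on_mismatch: bool) -> None:
        self.unit_id = unit_id
        self.silent_on_mismatch = silent_on_mismatch

    def handle(self, req: Request) -> Optional[Response]:
        if req.target_id == self.unit_id:
            # ID matches the processor number: perform the operation and
            # feed back a true result, marked correct.
            return Response(self.unit_id, result=True, correct=True)
        if self.silent_on_mismatch:
            # Fourth-unit behaviour: inhibit any feedback over the bus.
            return None
        # Second-unit behaviour: skip the operation but still feed back a
        # false result (also marked correct), so the requester receives
        # exactly one response per unit and can keep requests in order.
        return Response(self.unit_id, result=False, correct=True)
```

In this model, a request targeted at unit 0 in a two-unit order-preserving subsystem yields one true response and one false response, so responses can be paired with requests by count; in the non-order-preserving subsystem only the targeted unit answers at all.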
CN201911351218.0A 2019-12-24 2019-12-24 Bus system, integrated circuit device, board card and order preserving method for processing request Active CN113032299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911351218.0A CN113032299B (en) 2019-12-24 2019-12-24 Bus system, integrated circuit device, board card and order preserving method for processing request


Publications (2)

Publication Number Publication Date
CN113032299A CN113032299A (en) 2021-06-25
CN113032299B (en) 2023-09-26

Family

ID=76452281

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911351218.0A Active CN113032299B (en) 2019-12-24 2019-12-24 Bus system, integrated circuit device, board card and order preserving method for processing request

Country Status (1)

Country Link
CN (1) CN113032299B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5237567A (en) * 1990-10-31 1993-08-17 Control Data Systems, Inc. Processor communication bus
CN1252142A (en) * 1997-04-14 2000-05-03 国际商业机器公司 Read operation in multipurpose computer system
WO2006030650A1 (en) * 2004-09-16 2006-03-23 Nec Corporation Information processing device having a plurality of processing units sharing a resource
CN102906726A (en) * 2011-12-09 2013-01-30 华为技术有限公司 Co-processing accelerating method, device and system
US9059919B1 (en) * 2011-03-28 2015-06-16 Symantec Corporation Systems and methods for preserving network settings for use in a pre-boot environment
CN105404596A (en) * 2015-10-30 2016-03-16 华为技术有限公司 Data transmission method, device and system
CN109104876A (en) * 2017-04-20 2018-12-28 上海寒武纪信息科技有限公司 A kind of arithmetic unit and Related product
CN110083388A (en) * 2019-04-19 2019-08-02 上海兆芯集成电路有限公司 Processing system and its access method for scheduling


Also Published As

Publication number Publication date
CN113032299A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN111258935B (en) Data transmission device and method
CN111258769B (en) Data transmission device and method
CN113032299B (en) Bus system, integrated circuit device, board card and order preserving method for processing request
CN113033789B (en) Bus system, integrated circuit device, board card and order preserving method for order preserving
CN113032298B (en) Computing device, integrated circuit device, board card and order preserving method for order preserving
CN111340202B (en) Operation method, device and related product
CN111209243B (en) Data processing device, method and related product
CN114443137A (en) Integrated computing device, chip, board card, equipment and computing method
CN111723920A (en) Artificial intelligence computing device and related products
CN111275197B (en) Operation method, device, computer equipment and storage medium
CN111813449A (en) Operation method, device and related product
CN111353595A (en) Operation method, device and related product
CN111325331B (en) Operation method, device and related product
CN111210011B (en) Data processing device and related product
CN111290789B (en) Operation method, operation device, computer equipment and storage medium
CN111382855B (en) Data processing device, method, chip and electronic equipment
CN111399905B (en) Operation method, device and related product
CN111260045B (en) Decoder and atomic instruction analysis method
US20220156077A1 (en) Artificial intelligence computing device and related product
CN113849226A (en) Instruction extraction device, processor, board card and instruction extraction method
CN111813376A (en) Operation method, device and related product
CN111813537A (en) Operation method, device and related product
CN111813448A (en) Operation method, device and related product
CN111813450A (en) Operation method, device and related product
CN117667212A (en) Instruction control device, method, processor, chip and board card

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant